IP Phones Randomly Rebooting

Endorsed Question
Mar 3rd, 2010

We are having a few issues here with CallManager 5.1: the phones have started to reboot randomly. They have been fine for over 18 months and, to the best of my knowledge, nothing has changed on the network. Below are the debug logs taken from the phone:

10:53:13a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout
10:53:44a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout
10:53:44a 18: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=Failback
9:44:51p 25: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=Initialized
11:34:51a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout
11:49:42a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout

What would cause these timeouts?

Thanks

Martyn
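When there are many phones and many log lines, it can help to tally the Last= reasons rather than read them one by one. The Python sketch below does that for the lines Martyn posted. It assumes every line has exactly the "time code: Name=... Load= ... Last=..." layout shown above; the regular expression and the tally_reasons name are illustrative, not part of any Cisco tool, and the numeric field after the timestamp is captured but not interpreted.

```python
import re
from collections import Counter

# Matches lines such as:
# "10:53:13a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout"
LINE_RE = re.compile(
    r"^(?P<time>\S+)\s+(?P<code>\d+):\s+Name=(?P<name>\S+)\s+"
    r"Load=\s*(?P<load>\S+)\s+Last=(?P<last>\S+)\s*$"
)

def tally_reasons(lines):
    """Count how often each Last= reason appears, keyed by (device name, reason)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # blank or unrecognised lines are skipped
            counts[(m["name"], m["last"])] += 1
    return counts

log = """\
10:53:13a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout
10:53:44a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout
10:53:44a 18: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=Failback
9:44:51p 25: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=Initialized
11:34:51a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout
11:49:42a 10: Name=SEP00260B5D595D Load= SCCP11.8-3-3SR2S Last=TCP-timeout
"""

for (name, reason), n in sorted(tally_reasons(log.splitlines()).items()):
    print(name, reason, n)
# SEP00260B5D595D Failback 1
# SEP00260B5D595D Initialized 1
# SEP00260B5D595D TCP-timeout 4
```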

Endorsed by Dan Bruhn
Aaron Harrison Wed, 03/03/2010 - 06:05

Hi

The timeouts are most likely to be caused by network conditions.

If you are sure this is not the case, then the server may be having issues. If a reboot cures it, there's likely to be some sort of bug or memory leak that is causing it to not respond. There is also a limit on the number of phones/devices that can be in the server subnet: if you have 1000+ phones in the same subnet as the servers, the ARP tables fill up and connections will be dropped.

Regards

Aaron

Please rate helpful posts...

martynch1 Wed, 03/03/2010 - 06:50

Thanks for the reply. Is there any way I can find logs, either from the switch or RTMT, to help solve this problem? I noticed that when I ping our CallManager from my desktop I get 100% success; however, if I ping from our switch on site where the phones are, I get 5 responses and then a dropped packet. It follows a pattern, so I'm wondering if this should be the case.

Thanks

Martyn

PS: only 403 registered phones and an average of 20 calls.

Paolo Bevilacqua Wed, 03/03/2010 - 10:34

however, if I ping from our switch on site where the phones are, I get 5 responses and then it drops a packet,

That is normal: by design, phones rate-limit ping responses.

martynch1 Thu, 03/04/2010 - 01:12

Phones are still bouncing today. I want to run Wireshark to see if that picks anything up, but I'm wondering how to set it up.

At the minute I'm just plugged into the PC port of the IP phone and capturing everything. Is that the correct way to go?

Thanks

Martyn

Aaron Harrison Thu, 03/04/2010 - 01:32

Hi

Sounds like a good start. As the traffic is TCP, if there is trouble with the communication you are likely to see retransmissions from the client when it tries to send keepalives.

Regards

martynch1 Thu, 03/04/2010 - 01:36

Hi and thanks for the reply, couple of questions

Would you put any filters on Wireshark?

Will Wireshark capture all the phone VLAN traffic if my PC is in the data VLAN?

Thanks again

Martyn

Aaron Harrison Thu, 03/04/2010 - 01:53

Hi

Well, if you are just interested in phone-to-CCM traffic, you can filter the capture so that you don't pick up all the PC stuff. That's probably sensible if you will leave the capture running all day.

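Something like this would filter based on source/destination IP (this is display-filter syntax; the equivalent capture filter, which keeps other traffic out of the capture files entirely, is net 192.168.0.0/24 or net 192.168.1.0/24):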

ip.addr == 192.168.0.0/24 || ip.addr == 192.168.1.0/24

Substitute your phone and CCM server subnets to get only traffic to or from those addresses.

When you click the Start icon, instead of just clicking Start again to get a default capture, hit Options. Set up the Capture Files section to write the data out to multiple files every 30 minutes or so, instead of piling it all into memory until your PC falls over.

Finally, you would want the 'PC Voice VLAN Access' and 'SPAN to PC Port' options enabled on the handset's device page in CCM, then reset the phone. When you start the capture, verify you can see the packets. You should see SCCP (the filter keyword is just 'skinny') whenever you do things on the handset, and when you make a call you should see lots of RTP (filter 'rtp').

Regards

Aaron

Please rate helpful posts...         
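The display filter suggested above, ip.addr == 192.168.0.0/24 || ip.addr == 192.168.1.0/24, keeps a packet when either its source or its destination address falls in one of the listed subnets. The Python sketch below mimics only that one expression with the standard ipaddress module, to make the matching rule concrete; the matches_filter function and the sample address pairs are hypothetical, and this is not how Wireshark's filter engine is implemented.

```python
from ipaddress import ip_address, ip_network

# Placeholder subnets from the example filter; substitute your phone and CCM subnets.
SUBNETS = [ip_network("192.168.0.0/24"), ip_network("192.168.1.0/24")]

def matches_filter(src: str, dst: str) -> bool:
    """Mimic 'ip.addr == A || ip.addr == B': true if src OR dst is in any listed subnet."""
    addrs = (ip_address(src), ip_address(dst))
    return any(addr in net for addr in addrs for net in SUBNETS)

packets = [
    ("192.168.0.25", "192.168.1.10"),  # phone subnet -> CCM subnet
    ("192.168.0.25", "10.0.0.5"),      # phone subnet -> somewhere else
    ("10.0.0.7", "10.0.0.5"),          # unrelated PC traffic
]
for src, dst in packets:
    print(src, dst, matches_filter(src, dst))
# 192.168.0.25 192.168.1.10 True
# 192.168.0.25 10.0.0.5 True
# 10.0.0.7 10.0.0.5 False
```

The second pair is kept even though 10.0.0.5 is outside both subnets: because ip.addr tests both ends, the filter keeps anything touching those subnets, not just phone-to-CCM conversations.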

martynch1 Thu, 03/04/2010 - 04:31

Can you believe it? No reboots since I started monitoring...

Aaron Harrison Thu, 03/04/2010 - 09:07

Of course not... just leave that packet cap running forever and all will be well :-)

martynch1 Fri, 03/05/2010 - 02:56

Caught one yesterday just before I left for home:

545 49.457337 172.28.113.21 172.29.3.10 TCP 50015 > cisco-sccp [RST, ACK] Seq=37 Ack=25 Win=8192 Len=0

Also there were a couple of:

543 48.843708 172.28.113.21 172.29.3.10 SKINNY [TCP Retransmission] KeepAliveMessage

Thanks

Martyn

Aaron Harrison Fri, 03/05/2010 - 04:09

Hi there

So these traces suggest that the keepalives aren't being ACKed and so are being retransmitted, most likely due to network conditions.

Do you have a QoS-enabled WAN?

If you're still not sure whether it's WAN conditions, you can trace from both ends and see whether the keepalives sent by the phone actually turn up on the server.

On a Linux-based CCM you can run:

utils network capture file <filename> host ip <ip-address> size all count 100000

The CLI will then block until you hit the 'count' number of packets or press Ctrl+C. You can retrieve the file using the RTMT trace/log collection page.

Regards

Aaron         
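To make "the keepalives aren't being ACKed, so are being retransmitted" concrete, the sketch below applies a simplified rule: a data segment counts as a retransmission when its sender has already sent bytes covering the same sequence range on that connection. Wireshark's own TCP analysis distinguishes more cases (fast and spurious retransmissions, out-of-order segments), so treat this as an illustration only. The phone address, port 50015 and the server address come from Martyn's capture, with cisco-sccp as TCP port 2000; the timestamps, sequence numbers and 12-byte lengths are invented for the example.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    time: float   # seconds since capture start
    src: str      # "ip:port" of the sender
    dst: str      # "ip:port" of the receiver
    seq: int      # relative sequence number of the first payload byte
    length: int   # payload length in bytes (0 for a pure ACK)

def find_retransmissions(segments):
    """Flag data segments whose whole sequence range was already sent on the same flow."""
    next_seq = {}  # (src, dst) -> highest seq + length seen so far in that direction
    flagged = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data, so this rule never flags them
        flow = (seg.src, seg.dst)
        end = seg.seq + seg.length
        if flow in next_seq and end <= next_seq[flow]:
            flagged.append(seg)  # these bytes were already sent once
        else:
            next_seq[flow] = max(next_seq.get(flow, 0), end)
    return flagged

PHONE = "172.28.113.21:50015"  # from the capture above
CCM = "172.29.3.10:2000"       # cisco-sccp is TCP port 2000

# Hypothetical sequence of events: one answered keepalive, then one that goes unanswered.
segments = [
    Segment(0.0, PHONE, CCM, 1, 12),    # KeepAliveMessage
    Segment(0.1, CCM, PHONE, 1, 12),    # server's reply
    Segment(30.0, PHONE, CCM, 13, 12),  # next KeepAliveMessage, no reply arrives
    Segment(30.3, PHONE, CCM, 13, 12),  # same bytes sent again
]

for seg in find_retransmissions(segments):
    print(f"{seg.time:.1f}s {seg.src} -> {seg.dst} seq={seg.seq} len={seg.length} (retransmission)")
# 30.3s 172.28.113.21:50015 -> 172.29.3.10:2000 seq=13 len=12 (retransmission)
```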

thucnk_FPT_FIS_6_2 Tue, 07/10/2012 - 01:57

Hi Aaron Harrison,

I have the same issue on a local LAN. But my network is a /20 subnet (255.255.240.0). Do you have any ideas for my case?

Thuc
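For reference, the arithmetic behind a /20: a 255.255.240.0 mask leaves 12 host bits, so the subnet has 2^12 = 4,096 addresses, 4,094 of them usable for hosts. That is room for far more than the 1000+ devices mentioned earlier in the thread, so whether it matters depends on how many devices are actually in that subnet. The quick check below uses Python's standard ipaddress module; 10.0.0.0 is a placeholder network address.

```python
from ipaddress import ip_network

# Placeholder network address; only the 255.255.240.0 mask matters here.
net = ip_network("10.0.0.0/255.255.240.0")

print(net.prefixlen)             # 20
print(net.num_addresses)         # 4096
print(len(list(net.hosts())))    # 4094 usable hosts (network and broadcast excluded)
```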

sradogna2 Fri, 04/22/2011 - 11:05

Martyn,

Did you ever get this issue resolved?

I am having the exact same issue.

Thanks,

Steve

dsobrinho Tue, 08/30/2011 - 21:03

Hi guys,

Does anybody know how to resolve this issue?

Best Regards

Daniel

richb1971 Wed, 09/07/2011 - 07:31

Just when you think you've found the exact problem... no resolution? I've got the same problem here on CCM 8.0.3 with 7942 phones on firmware 9.0.3. I've heard mention of QoS issues and ASA version/SCCPv19 issues.

Martyn, Daniel, Steve - did you guys fix this?

8:25:58p 10: Name=SEP001FCAE7AFC8 Load= SCCP42.9-0-3S Last=TCP-timeout
8:28:07p 14: Name=SEP001FCAE7AFC8 Load= SCCP42.9-0-3S Last=UCM-closed-TCP
8:32:08p 18: Name=SEP001FCAE7AFC8 Load= SCCP42.9-0-3S Last=Failback

Regards

Rich

martynch1 Wed, 09/07/2011 - 07:46

Do you have port security enabled on your switch? If so, what is it set to?

Martyn

richb1971 Thu, 09/08/2011 - 01:23

I'll check, Martyn. I'm also waiting for the customer to confirm the firewall settings; I've heard of the ASA causing issues with newer firmware.

Rich

sradogna2 Wed, 09/07/2011 - 09:51

Rich,

Upon further investigation, we determined that only the 7937 conference phones were resetting.

Changing the access VLAN to be the same as the voice VLAN fixed the issue (recommended by TAC after weeks of troubleshooting).

santosraymond Fri, 07/27/2012 - 07:29

Hi Steve,

I'm trying to understand TAC's recommendation regarding the fix for your issue. Did they explain why?

Thanks

mickdoran1968 Tue, 11/06/2012 - 12:14

We have the issue of phones resetting randomly on a CMBE6000 system with only 48 phones connected. There are 4 locations: 3 connected via EPL and 1 via a point-to-point T1.

All locations, including the main site, are seeing sporadic phone resets. All the phones are either 7942 or 7962 with the SCCP42.9-1-1SR1S app load.

CCM version is 8.5.

Over the past month I have opened two TAC cases. The following has been done with no resolution to the problem:

  • Increased the 'Station Keepalive Interval' to 300 seconds from the default of 30 seconds
  • Erased all of the QoS settings on every switch and router and used Cisco AutoQoS
  • Checked cabling to the CCM, phones, etc.
  • Set all the phones to Delayed according to the following:

Geometric TCP

The Cisco Unified IP Phone firmware 7.2(1) introduced a Geometric TCP mechanism to permit IP Phones to measure the round-trip delay between the IP Phone and Unified CM, then adapt the keepalive timeout value. This provided a very accurate failover mechanism when the network delay is consistent.

However, if the network delay is inconsistent, this mechanism may cause the IP Phones to inaccurately attempt failover. The Cisco Unified IP Phone firmware 8.4(2) introduces the ability for the Network Administrator to disable this behavior, if necessary, through the Detect Unified CM Connection Failure parameter defined on the IP Phone device configuration. The default value is Normal; this Geometric TCP mechanism can be disabled if the parameter is set to Delayed.

Has anyone resolved this issue?

-Mick
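The quoted text does not spell out the Geometric TCP algorithm, so as a stand-in the Python sketch below uses the standard TCP retransmission-timeout estimator from RFC 6298 (SRTT and RTTVAR smoothing with alpha = 1/8, beta = 1/4, timeout = SRTT + 4 * RTTVAR). It is not the phone firmware's algorithm; it only shows the general effect the quoted text describes: a timeout learned from a very steady delay becomes tight, so a single delay spike can exceed it. The RFC's 1-second minimum timeout is deliberately left out so the effect shows at millisecond delays, and the RTT samples are invented.

```python
def late_samples(samples_ms, k=4, alpha=1/8, beta=1/4):
    """Return (index, rtt, timeout) for each RTT sample that exceeds the timeout
    learned from the samples before it, using the RFC 6298 estimator
    (without the RFC's 1-second minimum)."""
    srtt = rttvar = None
    late = []
    for i, r in enumerate(samples_ms):
        if srtt is None:
            # First measurement (RFC 6298 section 2.2).
            srtt, rttvar = r, r / 2
            continue
        timeout = srtt + k * rttvar  # timeout in force before this sample arrives
        if r > timeout:
            late.append((i, r, round(timeout, 1)))
        # Subsequent measurements (section 2.3): RTTVAR uses the old SRTT.
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
        srtt = (1 - alpha) * srtt + alpha * r
    return late

steady = [20.0] * 20 + [120.0]  # twenty 20 ms round trips, then one 120 ms spike
print(late_samples(steady))     # [(20, 120.0, 20.2)]
```

With steady 20 ms samples, RTTVAR decays toward zero and the timeout shrinks to just over 20 ms, so the 120 ms spike is flagged. Noisier samples keep RTTVAR, and therefore the timeout, larger.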

mickdoran1968 Wed, 11/07/2012 - 06:08

I'm assuming you want to see the QoS settings:

!
mls qos map cos-dscp 0 8 16 24 32 46 48 56
mls qos srr-queue input bandwidth 70 30
mls qos srr-queue input threshold 1 80 90
mls qos srr-queue input priority-queue 2 bandwidth 30
mls qos srr-queue input cos-map queue 1 threshold 2 3
mls qos srr-queue input cos-map queue 1 threshold 3 6 7
mls qos srr-queue input cos-map queue 2 threshold 1 4
mls qos srr-queue input dscp-map queue 1 threshold 2 24
mls qos srr-queue input dscp-map queue 1 threshold 3 48 49 50 51 52 53 54 55
mls qos srr-queue input dscp-map queue 1 threshold 3 56 57 58 59 60 61 62 63
mls qos srr-queue input dscp-map queue 2 threshold 3 32 33 40 41 42 43 44 45
mls qos srr-queue input dscp-map queue 2 threshold 3 46 47
mls qos srr-queue output cos-map queue 1 threshold 3 4 5
mls qos srr-queue output cos-map queue 2 threshold 1 2
mls qos srr-queue output cos-map queue 2 threshold 2 3
mls qos srr-queue output cos-map queue 2 threshold 3 6 7
mls qos srr-queue output cos-map queue 3 threshold 3 0
mls qos srr-queue output cos-map queue 4 threshold 3 1
mls qos srr-queue output dscp-map queue 1 threshold 3 32 33 40 41 42 43 44 45
mls qos srr-queue output dscp-map queue 1 threshold 3 46 47
mls qos srr-queue output dscp-map queue 2 threshold 1 16 17 18 19 20 21 22 23
mls qos srr-queue output dscp-map queue 2 threshold 1 26 27 28 29 30 31 34 35
mls qos srr-queue output dscp-map queue 2 threshold 1 36 37 38 39
mls qos srr-queue output dscp-map queue 2 threshold 2 24
mls qos srr-queue output dscp-map queue 2 threshold 3 48 49 50 51 52 53 54 55
mls qos srr-queue output dscp-map queue 2 threshold 3 56 57 58 59 60 61 62 63
mls qos srr-queue output dscp-map queue 3 threshold 3 0 1 2 3 4 5 6 7
mls qos srr-queue output dscp-map queue 4 threshold 1 8 9 11 13 15
mls qos srr-queue output dscp-map queue 4 threshold 2 10 12 14
mls qos queue-set output 1 threshold 1 100 100 50 200
mls qos queue-set output 1 threshold 2 125 125 100 400
mls qos queue-set output 1 threshold 3 100 100 100 400
mls qos queue-set output 1 threshold 4 60 150 50 200
mls qos queue-set output 1 buffers 15 25 40 20
mls qos
!
interface GigabitEthernet0/1
 description TX-C2911-FNB-GR-M1-1
 switchport access vlan 49
 switchport mode access
 srr-queue bandwidth share 1 30 35 5
 queue-set 2
 priority-queue out
 mls qos trust cos
 auto qos trust
 no mdix auto
!
interface GigabitEthernet0/2
 description **********
 switchport access vlan 40
 switchport mode access
 srr-queue bandwidth share 1 30 35 5
 queue-set 2
 priority-queue out
 mls qos cos 1
 mls qos trust cos
 auto qos trust
 no mdix auto
 spanning-tree portfast
!
interface GigabitEthernet0/3
 switchport access vlan 40
 switchport mode access
 switchport voice vlan 49
 switchport priority extend cos 1
 srr-queue bandwidth share 1 30 35 5
 queue-set 2
 priority-queue out
 mls qos cos 1
 mls qos trust cos
 auto qos trust
 no mdix auto
 spanning-tree portfast
interface Vlan1
 no ip address
!
interface Vlan40
 description Workstation VLAN
 ip address 10.X.X.X 255.255.0.0
 ip helper-address 10.X.X.X
 no ip proxy-arp
!
interface Vlan49
 description VoIP VLAN
 ip address 10.X.X.X 255.255.0.0
 no ip proxy-arp
!
!
router eigrp 10
 network 10.0.0.0
 eigrp stub connected summary
!
ip default-gateway 10.X.X.X
ip classless
ip http server
ip http secure-server
!
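To check where voice traffic lands with the mappings above, the Python sketch below parses an excerpt of those lines: the 'mls qos map cos-dscp' line (eight DSCP values for CoS 0 to 7) and the 'mls qos srr-queue output dscp-map' lines. It only understands those exact line shapes, not the full IOS grammar, and the function names are hypothetical. It looks up DSCP 46 (EF, the value CoS 5 is mapped to) and DSCP 24 (CS3, the value commonly used for SCCP call signaling such as the keepalives in this thread). On Catalyst switches that use this srr-queue syntax, egress queue 1 is serviced as the expedite queue when priority-queue out is set on the interface, as it is on the ports above.

```python
import re

# Excerpt of the configuration above.
CONFIG = """\
mls qos map cos-dscp 0 8 16 24 32 46 48 56
mls qos srr-queue output dscp-map queue 1 threshold 3 32 33 40 41 42 43 44 45
mls qos srr-queue output dscp-map queue 1 threshold 3 46 47
mls qos srr-queue output dscp-map queue 2 threshold 2 24
mls qos srr-queue output dscp-map queue 3 threshold 3 0 1 2 3 4 5 6 7
"""

def parse_cos_dscp(config):
    """Return {cos: dscp} from 'mls qos map cos-dscp d0 ... d7'."""
    for line in config.splitlines():
        if line.startswith("mls qos map cos-dscp "):
            values = [int(v) for v in line.split()[4:]]  # skip the 4 keyword tokens
            return dict(enumerate(values))               # position = CoS value
    return {}

OUT_DSCP_RE = re.compile(
    r"^mls qos srr-queue output dscp-map queue (\d) threshold (\d)((?: \d+)+)$"
)

def parse_output_dscp_map(config):
    """Return {dscp: (queue, threshold)} from the output dscp-map lines."""
    mapping = {}
    for line in config.splitlines():
        m = OUT_DSCP_RE.match(line.strip())
        if m:
            queue, threshold = int(m.group(1)), int(m.group(2))
            for dscp in m.group(3).split():
                mapping[int(dscp)] = (queue, threshold)
    return mapping

cos_to_dscp = parse_cos_dscp(CONFIG)
dscp_to_queue = parse_output_dscp_map(CONFIG)
for cos in (5, 3):
    dscp = cos_to_dscp[cos]
    print(f"CoS {cos} -> DSCP {dscp} -> output (queue, threshold) {dscp_to_queue[dscp]}")
# CoS 5 -> DSCP 46 -> output (queue, threshold) (1, 3)
# CoS 3 -> DSCP 24 -> output (queue, threshold) (2, 2)
```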


jsmith77690 Thu, 11/07/2013 - 16:47

Michael,

  Did you ever find a resolution to your phones resetting? If so, would you mind sharing? We are having the same issue.

Thanks,

Jason

mickdoran1968 Fri, 11/08/2013 - 05:14

Hi Jason,

Yes, we did. Our system was a CMBE5K running CallManager 8.5. After several months of back and forth with Cisco TAC (increasing keepalive timers, reviewing QoS on the network infrastructure, and finally sending them multiple packet captures and RTMT traces), out of frustration at not finding a resolution and being at odds with the TAC team, I simply upgraded the system to 8.6 and the problem immediately went away.

So the actual root cause is not known in my case. Could it have been a corrupt operating system? A bug in 8.5, or in the phone firmware shipped with that version?

I can't say for sure, but it seems logical that if the problem was never there before and no major changes have been made to the network, then rebuilding/reinstalling the system and restoring a backup, or upgrading the system, may be the resolution. 8.5 to 8.6 is a refresh upgrade, which from my understanding completely reinstalls the operating system; that would have eliminated any corruption in the file system.

-Mick
