IP phones fail over to SRST router even when CUCM is still reachable

Aug 10th, 2009

Guys, I have the following topology:

CUCM---[switch1]-------[switch2]-----IP phones



SRST router

Both switches carry data VLANs and voice VLANs.

At random times, some of the phones fail over to the SRST router while the others stay registered to the CUCM.

There is a trunk port between both switches.

I checked the traces on the CUCM and found a "socket broken" message at the moment an IP phone unregisters from the CUCM.

I plugged an IP phone directly into switch1 to check whether the same problem persists, and the test phone works fine.

I also checked for input/output errors on the interfaces of both switches, but everything looks clean.

Did anyone face a similar problem before?

Please advise.



Wilson Samuel Mon, 08/10/2009 - 10:37

Hi Moustapha,

It seems more like a connectivity issue than anything else at your switch level.

Could you please verify that connectivity between the switches themselves, and between the switch and the phones, is sound?

You may want to clear the interface counters, let them run for at least 24 hours, then do a show interface and check whether the error counters are increasing.

Please rate if it helps


wilson Samuel

Mustafa Al Housami Mon, 08/10/2009 - 21:06


I already checked that, and no errors are displayed on the interfaces.

The CUCM and SRST router configs are correct, and I am still investigating this issue.

If anyone has any other suggestions, please help.


vipersl65 Tue, 08/11/2009 - 00:04

Isolate the problem further.

1) Phone load (firmware) version: is it the same for all phones of the same model?

2) Switch OS version: is it the same on both switches?

3) Locate a phone that fails over and note the switch port it is connected to. Swap it with a phone that has NEVER failed over, note the switch and port that one was connected to, and observe.

koziollz1 Thu, 05/19/2011 - 08:07

I am also experiencing this issue. Have you been able to resolve it, OP? Do you recall what it was?

Can anyone else provide any input? Has anyone else run into this and successfully resolved it?

I pulled CUCM traces and see the following for multiple phones during the reported SRST timeframe:

05/16/2011 10:05:11.183 CCM|StationInit: TCPPid = []Socket Broken. DeviceName=,IPAddr=, Port=0x33ff, Device Controller=[0,0,0]|

05/16/2011 10:07:06.117 CCM|StationInit: TCPPid=[] Keep alive timeout.|
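If it helps anyone comparing traces, here is a minimal Python sketch for counting these two events per minute across a trace file, to see whether the drops cluster in time. The regular expression is an assumption built only from the two sample lines above; real trace layouts may differ between CUCM versions, and `summarize_events` is a hypothetical helper name, not part of any Cisco tool.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the two sample trace lines in this post.
# Adjust it if your trace files are laid out differently.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d+ CCM\|StationInit:.*?"
    r"(?P<event>Socket Broken|Keep alive timeout)",
    re.IGNORECASE,
)


def summarize_events(lines):
    """Return a Counter keyed by (minute, event) for matching trace lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            minute = match.group("ts")[:16]  # "MM/DD/YYYY HH:MM"
            counts[(minute, match.group("event").lower())] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <trace file>
    with open(sys.argv[1], errors="replace") as handle:
        for (minute, event), count in sorted(summarize_events(handle).items()):
            print(minute, event, count)
```

A burst of "socket broken" entries within the same minute or two would point at a shared cause rather than individual phones.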

My topology is simplified compared to OP's:

                       SRST router



CUCM1&2---[core switch]-------rest of network, switches, IP phones



                       IP Phones

Unfortunately I did not catch the show log from the router before it rolled out of the buffer.

Any help would be greatly appreciated!
clileikis Thu, 05/19/2011 - 08:35

Sounds like a network problem somewhere between the IP phones and call manager. How often is this happening? Any network issues?

koziollz1 Thu, 05/19/2011 - 08:50

This is the first time I know of that it has happened.

The system has been running stable for years. The router was rebooted back in December and has been running stable as well.

None of the remote sites experienced any issues at this time, so we know the connectivity between GW<----->Switch<----->CMs is well and stable.

Pinging the CM from router is good:


Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to, timeout is 2 seconds:


Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms


Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to, timeout is 2 seconds:


Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms


Type escape sequence to abort.

Tracing the route to

  1 0 msec 0 msec 4 msec


Type escape sequence to abort.

Tracing the route to

  1 0 msec 0 msec 0 msec

Pinging the router from CM is good:

admin:utils network ping

PING ( 56(84) bytes of data.

64 bytes from icmp_seq=0 ttl=255 time=1.05 ms

64 bytes from icmp_seq=1 ttl=255 time=0.625 ms

64 bytes from icmp_seq=2 ttl=255 time=0.681 ms

64 bytes from icmp_seq=3 ttl=255 time=0.641 ms

--- ping statistics ---

4 packets transmitted, 4 received, 0% packet loss, time 3098ms

rtt min/avg/max/mdev = 0.625/0.750/1.055/0.179 ms, pipe 2

admin:utils network traceroute

1 (  0.646 ms *  0.540 ms

admin:utils network ping

PING ( 56(84) bytes of data.

64 bytes from icmp_seq=0 ttl=255 time=0.527 ms

64 bytes from icmp_seq=1 ttl=255 time=0.575 ms

64 bytes from icmp_seq=2 ttl=255 time=0.517 ms

64 bytes from icmp_seq=3 ttl=255 time=0.632 ms

--- ping statistics ---

4 packets transmitted, 4 received, 0% packet loss, time 3013ms

rtt min/avg/max/mdev = 0.517/0.562/0.632/0.054 ms, pipe 2

admin:utils network traceroute

1 (  0.695 ms *  0.654 ms

koziollz1 Thu, 05/19/2011 - 09:00

The SDL traces show the following for the same IP address (just using one of the phones):


2011/05/16 10:05:11.183



AlarmClass: CallManager

AlarmName: DeviceTransientConnection

AlarmSeverity: Error AlarmMessage:

AlarmDescription: Transient connection attempt.

AlarmParameters:  ConnectingPort:13311






AppID:Cisco CallManager



jaylena123 Thu, 07/14/2011 - 16:49

We are having this same "socket broken" issue. Did you ever find a resolution?

Tapan Dutt Thu, 07/14/2011 - 17:57


We need to isolate the issue first...

Is it the phone, the switch, or the Call Manager that is the culprit?

The concept is simple: as soon as the phone loses connectivity and does not receive a keepalive within 90 seconds, it fails over either to the secondary Call Manager server or to SRST mode.
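To make the timing concrete, here is a rough Python sketch of that keepalive rule. It is only an illustration of the logic described above, not the phone firmware's actual behavior: the 90-second figure is the one quoted in this post, the real timers are configurable, and `KeepaliveMonitor` is a made-up name.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveMonitor:
    """Toy model of the rule above: fail over once no keepalive
    acknowledgement has been seen for timeout_s seconds."""

    timeout_s: float = 90.0   # figure quoted in the post; treat as an assumption
    last_ack_s: float = 0.0   # time of the last keepalive acknowledgement

    def ack_received(self, now_s: float) -> None:
        # Record that the call manager answered a keepalive at now_s.
        self.last_ack_s = now_s

    def should_fail_over(self, now_s: float) -> bool:
        # True once the silence has lasted at least timeout_s seconds.
        return (now_s - self.last_ack_s) >= self.timeout_s


if __name__ == "__main__":
    monitor = KeepaliveMonitor()
    monitor.ack_received(100.0)
    print(monitor.should_fail_over(150.0))  # 50 s of silence
    print(monitor.should_fail_over(190.0))  # 90 s of silence
```

The point is that a short blip in the path between a phone and the call manager is enough to trip the timer, even if pings look clean before and after.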



Tapan Dutt Thu, 07/14/2011 - 17:59

Adding to the above: check the Debug Display on the phone's web page (accessible when you click on the phone's IP address in CCM).

pmvillarama Fri, 07/15/2011 - 00:31


I think the cause was already isolated when you transferred the phones to a different switch. Did you find anything from switch 1?

koziollz1 Fri, 07/15/2011 - 07:10

Let me try to respond to the new posts:

The issue only occurred once; the client requested a reason for outage, which is what began the investigation. We collected packet captures, traces, logs, the whole nine yards. We did not find any issues internally on the network. The traces/logs pulled from when the issue occurred all point to no connectivity to CallManager, but give no reason why. We are not dropping packets and the response times are better than ideal. The issue has not occurred since.

What I find extremely strange is:

- Only a few random phones went into SRST

- The CUCM and router are both connected via the core switch, so it seems odd that the phones dropped connectivity to CM but not to the router, since they can clearly communicate through said switch

- Other phones were working just fine at the time

- The phones were scattered on different ports and backplanes

- I highly doubt it's hardware failing on specific ports; that is very unlikely to happen all at the same time and then work again afterwards

- We had no network monitoring alerts regarding critical devices losing connectivity (servers, router, switches)

- It was just a few random phones, internal network, that registered via SRST for a few minutes.....makes no sense....and unable to locate an answer.......Cisco TAC was unable to assist with a reason-for-outage as well, only suggestion was wait for it to happen again......which if it does I doubt we'd be able to collect any different or better data

- And just to play devil's advocate: if it smells like a bug, walks like a bug, talks like a bug......

So the issue has never occurred again, but we are still in the dark as to why it happened. Sorry I don't have a better answer for all.

