Can someone help me here?
Our IP phones are resetting and restarting frequently. Details are below, but it is not affecting our active calls.
9:38:38a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP
9:38:38a 18: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Failback
9:40:10a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
9:41:11a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP
9:41:11a 18: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Failback
10:09:49a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
10:09:51a 23: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Reset-Restart
10:28:00a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
10:28:10a 23: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Reset-Restart
App Load ID jar45sccp.9-0-3TH1-22.sbn
Boot Load ID tnp65.8-3-1-21a.bin
CUCM Version 7.1.5
Thanks in advance
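To see the pattern in logs like the ones above, a short script can tally the Last= reasons and the time between events. This is a minimal sketch that assumes the exact line format shown in this post (time with a trailing a/p, an event number, Name=, Load=, Last=). Other firmware versions or log sources may format lines differently, and the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# 9:38:38a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}[ap])\s+\d+:\s+"
    r"Name=(?P<name>\S+)\s+Load=\s*(?P<load>\S+)\s+Last=(?P<last>\S+)$"
)


def parse_line(line):
    """Return (time, device name, Last= reason), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # "9:38:38a" -> "9:38:38AM" so that %p can parse the am/pm marker
    stamp = datetime.strptime(m["time"].upper() + "M", "%I:%M:%S%p")
    return stamp, m["name"], m["last"]


def summarize(lines):
    """Count Last= reasons and compute seconds between consecutive events."""
    events = [e for e in (parse_line(line) for line in lines) if e is not None]
    reasons = Counter(last for _, _, last in events)
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
    return reasons, gaps
```

Feeding it the nine log lines above gives three TCP-timeout events and two each of UCM-closed-TCP, Failback and Reset-Restart. Lines that don't match, such as the App Load ID line, are skipped.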
It looks like they are losing their connection to CUCM. Check the heartbeats to and from the IP phone below to its primary and backup CUCM servers. It seems to be bouncing between the primary and backup for some reason. Possible causes:
- WAN connection issue
- busy network
- connection issue at CUCM
- mismatched port speed at the phone/switch
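ICMP ping can be supplemented by timing TCP connects to the SCCP signalling port (TCP 2000 by default) on the primary and backup CUCM from the phone's subnet. This is a minimal sketch that assumes that port is reachable from the test machine. Any server names you pass in are your own, and the function names are hypothetical. A successful connect only shows that the TCP path works, not that CUCM is answering SCCP keepalives. Each probe also opens a real TCP session to CUCM, so keep the rate low.

```python
import socket
import time

SCCP_PORT = 2000  # default SCCP signalling port on CUCM


def tcp_probe(host, port=SCCP_PORT, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
    except OSError:  # refused, unreachable, or timed out
        return None
    return (time.monotonic() - start) * 1000.0


def probe_servers(servers, rounds=5, interval=1.0, port=SCCP_PORT):
    """Probe each server `rounds` times; report failed connects and the slowest one."""
    report = {}
    for host in servers:
        samples = []
        for _ in range(rounds):
            samples.append(tcp_probe(host, port))
            time.sleep(interval)
        ok = [s for s in samples if s is not None]
        report[host] = {
            "failures": len(samples) - len(ok),
            "max_ms": max(ok) if ok else None,
        }
    return report
```

Running this against both servers at once makes it easier to tell whether only one of them is intermittently unreachable, which would match the primary/backup bouncing.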
Hi Tommer Catline,
Thanks for the information!
The CUCM server is on the local LAN. I have checked the LAN performance and have not seen any packet drops between the phone and CUCM. I have also changed the keepalive timer, but I am still facing the same problem.
From your desktop computer, if you ping the CUCM Publisher and Subscriber IP addresses, do you see any dropped packets or delays going on?
Did you guys ever find a solution to this? I am having this problem on a Gigabit LAN at the office across the street, which is connected over fiber. I am pinging the devices constantly and there are no drops, but the user's 7942 phones showed "CM Fallback Service Operating". There are no network drops, and all other applications work without any issues. Other phones on the LAN also seem to work fine.
We had a similar issue with 7962 and 7965 phones. They worked perfectly in my office, but after being moved to the end user, the phones continuously restarted 5 to 30 seconds after registering. The end user's office uses HP switches (78XX phones, on the other hand, work there with no problem). I went through every possible configuration and debugging option before finally finding VLAN-related issues in the phone logs (old vlan 4096, new vlan 4095).
The issue was resolved by setting VLAN tagging on the Cisco phone and the HP switch, where the default router (a Cisco) was behind the HP switch.
Thanks for your solution, Ales. I actually resolved our problem by factory resetting the phones and, after gaining access to the OS, rebooting the CUCM servers, which had been up for nearly 2 years. It has not recurred since. I should note that all our equipment is on Gigabit Cisco 3850 switches with fiber connectivity, and the network does not appear to have been the problem.
Hope this helps others.
This sounds very much like a TCP timeout issue, normally caused by some sort of stateful filtering between CUCM and the end user's handset.
We had the same issue, caused by Check Point firewalls. There is a known bug in SecureXL that affects TCP packets if you use a PPPoE link.
This is still an active bug on Gaia 77.30.
The point is: make sure you do not have any filtering that would affect your TCP packets or force different timeout values on them.
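A stateful filter typically expires a TCP session that has been idle for longer than its configured timeout. Given packet timestamps for a phone's SCCP connection (for example, exported from a capture), this minimal sketch flags silences long enough for that to happen. The idle timeout you pass in is an assumption that you would replace with your firewall's actual setting, and the function name is hypothetical. With a healthy SCCP keepalive (30 seconds by default), no gap should come close to it.

```python
def idle_gaps(timestamps, idle_timeout_s):
    """Return (start, length) for each silence of at least idle_timeout_s seconds
    between consecutive packets on one TCP connection."""
    ts = sorted(timestamps)
    return [(a, b - a) for a, b in zip(ts, ts[1:]) if b - a >= idle_timeout_s]


# Keepalives every 30 s, then one 90 s silence; 60 s is a hypothetical firewall timeout
print(idle_gaps([0, 30, 60, 150, 180], idle_timeout_s=60))  # [(60, 90)]
```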
Can anyone please help us with this issue?
Does anybody know anything about this? I have the same problem with the Cisco IP Phone 7962 in a branch office in RJ.
I have checked the WAN, QoS, and LAN, and no problems were found.
If you are using third-generation phones (7970, 79x1, 79x2, 79x5), then there is no fix for it.
This is the latest update I got:
There are a couple of things that need to be kept in mind:
1. Phones will unregister from the CUCM and register to the SRST GW
2. GW will tear down Q.931 backhauling from CUCM and will function as a standalone call agent.
The phones will lose either SCCP or TCP keepalives. TCP keepalives being missed generally trigger SRST much faster than SCCP keepalives being missed.
The phones will register to the GW even if Q.931 backhauling has not yet been torn down. Hence, there may be a brief period in which the phones are registered to the GW but the PRI is still MGCP-controlled. In that case, calls will initially fail. After some time (once the 30-second MGCP keepalive has expired), the Q.931 backhauling is torn down (isdn bind-l3 ccm-manager is removed) and the PRI functions as if it had been configured on an H.323 GW. Note that phones showing "Registering" or "CM Fallback Service Operating" do not indicate that the GW has gone into SRST. The phones go into SRST much faster than the Q.931 backhauling is torn down.
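The race described above can be written as a simple timeline: calls fail from the moment the phones register to the GW until the Q.931 backhauling is torn down. This minimal sketch only restates that ordering, and the class name is hypothetical. The inputs in the example are illustrative, not measured. Roughly 8 seconds is the 7965 figure quoted later in this thread, and 35 seconds is an assumed value for "somewhat after the 30-second MGCP keepalive".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FallbackTimeline:
    """Seconds measured from the moment CUCM becomes unreachable."""
    phone_failover_s: float     # phones detect the loss and register to the SRST GW
    backhaul_teardown_s: float  # GW removes isdn bind-l3 ccm-manager after MGCP keepalive loss

    def failing_window(self) -> Optional[Tuple[float, float]]:
        """Interval during which phones are on the GW but the PRI is still
        MGCP-controlled, so calls fail. None if the PRI is ready first."""
        if self.backhaul_teardown_s <= self.phone_failover_s:
            return None
        return (self.phone_failover_s, self.backhaul_teardown_s)


print(FallbackTimeline(phone_failover_s=8, backhaul_teardown_s=35).failing_window())  # (8, 35)
```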
The behavior you are seeing is the mechanism for timing out a TCP connection and has nothing to do with the SCCP keepalive itself. Any time the phone sends a TCP packet to the server and does not receive a TCP ACK, the phone retransmits the packet at decreasing intervals until the session times out (the phone sends a TCP RST). At that point the phone fails over to the next CCM server or the SRST reference.
The SCCP keepalives are sent at regular intervals, based on a value presented to the phone during registration (30 seconds by default). If the phone gets a TCP ack for the keepalive, but no SCCP keepaliveAck from the server then you can get into the situation where the phone unregisters due to keepalive timeout (after 2 or 3 such missed keepaliveAcks).
The former is a network problem. The latter is an application problem, where the network layer of the CCM server acknowledges that the message was received but the CCM application is not responding.
In your example, note that when the phone registers with the SRST router, the SCCP alarm message it sends contains a string like "last=TCP Timeout" or similar.
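The distinction between these two failure modes can be written as a small classifier over per-keepalive observations, such as those read from a packet capture. This minimal sketch assumes each record says whether the keepalive's TCP segment was eventually ACKed and whether an SCCP KeepAliveAck came back. The default threshold of 3 missed acks is an assumption, since the explanation above says 2 or 3, and the function name is hypothetical.

```python
def classify_keepalives(records, max_missed=3):
    """
    records: (tcp_acked, sccp_acked) per SCCP keepalive, in time order.
      tcp_acked  - the keepalive's TCP segment was ACKed (after any retransmissions)
      sccp_acked - an SCCP KeepAliveAck came back from CUCM
    Returns:
      "network"     - a segment was never ACKed, so the TCP session times out
      "application" - max_missed consecutive keepalives were ACKed at the TCP
                      layer but got no SCCP KeepAliveAck
      "ok"          - neither condition occurred
    """
    missed = 0
    for tcp_acked, sccp_acked in records:
        if not tcp_acked:
            return "network"
        if sccp_acked:
            missed = 0  # a KeepAliveAck resets the consecutive-miss count
        else:
            missed += 1
            if missed >= max_missed:
                return "application"
    return "ok"
```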
The 3rd gen phones (7970, 79x1, 79x2, 79x5) are much more aggressive in timing out the TCP session than the 2nd gen phones. What took your 7960 26 seconds to unregister will take a 7965 about 8 seconds.
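How long a phone spends timing out depends on its retransmission schedule, which is why the 7960 and 7965 differ so much. This simplified sketch walks a registration list (CCM group servers, then the SRST reference) and adds up a full retransmission cycle for each server that never ACKs. The retransmit intervals and server names in the test are hypothetical, because the actual per-model schedules are not given here. The function name is also hypothetical.

```python
def simulate_failover(order, start, acks, retransmit_intervals):
    """
    order: registration targets in priority order, e.g. CCM group servers then SRST
    start: index in `order` of the server the phone is currently registered to
    acks(server) -> bool: whether that server currently ACKs the phone's TCP segments
    retransmit_intervals: seconds waited before each retransmission; once the last
        one expires unacknowledged, the phone resets the session and moves on
    Returns (server the phone ends up on or None, seconds spent timing out).
    """
    elapsed = 0.0
    for server in order[start:]:
        if acks(server):
            return server, elapsed
        elapsed += sum(retransmit_intervals)  # full retransmission cycle, then TCP RST
    return None, elapsed
```

A shorter or more aggressive schedule shrinks `elapsed`, which matches the shorter time to unregister described for the 3rd-generation phones.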
I once had a Juniper firewall between a remote site and CUCM, and the SCCP keepalives were being delayed. This of course caused issues with some phones. I'm not sure whether you have this or something similar. You may also adjust the SCCP keepalive trigger in CUCM to a higher value, which may help as well.
Thanks for the information - it looks like we are suffering from the 3rd generation phone TCP mechanisms.
Still doesn't get away from the fact that our carrier is causing us to have retransmissions but at least we can identify some workarounds now.
Many thanks again.
The Cisco Unified IP Phone 7900 series introduced a Geometric TCP mechanism that lets IP phones measure the round-trip delay between the phone and Unified CM and then adapt the keepalive timeout value. This provided a very accurate failover mechanism when the network delay is consistent. However, if the network delay is inconsistent, this mechanism may cause the IP phones to attempt failover inaccurately. Cisco Unified IP Phone firmware 8.4(2) introduces the ability for the network administrator to disable this behavior, if necessary, through the Detect Unified CM Connection Failure parameter in the IP phone device configuration. The default value is Normal. The Geometric TCP mechanism is disabled if the parameter is set to Delayed. This is documented in the
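To illustrate why an adaptive timeout misfires when delay is inconsistent, this sketch uses the standard RFC 6298 smoothed-RTT estimator as a stand-in. Cisco's actual Geometric TCP algorithm is not described here and may differ. The sketch counts RTT samples that exceed the timeout computed from earlier samples. It then compares that count with a single fixed, generous timeout, loosely analogous to the Delayed setting. The function names and the 500 ms value are hypothetical.

```python
def adaptive_timeouts(rtts, alpha=0.125, beta=0.25, k=4):
    """Count RTT samples that exceed the timeout predicted from earlier samples,
    using the RFC 6298 SRTT/RTTVAR estimator (no clock-granularity term and no
    1-second minimum, to keep the example small)."""
    srtt = rttvar = None
    exceeded = 0
    for r in rtts:
        if srtt is None:
            srtt, rttvar = r, r / 2  # first measurement
            continue
        if r > srtt + k * rttvar:  # timeout computed before this sample
            exceeded += 1
        # RFC 6298 order: update RTTVAR with the old SRTT, then SRTT
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
        srtt = (1 - alpha) * srtt + alpha * r
    return exceeded


def fixed_timeouts(rtts, timeout):
    """Count samples that exceed a single fixed timeout."""
    return sum(1 for r in rtts if r > timeout)


steady_then_spike = [10.0] * 20 + [100.0]  # RTTs in ms: one delay spike
print(adaptive_timeouts(steady_then_spike))      # 1
print(fixed_timeouts(steady_then_spike, 500.0))  # 0
```

After a long run of steady RTTs, the adaptive estimate tightens to just above 10 ms, so a single spike counts as a timeout. The fixed timeout tolerates the spike.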
Cisco Unified IP Phone Release Notes for Firmware Release 8.4(2) 7971G-GE, 7970G, 7961G-GE, 7961G, 7941G-GE, 7941G, 7931G, 7911G, and 7906G (SCCP and SIP)
Hope this helps
I am also facing the same problem with Cisco 7941G and 7962 phones.
The problem remains unresolved, even with the latest firmware.
The phones reset and re-register automatically with the errors last=reset-restart and last=TCP timeout.
Is there any solution for this problem? It happens for phones across the WAN as well as on the LAN.
We continuously pinged CUCM for a couple of days and there was not a single timeout!
But the phones sometimes unregister and register again on their own.
As you might suspect, we have a similar issue and can't find the cause of it.
rajesh.kumar, did you resolve your issue?
Hope to get some feedback,
We are experiencing this issue only with remote users on VPN. The phones restart at least once a day. The logs show a TCP timeout, and the reboots take 5 to 10 minutes. We added the users to SolarWinds, and no connectivity issues were detected.
The only way to find the root cause in this situation is:
1) Set up parallel packet captures for the primary CallManager (the one the phone is registered to) and for the IP phone itself, and let them run until the issue is reported. You will need a dedicated PC running Wireshark to do so.
2) Analyze the captures from about 5 minutes before the issue was reported. Look for any packets that left the phone and did not reach the CallManager, or that were delayed on the way, and check the same thing in the reverse direction.
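Step 2 can be partly automated once both captures are exported. This minimal sketch makes three assumptions. First, each packet can be matched across the two captures by some key (for example, TCP sequence number within one direction). Second, the capture hosts' clocks are synchronized (for example, via NTP). Third, retransmissions have been reduced to their first occurrence. The function name and sample data are hypothetical. Run it once per direction.

```python
def compare_captures(sent, received, max_delay):
    """
    sent:     {packet_key: timestamp} from the capture on the sending side
    received: {packet_key: timestamp} from the capture on the receiving side
    Returns (lost, delayed):
      lost    - keys seen leaving the sender but never seen at the receiver
      delayed - (key, one-way delay) pairs whose delay exceeded max_delay
    Assumes both capture hosts' clocks are synchronized.
    """
    lost = sorted(k for k in sent if k not in received)
    delayed = sorted(
        (k, received[k] - sent[k])
        for k in sent
        if k in received and received[k] - sent[k] > max_delay
    )
    return lost, delayed


# Hypothetical phone -> CUCM direction, keyed by TCP sequence number
phone_side = {1000: 0.00, 1100: 1.00, 1200: 2.00}
cucm_side = {1000: 0.01, 1200: 2.50}
print(compare_captures(phone_side, cucm_side, max_delay=0.2))
# ([1100], [(1200, 0.5)])
```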
Probably not an issue with v7.1.5, but I just had a TAC case with an identical issue on our v22.214.171.12400-2 >> 8.6(2a), where RTMT showed an alert that the number of registered phones had been exceeded. This was even though we have a 7,500-endpoint OVA deployment on each node and the alerted node had fewer than 1,000 endpoints registered, in a cluster with 4 CUCM subscriber nodes. If you're running v8.6.2, this may also be the cause of your phones continuously resetting themselves.
The Conditions section states "client is connecting over an unstable network which causes the client to continuously attempt registrations to the backup subscriber," but that's not the case in our environment. It is stable, but TAC still deemed this to be our problem.
Subscriber incorrectly indicates NumRegisteredDevices is exceeded:
Unified CM Subscriber does not allow phones to register and generates a NumRegisteredDevices exceeded alarm even though there are only a few or no devices registered.
Workaround: None. Once this condition occurs, the Cisco CallManager service must be restarted on the affected subscriber.
Affected: 126.96.36.19900-2 >> 8.6(2a)