This is the second time in about 3-4 weeks that we have experienced this problem. Both times it has been a Monday.
Anyway, I start getting a lot of reports of poor voice quality, and then when I look at the logs on our publisher it shows a ton of application events related to phones unregistering. It's not just a small group but almost every phone in the cluster (about 1,400).
Initially, the phones should be registered with the subscriber. I checked the event logs and nothing out of the ordinary shows up. There were no reboots or anything like that. The phones are 7905s, 7940s and 7960s, I believe.
Has anyone else had something like this happen? I've found posts of a few phones doing this but not this many.
We are on version 4.1(3)sr1. All phones run over 3550's and we have gig connections to all 23 buildings in the area.
If there is poor voice quality and the IP phones unregister after that, it could be that one of your core devices is failing. Any alarms or reports?
What is the reason code for the IP phones unregistering in the Event Viewer?
The phones initially register to the subscriber, but the errors show up on the publisher?
All in the same location or via a WAN link?
Let us know.
Right about the time that I started receiving numerous complaints regarding voice quality and phones operating sporadically is when I noticed that a lot of devices started to unregister. This was evident when the problem happened on the 14th and when it happened on the 28th of this month.
The device_unregistering error is showing up on the publisher with a reason code of 8. Some, not many, have a reason code of 9. Reason code 8 suggests that the device initiated the reset. I'm not sure whether a phone initiates that reset itself when it rehomes.
On my subscriber I get transient connection attempts with a reason code of 6, and the devices are all ATAs. This error has been there since the upgrade to 4.1(3).
Both servers are in the same building and in the same rack. We have 1 Gig WAN links to all of our outbuildings, with the farthest run being about 6-8 miles away from HQ. We have fiber running between all buildings and we use Cisco 3550s to power our phones. All phones are also in a separate VLAN from our data traffic, and all of our 3550s are trunked to allow either a computer or a phone to be plugged in.
I really appreciate the help. We are looking to see if anything that we had running may have interfered with normal operations on Monday. Also we are running MRTG to monitor our switches including our 3550's. Could that be an issue?
Not sure whether this is relevant, but we had a similar issue. We were using a 6509 to do the inter-VLAN routing, and for some reason the VLANs would keep coming up and going down and the routing would fail.
Interestingly, it only happened on the VLANs used for voice. We upgraded the MSFC cards and the problem went away.
I don't know if this will help, but we also had this same problem. We found that the phone load/DHCP server was causing it. We noticed that the reboots/unregistering would happen at half the life of a DHCP lease. We also upgraded our phone load and the problem went away. Try upgrading the phone load on one of the problem phones, and when this problem happens again, see if that phone unregisters.
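One way to test the half-lease theory is to compare each unregister timestamp with the lease midpoint, since DHCP clients normally try to renew at 50% of the lease (the T1 timer). The sketch below is only an illustration: the function name, the two-minute tolerance, and the lease values are all made up, so substitute the real lease start and duration from your DHCP server.

```python
from datetime import datetime, timedelta


def near_half_lease(lease_start, lease_seconds, event_time, tolerance_seconds=120):
    """Return True if event_time falls within the tolerance of the lease midpoint (T1)."""
    t1 = lease_start + timedelta(seconds=lease_seconds / 2)
    return abs((event_time - t1).total_seconds()) <= tolerance_seconds


# Hypothetical values for illustration only, not taken from any real DHCP server.
lease_start = datetime(2006, 2, 6, 9, 42, 0)
lease_seconds = 16 * 3600  # assumed 16-hour lease
unregistered_at = datetime(2006, 2, 6, 17, 42, 28)

print(near_half_lease(lease_start, lease_seconds, unregistered_at))  # True
```

If most unregister events land near the midpoint of their own lease, DHCP renewal is worth a closer look. If phones with different lease times all drop at the same moment, it probably is not the cause.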
We're on the most current phone load, and since this just started happening, it shouldn't have anything to do with the leases. I'll still check to see if this may be causing it.
I would imagine since we've been up just over 2 years that it would have happened earlier. Thanks for the idea to at least check.
I have the exact same problem, and my setup is very similar to yours: 2,500 phones, 35 buildings, 4 CallManagers. Two of the CMs have the event log full of devices giving this error (Transient connection attempt, error 3). I exported the logs and started to look at them more closely.
Most of my devices are ATAs, with a few SEP phones thrown in. I started to look at the devices more closely and realized that the same ATAs and SEPs keep recurring. There may be 100 of them (just a guess), but they are the same MAC addresses.
I searched for the devices in CM and found that they were no longer active devices there. It is as if the phone or ATA had been deleted from CM but is still an active device trying to register.
See if you can narrow down the list of devices to only a few that are recurring.
P.S. Are you running any third-party paging or 911 system that uses resources on the cluster?
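To narrow an exported event log down to the devices that keep recurring, something like the following could work. It is a rough sketch that assumes the export is plain text and that device names appear as SEP or ATA followed by a 12-digit MAC address; the function name and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed naming convention: "SEP" or "ATA" followed by a 12-hex-digit MAC address.
DEVICE_RE = re.compile(r"\b(?:SEP|ATA)[0-9A-Fa-f]{12}\b")


def recurring_devices(lines, min_count=2):
    """Count how many log lines mention each device; return those seen min_count+ times."""
    counts = Counter()
    for line in lines:
        # Count each device at most once per line, even if the line repeats its name.
        for name in {match.upper() for match in DEVICE_RE.findall(line)}:
            counts[name] += 1
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Hypothetical exported lines, for illustration only.
sample = [
    "Transient connection attempt. Device name.:ATA001122334455",
    "Transient connection attempt. Device name.:SEP0013193E52AB",
    "Transient connection attempt. Device name.:ATA001122334455",
]
for name, n in recurring_devices(sample):
    print(name, n)
```

The short list that comes out can then be checked against CM by hand to see which of those devices no longer exist there.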
I'm not sure whether this is relevant or not but:
We had the same problem yesterday. I setup some alerts using RTMT to email me when phones start to unregister.
At about 2:00 PM I got an email stating that phones were unregistering from the subscriber. I went from having 1,508 phones registered on the subscriber to under 400. Within just a couple of minutes, phones started to re-register with the subscriber again.
As a result, I had a bunch of unregisters on the subscriber, some on the publisher, and transient connections on the publisher because the subscriber wasn't down long enough. It's almost all 7905s that have this problem, but there have been a few cases with 7940s/7960s.
So it looks like there is a hardware or software component starting to fail on the subscriber that is causing the issues I'm having.
I should be on the latest phone loads so I'll double-check and upgrade as necessary.
In terms of paging or 911, we had been using Berbee's InformaCast application as a trial, so I set the CCM settings back to defaults after the trial expired. Our 911 routes directly out a 1760 on a dedicated Centrex line, so that should be fine.
I know it's hard to manage, but do you know what happens in your network when (or just before) your phones start to unregister? If you can find a pattern of events, could you put a sniffer on the network and see what's going on?
Do you have the Security Agent on your CallManager? Do you have some kind of IDS sensor in your network? Maybe you should look there for any unusual activity, e.g. excessive broadcasts or multicasts, spreading connections, etc. Nothing happens without a reason...
CCNP, CCDP, CCSP, Cisco Voice, Security+, MCSE W2K, etc.
We also had this with 4.1.3 sr1. What we found was that phones and primary rate channels would also de-register. If we did a no mgcp and then mgcp on the gateway, everything recovered OK. We actually rebuilt the cluster, as we only had 2 servers, and all is OK as long as we don't make any major changes on the network. Note that the servers were running at high CPU, 90-100% all the time, and as a consequence users were experiencing slow response from the phones, i.e. a phone would ring and be picked up, but it took 10 seconds before the voice streams connected.
Shit, it's Monday and it happened to our telephone system too. Almost all devices reset, and I agree with the previous posters: it doesn't seem to be a DHCP lease expiry thing, because my phones had different leases. We are using 4.1(3)sr1, and it is the first time we have seen this behaviour. We upgraded our system from 4.0 just two weeks ago and had never seen this kind of behaviour before. I am using the latest phone load and the VLAN configuration is in order. I also cannot find any root cause in the Event Viewer (I only see the phones unregistering) or in the traces. Does anybody have an idea what the root cause is, or is this a known bug? Are any fixes or patches available?
Any help is appreciated; management almost ripped my head off.
I'm getting this issue as well. I have many phones unregistering with status code 8.
Has anyone found a solution?
One thing I noticed is that the IP address of the phone changed after it re-registered. I'm wondering if it's DHCP related...
CCM 3.3.3 SR4a.
Exhibiting the same issues. Numerous phone deregistrations all over the network. Most seem to be 7905s. While researching the issue, I came across the "Standard Practices" (SRND) guide, which states in part:
... confining a VLAN to a single access layer switch also serves to limit the size of the broadcast domain. There is the potential for large numbers of devices within a single VLAN or broadcast domain to generate large amounts of broadcast traffic periodically, which can be problematic. A good rule of thumb is to limit the number of devices per VLAN to about 512.
(Have seen other lower recommendations on this forum of 256)
Typically, a VLAN should not span multiple wiring closet switches; that is, a VLAN should have presence in one and only one access layer switch.
My current vendor-supplied configuration is a SINGLE voice VLAN with over 1000 devices!
If I redesign the network to follow these guidelines, in which VLAN should the CCM publisher reside? The subscribers?
I have 18 wiring closets with multiple 3550s in the HQ alone. I also have two of the call managers in one building and one in a different building.
Same issue with a 50-phone, multi-site deployment (CM 4.1), and it's not a QoS thing. They are about ready to pull it out. No assistance from Cisco? It's not a resource issue on CM either.
Attempting a rebuild. Wow, this is not fun!
Oh, note: it happens at 04:00 AM with no traffic on the freaking LAN segment where CM resides.
DHCP lease is infinite, etc., etc.
I have the same issue using CM 4.1.3(sr1). 19 7960 IP phones unregister all at the same time, usually at 4pm or 5pm when there is not much activity at the site. These are the reason code 8 events:
02/06/2006 17:42:28.587 CCM|DeviceUnregistered - Device unregistered. Device name.:SEP0013193E52AB Device IP address.:172.16.x.x Device type. [Optional]:7 Device description [Optional].:Description Phone Reason Code [Optional].:8 App ID:Cisco CallManager Cluster ID:1902-Cluster Node ID:172.17.x.x|<:1902-CLUSTER><:172.17.X.X><:ALARMSEP0013193E52AB><:SEP0013193E52AB>
The description for Event ID ( 3 ) in Source ( Cisco CallManager ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: DeviceUnregistered - Device unregistered.
Device name.: SEP0012D9B95154
Device IP address.: 172.17.x.x
Device type. [Optional]: 7
Device description [Optional].: Sistemas
Reason Code [Optional].: 8
App ID: Cisco CallManager
Cluster ID: 1902-Cluster
Node ID: 172.17.x.x
Explanation: A device that has previously registered with Cisco CallManager has unregistered. This event may be issued as part of normal unregistration event or due to some other reason such as loss of keepalives.
Recommended Action: No action is required if unregistration of this device was expected..
I think we are hitting a bug in 4.1.3 sr1. Any confirmation from Cisco?
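For anyone with a trace full of these DeviceUnregistered alarms, a short script can tally them by reason code and by minute, which makes it easier to see whether the phones really drop all at once. This is a minimal sketch that assumes each alarm sits on one line in the same layout as the trace line quoted above; the function names are made up, and the timestamp is kept as text because the date order (MM/DD or DD/MM) is not confirmed here.

```python
import re
from collections import Counter

# Assumes the single-line CCM trace layout shown in the post above.
ALARM_RE = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+)\s+CCM\|DeviceUnregistered"
    r".*?Device name\.:\s*(?P<device>\S+)"
    r".*?Reason Code \[Optional\]\.:\s*(?P<reason>\d+)"
)


def parse_unregistered(line):
    """Return timestamp, device name and reason code, or None if the line does not match."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {"timestamp": m["timestamp"], "device": m["device"], "reason": int(m["reason"])}


def summarize(lines):
    """Count DeviceUnregistered events per reason code and per minute."""
    by_reason = Counter()
    by_minute = Counter()
    for line in lines:
        event = parse_unregistered(line)
        if event is None:
            continue
        by_reason[event["reason"]] += 1
        by_minute[event["timestamp"][:16]] += 1  # date plus HH:MM
    return by_reason, by_minute
```

Calling summarize() on the lines of a trace file returns two counters. A single minute holding most of the events points at one network-wide trigger rather than individual phones misbehaving.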
I have been running into this problem since last year with CCM 3.3, but it was only a few months ago that I discovered what the problem is. Our helpdesk people decide once in a while to use Norton Ghost to image a desktop or laptop. When they run this program, it generates a ton of multicast traffic on the network, which causes all the phones to randomly disconnect from the CCM. When the program stops, the phones stay active all the time. My recommendation is to run any program like Ghost from an isolated network to prevent this problem.