Devices Unregistering from CCM

Jan 17th, 2008

Hi,

Recently, a few users have complained that their calls sometimes just drop out.

I then opened the CCM traces and found that devices are unregistering from CCM. My event viewer is filled with these messages.

Looking at the event viewer, I was alarmed to find that many devices are unregistering from CCM, even my MGCP endpoint.

Here are a few messages from the event viewer:

Error: DeviceUnregistered - Device unregistered.

Device name.: SEP0013C35A960B

Device IP address.: 192.168.105.36

Device type. [Optional]: 8

Device description [Optional].: SEP0013C35A960B

Reason Code [Optional].: 8

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Error: DeviceUnregistered - Device unregistered.

Device name.: SEP000750833786

Device IP address.: 192.168.105.159

Device type. [Optional]: 8

Device description [Optional].: SEP000750833786

Reason Code [Optional].: 8

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Error: DChannelOOS - D channel out of service.

Device Name.: S0/SU2/[email protected]

Device IP address: 192.168.105.240

Channel Id.: 16

Unique channel Id: S0/SU2/[email protected]:16

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated D channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: BChannelOOS - B channel out of service.

Device Name.: S0/SU2/[email protected]

Channel Id.: 8

Unique channel Id: S0/SU2/[email protected]:8

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated B channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: BChannelOOS - B channel out of service.

Device Name.: S0/SU2/[email protected]

Channel Id.: 8

Unique channel Id: S0/SU2/[email protected]:8

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated B channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: BChannelOOS - B channel out of service.

Device Name.: S0/SU2/[email protected]

Channel Id.: 6

Unique channel Id: S0/SU2/[email protected]:6

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated B channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: DeviceUnregistered - Device unregistered.

Device name.: MTP-IPCS3825

Device IP address.: 192.168.105.240

Device type. [Optional]: 112

Device description [Optional].: MTP-IPCS3825

Reason Code [Optional].: 9

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Error: DeviceUnregistered - Device unregistered.

Device name.: S0/SU2/[email protected]

Device IP address.: 192.168.105.240

Device type. [Optional]: 121

Device description [Optional].: [email protected]

Reason Code [Optional].: 8

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

What do I need to do to resolve this?

Also, what does reason code 8 mean for the IP phones?

Ayodeji oladipo... Thu, 01/17/2008 - 12:12

Rob,

Happy new year to you!

Thanks for this invaluable information.

I will dig deeper to see why this is happening. It is really affecting users.

Should you have any further ideas, please do not hesitate to share them with me.

William Bell Thu, 01/17/2008 - 11:59

Rob provided the info you need to understand reason codes. What you need to determine is:

1. Initial event. Look at the earliest event in your event logs/traces and see if you can determine when this started to be an issue. Note that you may find the buffer has been flushed due to the volume of messages.

2. Isolate. Depending on the size of your network, you will want to identify the affected node groups/subnets to see if you can isolate the issue to a specific subnet or intermediate network connection.

3. System log. Look in the system event log for any abnormal failures of subsystems.

4. History log. Check the Cisco install history log (\program files\common files\cisco\logs\history.log) for recent upgrades, and see whether the start of the problem correlates with an upgrade.

5. Who is not affected. This ties into the isolation step: identify any nodes that are not affected by the issue.
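For the isolation step, a quick way to see whether the unregistrations cluster on one subnet is to group the device IP addresses from the DeviceUnregistered alarms. This is a minimal sketch that assumes /24 subnets (adjust `prefix_len` to match your addressing plan); the function name `count_by_subnet` is hypothetical, and the IPs are the ones posted in the alarms above.

```python
import ipaddress
from collections import Counter


def count_by_subnet(ip_addresses, prefix_len=24):
    """Count unregistration events per subnet of the given prefix length.

    prefix_len=24 is an assumption; use your real subnet mask.
    """
    counts = Counter()
    for ip in ip_addresses:
        # strict=False lets us derive the network from a host address
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


# Device IP addresses taken from the DeviceUnregistered alarms in this thread
unregistered_ips = ["192.168.105.36", "192.168.105.159",
                    "192.168.105.240", "192.168.105.240"]

for subnet, n in count_by_subnet(unregistered_ips).most_common():
    print(subnet, n)
```

If every event lands in the remote site's subnet while local sites show none, that points at the link between the sites rather than at the CCM host.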

One interesting piece of information you provided is that the call "just drops out". Does the user see a Temp Fail message on the LCD at that time? Do you see Skinny alarm messages (Station Event alerts in the App event log) in your trace or event log? Based on the limited info provided, your issue may be with the gateway or with the network between the gateway and other nodes. I suspect this because, if the CUCM host were the problem, your call should stay up through the MGCP gateway during the event. But if the gateway were the issue, your call would drop and your phone would re-register.

Regards,

Bill

Ayodeji oladipo... Thu, 01/17/2008 - 12:56

Billy,

Thanks for your response.

However, from my troubleshooting, I have concluded that the problem lies with CallManager or with the connection between CCM and the IP phones (this is over a LAN extension).

These are my findings:

1. Using the Q.931 ISDN translator, I observed that the gateway terminated the call with cause code 90 (normal call clearing).

2. I then went into the CCM trace details and found that, during the call, the IP phone unregistered from CCM. CallManager then initiated a close channel request.

3. After this, CCM told the MGCP gateway to tear down the connection for the call.

This is why the cause code from the gateway was normal call clearing.
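The "90" from the Q.931 translator reads most naturally as a hex octet. Assuming the translator printed the cause-value octet in hex (as Cisco ISDN debugs typically do), bit 8 is the extension bit, and masking it off leaves the Q.850 cause value: 0x90 & 0x7F = 16, which is "Normal call clearing". The sketch below shows that decoding; the table holds only a few well-known cause values, and `decode_cause_octet` is a hypothetical helper name.

```python
from typing import Tuple

# A small subset of ITU-T Q.850 cause values; deliberately not exhaustive
CAUSE_VALUES = {
    16: "Normal call clearing",
    17: "User busy",
    31: "Normal, unspecified",
    41: "Temporary failure",
}


def decode_cause_octet(octet: int) -> Tuple[int, str]:
    """Decode the cause-value octet of a Q.931 Cause information element."""
    value = octet & 0x7F  # drop bit 8, the extension bit
    return value, CAUSE_VALUES.get(value, "not in this table; see Q.850")


print(decode_cause_octet(0x90))
```

A cause of 16 from the gateway fits the sequence described above: the gateway cleared the call normally because CCM told it to, not because the PRI failed.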

The event log is filled with IP phones unregistering with reason codes 8 and 9, which indicate 8: DeviceInitiatedReset and 9: CallManagerReset.
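With the event log this full, tallying the DeviceUnregistered alarms by reason code makes it easier to see which case dominates. This is a minimal sketch that assumes the alarms have been copied out as plain text in the layout shown earlier in this thread; the code meanings in `REASON_CODES` are the interpretation given here, so verify them against the alarm catalog for your CallManager version. The sample text is a trimmed excerpt of the alarms posted above.

```python
import re
from collections import Counter

# Meanings as interpreted in this thread; check your version's alarm catalog
REASON_CODES = {8: "DeviceInitiatedReset", 9: "CallManagerReset"}

# Trimmed excerpt of the alarms posted above
ALARM_TEXT = """\
Error: DeviceUnregistered - Device unregistered.
Device name.: SEP0013C35A960B
Reason Code [Optional].: 8
Error: DeviceUnregistered - Device unregistered.
Device name.: SEP000750833786
Reason Code [Optional].: 8
Error: BChannelOOS - B channel out of service.
Reason [Optional].: 0
Error: DeviceUnregistered - Device unregistered.
Device name.: MTP-IPCS3825
Reason Code [Optional].: 9
"""


def tally_reason_codes(text):
    """Count DeviceUnregistered alarms by their reason code."""
    counts = Counter()
    # Split on each alarm header so a reason code is attributed
    # only to the alarm block it appears in
    for block in text.split("Error: ")[1:]:
        if not block.startswith("DeviceUnregistered"):
            continue  # skip B/D channel alarms and anything else
        match = re.search(r"Reason Code \[Optional\]\.:\s*(\d+)", block)
        if match:
            counts[int(match.group(1))] += 1
    return counts


for code, n in sorted(tally_reason_codes(ALARM_TEXT).items()):
    print(code, REASON_CODES.get(code, "unknown"), n)
```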

Hence it is clear that something is happening between CallManager and the IP phones.

One possible suspect is the loss of keepalives. I say this because at one point today a "CCM down" message appeared on one of the IP phones, although it lasted only a few seconds.

Is there anything I can use to troubleshoot keepalives between CallManager and the IP phones?

This issue is not happening at the sites where the IP phones are local to CallManager.

jbarcena Thu, 01/17/2008 - 15:26

Well, the best way to troubleshoot keepalives is with a sniffer trace. You could run a program like Ethereal on a PC connected behind an IP phone that is having the problem, but you will collect a lot of data if the problem is infrequent.

You could also try increasing the keepalive time in the CCM service parameters.

HTH

//Jorge

Ayodeji oladipo... Thu, 01/17/2008 - 15:54

I have narrowed the problem down to this:

CCM-Aborted-TCP Connection...

CallManager is aborting the TCP connection with the IP phones, so the phones re-initialize.

What can I check?

jbarcena Thu, 01/17/2008 - 16:00

That usually happens when CCM does not receive three keepalives from an IP phone. I recommend checking the connectivity between the phones and the CCM server.

Also, upload the CCM trace from the time of the problem, along with the MAC address of the phone that unregistered, so we can see what else is going on.

rob.huffman Thu, 01/17/2008 - 22:12

Hi Deji,

This is always hard to pinpoint, my friend, but from my viewpoint this looks like a network problem, as you said. The loss of keepalives almost surely points to this.

Maybe you can see what was happening on the Network at this time.

Rob

Zin.Karzazi Fri, 01/18/2008 - 00:35

As already suggested, your best bet is to use a sniffer (Ethereal).

jbarcena Fri, 01/18/2008 - 08:12

I will need the trace in .txt instead of .xml. Also, have you tried increasing the keepalive time for test purposes? What was the result?

Ayodeji oladipo... Fri, 01/18/2008 - 08:26

Thanks. I have done that and I did not see any noticeable difference.

I then checked the interface connecting the IP phones to the CCM and found lots of interface resets and collisions.

The interface is a 100 Mb/s LAN extension link to the main office. I observed that duplex was set to half on the routers at both ends.

I changed this to full on both ends, and I have noticed a remarkable difference.
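The same check can be scripted so it can be rerun on both routers after the change. This is a minimal sketch that assumes the text of a Cisco IOS `show interfaces` command has been saved; the sample below is hypothetical output in that style with made-up counter values, and the exact wording can vary by platform and release, so adjust the regular expressions if your output differs. `interface_health` is a hypothetical helper name.

```python
import re

# Hypothetical output in the style of IOS "show interfaces"; values are made up
SAMPLE = """\
FastEthernet0/1 is up, line protocol is up
  Half-duplex, 100Mb/s, 100BaseTX/FX
     0 output errors, 1523 collisions, 47 interface resets
"""


def interface_health(text):
    """Extract duplex, collision count and interface resets from the text."""
    duplex = re.search(r"(Full|Half)-duplex", text)
    collisions = re.search(r"(\d+) collisions", text)
    resets = re.search(r"(\d+) interface resets", text)
    return {
        "duplex": duplex.group(1).lower() if duplex else None,
        "collisions": int(collisions.group(1)) if collisions else None,
        "resets": int(resets.group(1)) if resets else None,
    }


health = interface_health(SAMPLE)
if health["duplex"] == "half" and health["collisions"]:
    print("Half duplex with collisions:", health)
```

Counters are cumulative, so clear them after changing duplex and compare later readings against zero rather than against the old totals.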

I am still looking at it.

Can I ask a question?

How do I know when keepalives are missed?

I used Wireshark to trace keepalives and Skinny, but I couldn't see anything indicating that keepalives were missing.

What do I need to look for in the trace?

Unfortunately, my traces are set up to log only in XML format, so I can't provide them in text format.

Thanks

jbarcena Fri, 01/18/2008 - 09:36

Well, with a sniffer trace on the back of the phone, you can literally see each keepalive message sent from the phone's IP address to the CCM server's IP address.

Ayodeji oladipo... Fri, 01/18/2008 - 10:14

Thank you.

I can see the keepalives, but I cannot tell when they are lost.

That's why I am asking whether I can use the timestamp, a sequence number, or anything else to tell when keepalives are lost.
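One way to answer this with timestamps alone: take the capture times of one phone's keepalive messages (for example, exported from Wireshark after filtering on that phone's Skinny traffic) and look for gaps longer than the keepalive interval. The sketch below assumes you know the interval configured in your CCM service parameters, and it uses the "three missed keepalives" threshold mentioned earlier in this thread. The function name and the sample times are hypothetical.

```python
def find_keepalive_gaps(timestamps, interval, max_missed=3):
    """Return (start, end, missed, over_threshold) for each gap in keepalives.

    timestamps -- capture times, in seconds, of one phone's keepalive messages
    interval   -- keepalive interval configured for the phone, in seconds
    max_missed -- consecutive misses treated as fatal (3 per this thread)
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        # Whole intervals elapsed between two keepalives, minus the one
        # keepalive that is expected at the end of a normal interval
        missed = int((cur - prev) // interval) - 1
        if missed >= 1:
            gaps.append((prev, cur, missed, missed >= max_missed))
    return gaps


# Hypothetical capture times with a 30-second interval and one long silence
times = [0, 30, 60, 180, 210]
for start, end, missed, fatal in find_keepalive_gaps(times, interval=30):
    print(f"{start}s -> {end}s: about {missed} keepalive(s) missed, "
          f"over threshold: {fatal}")
```

Flooring the gap means a keepalive that arrives slightly late is not counted as missed, which keeps normal jitter from producing false alarms. Any gap that shows up here can then be lined up against the DeviceUnregistered timestamps in the event log.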
