Re: PRI Drops - Gateway un-registers/re-registers

wade.hanson · ‎06-10-2010

Greetings,

I am having an issue with a Cisco 2821 router that seems to lose its connection to CCM. The backhauling for layer 3 of the PRIs stops and the gateway seems to un-register. In the Real Time Monitoring Tool I see "NumberOfRegisteredGatewaysDecreased" and then "NumberOfRegisteredGatewaysIncreased" a few seconds later.

To rule out a carrier issue, there are currently two PRIs on the router. They both drop and go out of service simultaneously. The issue occurs with just one PRI configured as well.

The IOS version on the router is (C2800NM-ADVIPSERVICESK9-M), Version 12.4(9)T7
The CCM version is 6.1.3.3203-1

The debug output is as follows, and I am attaching a file with extended output:

Jun 10 14:28:25.114: cmbh_tcp_rdhdr: TCP link to x.x.x.x closing...
Jun 10 14:28:25.114: cmbh_tcp_fini: TCP link to x.x.x.x closed -- calling callback
Jun 10 14:28:25.114: cmbh_remove_link: Freeing link record with address 49DFDBF0 for x.x.x.x.
Jun 10 14:28:26.846: %ISDN-6-LAYER2DOWN: Layer 2 for Interface Se0/0/1:23, TEI 0 changed to down
Jun 10 14:28:26.854: cmbrl_send_pak: No active CCM -- ignore packet
Jun 10 14:28:26.854: ISDN Se0/0/1:23 Q931: L3_ShutDown: Shutting down ISDN Layer 3
Jun 10 14:28:26.854: cmbrl_send_pak: No active CCM -- ignore packet
Jun 10 14:28:26.858: cmbrl_send_pak: No active CCM -- ignore packet
Jun 10 14:28:27.546: cmbrl_config_changed: idb=Serial0/0/1:23,set_or_cleared=0
Jun 10 14:28:27.546: cmbrl_clear: cmbrl_ptr=49DC3100
Jun 10 14:28:27.546: cmbh_notify_func: event = 0
Jun 10 14:28:27.546: cmbh_notify_func: Se0/0/1:23: Removing CCM Backhaul binding
Jun 10 14:28:28.766: %ISDN-6-LAYER2DOWN: Layer 2 for Interface Se0/0/0:23, TEI 0 changed to down
Jun 10 14:28:28.774: cmbrl_send_pak: No active CCM -- ignore packet
Jun 10 14:28:28.774: ISDN Se0/0/0:23 Q931: L3_ShutDown: Shutting down ISDN Layer 3
Jun 10 14:28:28.774: cmbrl_send_pak: No active CCM -- ignore packet
Jun 10 14:28:28.774: cmbrl_send_pak: No active CCM -- ignore packet
Jun 10 14:28:28.830: %LINK-5-CHANGED: Interface Serial0/0/1:23, changed state to administratively down

Here are the pertinent parts of the configuration:

network-clock-participate wic 0
network-clock-select 1 T1 0/0/1

controller T1 0/0/0
framing esf
linecode b8zs
pri-group timeslots 1-24 service mgcp
!
controller T1 0/0/1
framing esf
linecode b8zs
pri-group timeslots 1-24 service mgcp
!

interface Serial0/0/0:23
no ip address
encapsulation hdlc
isdn switch-type primary-ni
isdn incoming-voice voice
isdn bind-l3 ccm-manager
!
interface Serial0/0/1:23
no ip address
encapsulation hdlc
isdn switch-type primary-5ess
isdn incoming-voice voice
isdn bind-l3 ccm-manager
no cdp enable
!

voice-port 0/0/0:23
!
voice-port 0/0/1:23
!

mgcp
mgcp call-agent x.x.x.x 2427 service-type mgcp version 0.1
mgcp dtmf-relay voip codec all mode out-of-band
mgcp rtp unreachable timeout 1000 action notify
mgcp modem passthrough voip mode nse
mgcp package-capability rtp-package
no mgcp package-capability res-package
mgcp package-capability sst-package
mgcp package-capability pre-package
no mgcp timer receive-rtcp
mgcp sdp simple
no mgcp fax t38 ecm
mgcp rtp payload-type g726r16 static
mgcp bind control source-interface GigabitEthernet0/0.100

Any help and advice would be appreciated!

Thank you.

paolo bevilacqua · ‎06-10-2010

Try switching to H.323

Far fewer bugs, many more features.

William Bell · ‎06-10-2010

Well, you don't have to run to H.323 just yet. At least not until you have a better reason than a one-liner on the netpro forum...

Anyway, the gateway is losing its connection to the CUCM for some reason. Whether it is a software bug or not is TBD at this time. I have not checked that myself due to time constraints. Some things to consider:

1. When did this start or has it been happening since you implemented? If it is a recent thing, what changed?

2. Have you reviewed the CUCM application logs and traces for the time period in question? You will need these to get a better picture of what is going on. In particular, are any other devices exhibiting issues? When the device unregisters, the application log will show reason codes. Sometimes they are helpful. Usually they can help narrow the fault domain.

3. Check the network interface from the gateway to the attached switch or WAN. Look for interface up/down hits, CRCs, errors, etc. Check on both sides of the link if possible. If this router connects to a LAN environment, check uplink interfaces for hits, errors, etc.

4. Does this happen when no calls are occurring?

5. Does this happen only on ingress or egress calling?
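
For item 3, and to confirm the registration and backhaul state on the gateway side, a few IOS exec commands may help. This is only a starting sketch: the interface name is taken from the configuration posted above (the MGCP control traffic is bound to GigabitEthernet0/0.100, so the parent interface is the one to inspect), and output details vary by IOS release.

```
! Registration and backhaul state toward the CCM/CUCM
show ccm-manager
show ccm-manager hosts

! Layer 1/2/3 state of the backhauled PRIs
show isdn status

! Errors, CRCs, and resets on the LAN-facing interface
show interfaces GigabitEthernet0/0
```

Capturing this output once while things are healthy and again right after a drop makes it easier to see which counters moved.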

A word on bugs and protocols and IOS. There are plenty of bugs to spread around. They're almost like opinions in that way.

HTH.

Regards,
Bill

HTH -Bill (b) http://ucguerrilla.com (t) @ucguerrilla

paolo bevilacqua · ‎06-10-2010

I have found that to placate irate customers having their calls dropped, one-line remedies work better and faster than lengthy investigations.

djh278778 · ‎06-10-2010

I would not rule out the carrier yet just because there are two circuits. They more than likely both ride into the building in the same cable sheath, so a problem with the cable (or other shared facilities along the way) could cause this. Also, the PRIs share a 2MFT card. Maybe it can be swapped. I agree with the other post about looking at errors on the interfaces, but focus on the controllers that terminate the PRIs.
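
As a rough sketch of the controller-level check, using the controller numbers from the posted configuration:

```
! Per-controller alarms and error counters for each PRI
show controllers t1 0/0/0
show controllers t1 0/0/1

! Which clock source the router is currently using
show network-clocks
```

In the controller output, the things to watch are alarms, slips, line code violations, path code violations, and errored seconds that increase around the time of a drop. If both controllers show errors at the same moment, that would point toward a shared facility or the card rather than a single circuit.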