I have a route list which points to 3 route groups. Here's the order:
The first is a PRI in Boston (primary), the second is a T1 in Boston, and the third is a PRI in New York. As configured, when the first PRI is unavailable the T1 should kick in, and when none of the Boston links are available the New York PRI should take over. Each link works on its own, but when I unplug the Boston PRI we cannot make any calls. It looks like the failover is not configured properly. Is there any configuration I am missing? Where else do I need to look and configure to make sure failover works?
This is CallManager 4.1 and the gateway is MGCP. Could you also tell me what the difference in configuration would be if the gateway were H.323?
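To make the expected behavior concrete, here is a small Python model of how the route list is supposed to hunt: it walks the route groups in priority order and hands the call to the first member that is available. This is a simplified sketch, not CallManager's actual algorithm; the class, function, and gateway names are hypothetical, and it assumes a member is skipped only when CallManager sees it as unavailable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouteGroup:
    name: str
    # Member gateways, mapped to whether CallManager currently sees them
    # as registered/available (hypothetical model, not a CCM data structure).
    members: dict[str, bool] = field(default_factory=dict)


def select_gateway(route_list: list[RouteGroup]) -> Optional[str]:
    """Return the first available member, hunting the groups in list order."""
    for group in route_list:
        for member, available in group.members.items():
            if available:
                return member
    # No member is available anywhere in the list: the call fails.
    return None


route_list = [
    RouteGroup("Boston-PRI", {"BOS-PRI": True}),
    RouteGroup("Boston-T1", {"BOS-T1": True}),
    RouteGroup("NewYork-PRI", {"NY-PRI": True}),
]

print(select_gateway(route_list))  # BOS-PRI

# PRI unplugged AND reported down to CallManager: the list hunts to the T1.
route_list[0].members["BOS-PRI"] = False
print(select_gateway(route_list))  # BOS-T1
```

Note the assumption baked into the model: if the PRI is physically down but CallManager still shows it as available, select_gateway keeps returning BOS-PRI and never reaches the T1, which matches the symptom in the question.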
I'll let Andy explain the difference between H.323 and MGCP. I believe that with H.323 CallManager will not detect that the line is down, so it will not bypass it.
For troubleshooting, go to your route list and change the order of the groups (highest priority is first). Then place a call with debug isdn q931 running on the first gateway to see if the call goes there. Then change the order again and see whether the call hits the new gateway at the top of the list, and so on.
This will verify that the list is working properly; then look at the registration and availability of the gateways.
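If you want to keep track of the test runs, a sketch like this enumerates the orderings (each group rotated to the top in turn) and the group you should expect the call to leave through. The group names are placeholders, and the sketch only plans the tests; it does not touch CallManager or the gateways.

```python
from __future__ import annotations


def rotations(groups: list[str]) -> list[list[str]]:
    """Return each ordering obtained by rotating a different group to the top."""
    return [groups[i:] + groups[:i] for i in range(len(groups))]


groups = ["Boston-PRI", "Boston-T1", "NewYork-PRI"]
for step, order in enumerate(rotations(groups), start=1):
    # With every link up, the call should leave through the group at the top,
    # so that is the gateway whose debug output should show the call.
    print(f"Test {step}: {' > '.join(order)} -> expect call on {order[0]}")
```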
I did the test you suggested, and yes, each gateway works when I put it at the top of the list on its own. But when I unplug the one at the top, we can't make any calls; it doesn't switch over to the next available gateway on the list.
A couple of times I have had to upgrade the code to 12.3(11)T10 because of possible bugs in lower versions. Is it plausible to upgrade the code on the gateways and see if the problem goes away? This release seems very stable, so it might be prudent to move to it anyway.
Do you think the second member of the route list doesn't kick in because of the gateway's IOS version? Could you give me more detail on how I can resolve this issue?
It sounds to me like something is not reporting back correctly when it is unavailable. If you can force calls to each gateway in the RL individually and they go through fine, but when you make the gateway unavailable the call does not move on to the next list member, then I would gather that the GW is not reporting back as unavailable. That may be bad code; it sounds like a bug. You could also look at the CCM trace files from the time you unplug the T1 and see whether they show the GW unregistering. If not, then it is not reporting back to CM.
Then search the trace for the IP address of the GW.
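If you do go through the trace files, a short script can do the searching for you. This sketch assumes the traces are plain text and keeps only lines that contain both the gateway's IP address and a keyword such as "unregister"; the keyword, the sample lines, and the IP addresses are hypothetical, since the exact wording CallManager uses for an unregistration is not shown in this thread.

```python
from __future__ import annotations

import re
from typing import Iterable


def find_gateway_events(lines: Iterable[str], gateway_ip: str,
                        keywords: tuple[str, ...] = ("unregister",)) -> list[str]:
    """Return lines mentioning gateway_ip together with any keyword (case-insensitive)."""
    # Match the IP as a whole address, so 10.1.1.1 does not also hit 10.1.1.10.
    ip_pattern = re.compile(r"(?<![\d.])" + re.escape(gateway_ip) + r"(?!\.?\d)")
    hits = []
    for line in lines:
        lowered = line.lower()
        if ip_pattern.search(line) and any(k.lower() in lowered for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits


# Hypothetical trace excerpt; real CCM trace wording will differ.
sample = [
    "12:01:02 Gateway 10.1.1.1 registered\n",
    "12:05:40 Gateway 10.1.1.1 unregistered\n",
    "12:05:41 Gateway 10.1.1.10 unregistered\n",
]
print(find_gateway_events(sample, "10.1.1.1"))
# ['12:05:40 Gateway 10.1.1.1 unregistered']
```

Against a real trace you would pass an open file object instead of the sample list, for example with open(path, errors="replace") as f, and adjust the keywords to whatever the trace actually logs.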
Have you received your answer yet? Can you provide the model and IOS version of your gateway? Are you using a gatekeeper? Do you have SRST deployed on those gateways as well? What is your CCM release?
-The Gateway is a Cisco 3825 with IOS Version 12.3(11r)T2, RELEASE SOFTWARE (fc1).
-Yes SRST is deployed on those Gateways
Is your CCM release 4.1(3)sr3a or something else?
Silly question, but I have to ask: have you verified your route group (Current Route Group Members) to see if your device (T1 CAS) is in there?
Can you post the configs for both Boston gateways (the one with the PRI and the one with the T1 CAS)?
I'm trying to reproduce your setup in lab.
Is your route list enabled (check box selected)?
Can you post screenshots of your route list, route group, and gateway and endpoints from CCM?
Has the problem existed since installation, or did it occur after deployment?
I know this sounds weird, but have you reset the RL? I've had that fix some issues with gateways.
Do you see the Gateway unregister in the logs?
Yes, I did reset the RL, but no luck. I don't see the gateway unregister in the log file; either it's not there or I can't find it. Did you get a chance to look at the files I attached? Do you see anything wrong in the config?
I don't know if it was you, but somebody thought it might be related to an IOS bug.
When you unplug the PRI to test, do you see it as unregistered in the CCM admin stats? That will be easier than trying to dig it out of the trace. If you never see it unregister, that is why the call does not move down the list. You could try the CAS first and see whether it shows up as unregistered and whether the calls move down to the next member of the route list, but that depends on the first member actually showing as down. If it does not, then we know what the problem is, but not why the gateway is not unregistering...
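The reasoning here boils down to a small decision table: does CallManager see the gateway unregister, and do calls move down the list? The sketch below just encodes that (including the case where it unregisters but calls still fail); the function name and the messages are illustrative, not anything CallManager reports.

```python
def diagnose(shows_unregistered: bool, calls_move_down: bool) -> str:
    """Map the two observations from the unplug test to a next step."""
    if not shows_unregistered:
        # CallManager still thinks the member is up, so the route list keeps
        # offering calls to it and never hunts to the next group.
        return "gateway is not reporting down: look at the gateway/IOS side"
    if not calls_move_down:
        # The member is marked down, yet the list still does not advance.
        return "gateway reports down: check the route list/route group setup"
    return "failover is working"


print(diagnose(shows_unregistered=False, calls_move_down=False))
```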
Thanks Mary Beth,
Can I shut down the PRI interface on the router instead of unplugging the cable? Would that have the same effect?
Let's say I shut it down or unplug the cable and CallManager still shows it as registered. What do I have to do to resolve this issue?
And if it shows unregistered but the call does not move to the next available member, what could the problem be?
Mary Beth has a good idea there. Launch the Real-Time Monitoring Tool and see if the gateway unregisters when unplugged. If you haven't used it before, you can find it on the Plugins page of CCMAdmin. It is slow to load the first time and at first appears not to move, but let it sit for a minute and you will quickly see how it works.
Good one Mary Beth, this is much better than my Trace file idea.
If it does show unregistered, grab the CCM trace file for that time period and see if any reason is given. If it does not unregister, it could be a bug, or something is blocking MGCP communication from the gateway to CM. You could try running the "no mgcp" command and check whether it fails to unregister that way as well. You can also, with the gateway registered, log in to the gateway, turn on term mon, and then reset the gateway from CallManager. It should give you the message "Building configuration...", which verifies that CM can indeed talk to the gateway and reset it on its own.
I think you should be able to just shut the controller down on the router and see it go down. If the D channel is truly backhauled, CM should see the gateway unregister, since it can no longer possibly be talking to the PSTN switch. But I have seen a few cases lately where this does not seem to happen. It has got to be a bug, but I don't know whether it is on the router or the CM side. There is a sh ccm backhaul command that will show the TCP session; I would expect that to go down if the D channel goes away. There is also a debug ccm backhaul that should show the messages being sent back. The doc I have seen describing this shows facility messages, etc., but I would expect to see teardown and setup messages too. The first thing to prove is that the gateway does not unregister. Then try the CAS, and if shutting that down DOES cause the gateway to unregister, make it first in the list and verify that failover works that way, with a valid status.
Also, one thing I noticed in the router config: you have dial-peer 101 pots configured, putting mgcpapp on the PRI. The interop guide says you should not do this:
Restrictions for MGCP PRI Backhaul and T1 CAS Support
- Voice interfaces on the NM-HDA and the AIM-VOICE-30 are not supported.
- Integrated access, in which the channels on a T1 or E1 interface are divided between a group used for voice and another group used for WAN access, is not supported when voice is controlled by Cisco Unified CallManager through MGCP.
- T1 and E1 protocols, such as QSIG, E1 R2, T1 FGD, and PRI NFAS, are not supported with MGCP, only with H.323.
- E1 CAS is not supported.
- Do not add the application mgcpapp command to dial peers that support PRI backhaul.
That used to be generated when you downloaded the config from CM; then it went away, and I know we put it back in a few places, but now they specify that you should not. I wonder whether it would behave better if you removed it and did a ccm config check reset on the router to re-download the config. That is, if you determine that it is really not unregistering.
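Since this restriction is easy to miss in a long config, here is a rough Python check that scans saved show running-config text for POTS dial-peers carrying application mgcpapp. It assumes the usual IOS layout (a dial-peer voice <tag> pots line followed by indented sub-commands), the config excerpt is hypothetical, and the script only reads text; it changes nothing on the router.

```python
from __future__ import annotations

import re


def peers_with_mgcpapp(config_text: str) -> list[str]:
    """Return tags of POTS dial-peers whose block contains 'application mgcpapp'."""
    flagged = []
    current_tag = None
    for line in config_text.splitlines():
        header = re.match(r"dial-peer voice (\d+) pots\b", line)
        if header:
            current_tag = header.group(1)
        elif line and not line[0].isspace():
            # Any other top-level line (including "!") ends the dial-peer block.
            current_tag = None
        elif current_tag and line.strip() == "application mgcpapp":
            flagged.append(current_tag)
    return flagged


# Hypothetical excerpt of a gateway's running-config.
running_config = """\
dial-peer voice 101 pots
 application mgcpapp
 port 0/0/0:23
!
dial-peer voice 102 pots
 port 0/1/0:1
!
"""
print(peers_with_mgcpapp(running_config))  # ['101']
```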
Thanks Mary Beth,
I shut down the PRI interface on the router and CallManager showed the gateway as unregistered, but I still can't make any calls. It doesn't fail over to the T1, which is second on the route list. Could you please advise?
It seems the CMs' clocks are a bit off. The log starts at 00:33.
So, after looking at the RGs and RLs: I am always a big fan of simplifying, then building back up until you hit the problem.
Can you build an RL with just the Boston RG in it, then verify that failover works between the T1s in the same router? If it does, add Menlo Park at the bottom and see if it breaks there. But if it breaks between the Boston T1s, reverse the order: start with Menlo Park and see if it fails over to Boston and down the Boston T1s.
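The simplify-then-build-up order can be written down as a small decision procedure. In this sketch, run_test is a hypothetical stand-in for the manual step (build the route list with the groups in the given order, pull the top link, and report whether calls still went out); the route group names are placeholders.

```python
from __future__ import annotations

from typing import Callable


def simplify_and_build_up(boston_rg: str, remote_rg: str,
                          run_test: Callable[[list[str]], bool]) -> str:
    """Walk the test order described above and report where failover broke."""
    if run_test([boston_rg]):
        # Failover inside Boston works; now add the remote site at the bottom.
        if run_test([boston_rg, remote_rg]):
            return "no failure reproduced"
        return "breaks after adding the remote RG at the bottom"
    # Already broken inside Boston: start with the remote site instead.
    if run_test([remote_rg, boston_rg]):
        return "breaks inside Boston, but remote-first order fails over"
    return "breaks inside Boston and in remote-first order"


# Example: pretend only the Boston-only route list fails over correctly.
print(simplify_and_build_up("Boston-RG", "MenloPark-RG", lambda layout: len(layout) == 1))
```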
There is an error in the CCM2 trace, but I can't find any references to it. Maybe someone else will have better luck:
" ERROR Too many calls to decrementTotalNumberOfRegisteredCallingEntities, Already at zero. |<:STANDALONECLUSTER><:220.127.116.11>"
I did see the gateway unregister in the Trace.