Inter-cluster call problem CUCM 4.2

Unanswered Question
May 10th, 2010

I have an inter-cluster, non-gatekeeper-controlled call problem.

Every time the customer tries to call site B from site A, the first attempt fails, but if he redials immediately afterwards, the call succeeds.

Could you tell me if this could be any problem with timers?

Has anyone had this issue?

Thanks in advance.

William Bell Mon, 05/10/2010 - 10:29

I had a problem like this a while ago on a 4.1 CUCM system. I am not sure if the resolution would work for you or not. I will use an example to explain my issue. Hopefully, it will make sense.

ClusterA:

1. CUCM nodes: 1.1.1.1 and 1.1.1.2

2. CUCM Group:

- Name GROUPA

- Members (1) 1.1.1.1 (2) 1.1.1.2 (in this exact order)

3. ICT Trunk

- Name ICT-ClusterB

- CUCM group (via DP) GroupA

- Targets (1) 2.2.2.2 (2) 2.2.2.1 (in this exact order)

ClusterB:

1. CUCM nodes: 2.2.2.1 and 2.2.2.2

2. CUCM Group:

- Name GROUPB

- Members (1) 2.2.2.1 (2) 2.2.2.2 (in this exact order)

3. ICT Trunk

- Name ICT-ClusterA

- CUCM group (via DP) GroupB

- Targets (1) 1.1.1.1 (2) 1.1.1.2 (in this exact order)

The problem I had was that every other call would fail from ClusterA to ClusterB. The reason was that the CUCM group assignment on the ICT (as defined on ClusterB) did not match EXACTLY the IP address/target list of the ICT as configured in ClusterA. In the above example you can see that ICT-ClusterB will use 2.2.2.2 first and then use 2.2.2.1. If you look at the ICT-ClusterA on ClusterB you will see that the CM Group "GroupB" does not have the same list.

Now, one could say "no way" but I worked with TAC on this issue and did all of the obvious things with no joy. When I made sure that the CUCM group and ICT target lists lined up 1:1, everything worked.
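The 1:1 check described above can be sketched in a few lines of Python. This is only an illustration: the dictionaries are a hypothetical way of writing down the example config by hand, not something exported from CUCM or read through any CUCM API.

```python
# Hypothetical data model that mirrors the example above.
# Nothing here is read from CUCM; the lists are typed in by hand.
cluster_a = {
    "trunk_targets": ["2.2.2.2", "2.2.2.1"],  # ICT-ClusterB targets, in order
    "cm_group": ["1.1.1.1", "1.1.1.2"],       # GroupA members, in order
}
cluster_b = {
    "trunk_targets": ["1.1.1.1", "1.1.1.2"],  # ICT-ClusterA targets, in order
    "cm_group": ["2.2.2.1", "2.2.2.2"],       # GroupB members, in order
}


def ordered_match(local_targets, remote_cm_group):
    """True only if both lists hold the same addresses in the same order."""
    return list(local_targets) == list(remote_cm_group)


# Targets on one side are compared with the CM group on the other side.
a_to_b_ok = ordered_match(cluster_a["trunk_targets"], cluster_b["cm_group"])
b_to_a_ok = ordered_match(cluster_b["trunk_targets"], cluster_a["cm_group"])

print("A -> B lined up:", a_to_b_ok)
print("B -> A lined up:", b_to_a_ok)
```

With the example values, the A -> B direction is the one that does not line up, because the targets are listed 2.2.2.2 first while GroupB lists 2.2.2.1 first.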

A similar issue could also arise if the target list specifies a remote-cluster node address that is not part of the ICT trunk's CM group on the remote cluster. In our example:

ClusterA:

1. ICT Trunk

- Name ICT-ClusterB

- CUCM group (via DP) GroupA

- Targets (1) 2.2.2.2 (2) 2.2.2.1 (in this exact order)

ClusterB:

1. CUCM Group:

- Name GROUPB

- Members (1) 2.2.2.1

2. ICT Trunk

- Name ICT-ClusterA

- CUCM group (via DP) GroupB

Take a look at the ICT and CM group configs on both sides of the ICT.
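For this second case, a rough Python sketch of the check might look like the following. The function name and the hand-typed lists are illustrative assumptions taken from the example, not output from any CUCM tool.

```python
def unserved_targets(local_targets, remote_cm_group):
    """Return trunk targets that are not members of the remote trunk's CM group."""
    remote = set(remote_cm_group)
    return [ip for ip in local_targets if ip not in remote]


# Values copied by hand from the example above.
ict_clusterb_targets = ["2.2.2.2", "2.2.2.1"]  # ICT-ClusterB on ClusterA
groupb_members = ["2.2.2.1"]                   # GroupB on ClusterB

stray = unserved_targets(ict_clusterb_targets, groupb_members)
print("Targets with no matching CM group member:", stray)
```

Any address it reports is a node that ClusterA may send calls to even though the trunk on ClusterB is not registered there.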

HTH.

Regards,

Bill

Please remember to rate helpful posts.

david-lima Mon, 05/10/2010 - 11:50

Hi friend, try restarting the CallManager service on both servers.

hth

David

Jose Goncalves Tue, 05/11/2010 - 02:01

Hi Bill.

Thank you for your answer.

The IPs I had for the trunk were different from the IPs of the CM group.

I changed them, but the problem persists.

After business hours I am going to try restarting the CallManager service in both clusters. But if you have any other ideas, I would gladly take them.

best regards

JG

William Bell Tue, 05/11/2010 - 04:50

After you modified the CM group and/or ICT, did you reset the trunk on both sides of the connection? Of course, if you did a complete CM service reset then that will reset the ICTs. But that is up to you. Resetting the ICTs is quick and painless, no change control needed. Worth a try.

If you want to narrow the problem down further then you could load RTMT or perfmon and monitor the H.323 active calls on the CM clusters in question. Place calls and identify which remote ICT member is taking calls and which is not. Outside of what we have already discussed, if you have a situation where there is an IP path connectivity issue to one of the servers (firewall, ACL, etc.) then the symptoms you describe are possible. Also, what is the observed behavior when site B calls site A?
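One quick way to rule the IP path in or out is a plain TCP connect test toward each remote node. The sketch below assumes the ICT uses the standard H.225 signaling port, TCP 1720; confirm that against your own configuration. The function names are made up for illustration.

```python
import socket

H225_PORT = 1720  # assumption: standard H.225 call-signaling port


def tcp_reachable(host, port=H225_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_nodes(nodes, port=H225_PORT, timeout=3.0):
    """Map each node address to True/False depending on TCP reachability."""
    return {node: tcp_reachable(node, port, timeout) for node in nodes}
```

For example, `check_nodes(["2.2.2.1", "2.2.2.2"])` run from a host on the ClusterA side would show whether both ClusterB nodes accept a connection. A successful connect only says the path is open (no firewall or ACL in the way); it does not prove that the remote CallManager will accept the call.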

HTH.

Regards,

Bill

Jose Goncalves Tue, 05/11/2010 - 07:23

Hi Bill.

Well, every time I change anything in the ICT I reset it on both ends.

When I used RTMT, it seems that the call always takes the same path. The second CUCM never gets any calls.

Jose Goncalves Wed, 05/12/2010 - 03:08

I restarted the CallManager service on every server in both clusters, but the problem remains.

Does anyone have any idea how to solve this?

In my latest test the behaviour was: the first try always fails and the second works. But if you wait about 20 seconds or so, it takes two tries again.

It seems that at the first attempt the calling CUCM doesn't have the ICT established... Does this make any sense?

Thanks

JG

William Bell Wed, 05/12/2010 - 04:51

JG,

Is there any possibility that you have an overlap in the dial plan on the two CUCM systems?  The symptoms you describe are very similar to what would be seen if you had a loop occurring on your H.323 trunk.  You can confirm this by checking the event logs on your 4.x system to see if you see "ICTCallThrottling" errors.  There would be an "ICTCallThrottlingStart" and an "ICTCallThrottlingEnd".  You would want to check both sides of your ICT because only one of them would initiate the throttling.

The throttling would occur if a call setup message bounces or loops between the two systems.  The most common reason is that you have an abbreviated dialing solution whereby you have a default route pattern on ClusterB that points to ClusterA.  In addition, you have a default route pattern on ClusterA that points to ClusterB.  Example:

ClusterA has pattern [2-8]xxxx that points to ClusterB

ClusterB has pattern 8xxxx that points to ClusterA

The extension 88888 does not exist on either cluster but is dialed by a user.  The call will loop until some mechanism throttles it down.
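To make the loop concrete, here is a small Python simulation of the example. It is only a toy model: it understands the X wildcard and bracket sets, ignores partitions, calling search spaces and every other CUCM pattern feature, and the hop limit stands in for whatever really stops the call.

```python
import re


def pattern_to_regex(pattern):
    """Translate a route pattern with X wildcards and [..] sets into a regex."""
    body = pattern.upper().replace("X", "[0-9]")
    return re.compile("^" + body + "$")


# Route patterns from the example: (pattern, cluster the pattern points to).
ROUTES = {
    "ClusterA": [("[2-8]xxxx", "ClusterB")],
    "ClusterB": [("8xxxx", "ClusterA")],
}

# 88888 is not a real extension on either cluster.
LOCAL_EXTENSIONS = {"ClusterA": set(), "ClusterB": set()}


def trace_call(start, digits, max_hops=6):
    """Follow a call between clusters; give up after max_hops hand-offs."""
    path = [start]
    here = start
    for _ in range(max_hops):
        if digits in LOCAL_EXTENSIONS[here]:
            return path, "answered"
        nxt = next(
            (dest for pat, dest in ROUTES[here]
             if pattern_to_regex(pat).match(digits)),
            None,
        )
        if nxt is None:
            return path, "no route"
        path.append(nxt)
        here = nxt
    return path, "loop"


path, outcome = trace_call("ClusterA", "88888")
print(" -> ".join(path), "=>", outcome)
```

The call to 88888 bounces back and forth until the hop limit is hit, which is the situation the throttling is there to catch.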

The throttling mechanism can be fine tuned with the Call Manager service parameter TimerH225ICTCallThrottle.  But this is just tweaking the behavior when the throttling occurs.  You can disable the call throttle but that would be bad.

The resolution is to fix the dial-plan loop by either leveraging CSS/partition arrangements or assigning more specific patterns.

Take a look at your event logs and double check your dial plan just to be sure.  Let us know what you find.

HTH.


Regards,
Bill

Jose Goncalves Wed, 05/12/2010 - 08:31

Hi Bill.

Thanks again for your answer. It made me change some things in the dial plan, but the problem remains.

New dialplan summary:

Site A - 5442.XXXX is routed to the ICT and I am discarding PreDot.

Site B - 5XX[013-9]XXXX is routed to the ICT and I don't discard any digits (this was 5XXXXXXX).

At site B I also have a translation pattern 5442.XXXX that discards PreDot (I don't know why it is there; it is not my config).

I then changed the site A route pattern to send the full 8-digit number, so that the translation pattern at site B is responsible for discarding the digits, but even so the problem persists.
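As a sanity check on the patterns in this summary, the following Python sketch tests a dialed number against each one and shows what a PreDot discard leaves behind. The number 54421234 is a made-up example, and the helpers only cover X wildcards, bracket sets and the dot; partitions and calling search spaces are not modelled.

```python
import re


def to_regex(pattern):
    """Build a regex from a pattern, ignoring the dot separator."""
    body = pattern.replace(".", "").replace("X", "[0-9]")
    return re.compile("^" + body + "$")


def matches(pattern, digits):
    """True if the dialed digits match the whole pattern."""
    return to_regex(pattern).match(digits) is not None


def discard_predot(pattern, digits):
    """Drop the digits that fall before the dot in the pattern."""
    prefix = pattern.split(".", 1)[0]
    m = re.match(prefix.replace("X", "[0-9]"), digits)
    return digits[m.end():] if m else digits


SITE_A_PATTERN = "5442.XXXX"      # route pattern on site A
SITE_B_OLD = "5XXXXXXX"           # previous route pattern on site B
SITE_B_NEW = "5XX[013-9]XXXX"     # current route pattern on site B

dialed = "54421234"  # hypothetical dialed number

print("Site A pattern matches:", matches(SITE_A_PATTERN, dialed))
print("Old site B pattern matches:", matches(SITE_B_OLD, dialed))
print("New site B pattern matches:", matches(SITE_B_NEW, dialed))
print("After PreDot discard:", discard_predot(SITE_A_PATTERN, dialed))
```

On the patterns alone, the old site B route pattern would also match the full 8-digit number, while the new one excludes it because the fourth digit is 2.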

In the event logs and in RTMT I didn't find any call throttling during the tests I made.

An IP routing problem could be it, but in that case the first attempt would fail regardless of the dialed number.

Still in the dark...

thanks

JG

William Bell Wed, 05/12/2010 - 12:28

Jose,

Have you checked to see if the issue follows one CUCM node or one particular call path?  Meaning, can you set up separate ICT trunks like this:

ClusterA

node1: 1.1.1.1

node2: 1.1.1.2

ClusterB

node1: 2.2.2.1

node2: 2.2.2.2

On Cluster A:

TrunkA-to-B

- Target: 2.2.2.1

- CMGroup: ClusterANode1Only

On Cluster B:

TrunkB-to-A

- Target: 1.1.1.1

- CMGroup: ClusterBNode1Only

Place a bunch of calls.  Results?  Switch it:

On Cluster A:

TrunkA-to-B

- Target: 2.2.2.2

- CMGroup: ClusterANode2Only

On Cluster B:

TrunkB-to-A

- Target: 1.1.1.2

- CMGroup: ClusterBNode2Only

Place a bunch of calls. Results?

When you test, make sure that you go in both directions and make sure that you use the same stations.  Depending on your network layout, you may be able to do this with two phones side-by-side.

The point is to see whether calls are equally screwed up in all scenarios.  If they are, you have something in the network going hiccup on a very regular interval OR you have something in the dial plan.  I suppose there could be a defect.  But I have relayed all of the usual fixes I am aware of.  Could also be something really simple that isn't coming through via forum posts.
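If it helps to keep score while testing, a small Python tally like the one below can show whether the failures follow one scenario or are spread evenly. The scenario names, the 10% tolerance and the sample results are placeholders invented for illustration, not measurements from any system.

```python
def failure_rates(results):
    """results maps a scenario name to a list of booleans (True = call connected)."""
    return {
        name: calls.count(False) / len(calls)
        for name, calls in results.items()
        if calls
    }


def follows_one_path(rates, tolerance=0.1):
    """True when failure rates differ between scenarios by more than tolerance."""
    return max(rates.values()) - min(rates.values()) > tolerance


# Placeholder results only, typed in to show the shape of the data.
placeholder_results = {
    "node1-only, A->B": [False, True, False, True],
    "node1-only, B->A": [True, True, True, True],
    "node2-only, A->B": [False, True, False, True],
    "node2-only, B->A": [True, True, True, True],
}

rates = failure_rates(placeholder_results)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} failed")
print("Failures follow a particular path:", follows_one_path(rates))
```

Replace the placeholder lists with what you actually observe for each trunk arrangement and direction.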

You may need to grab some CM traces to drill into this even more.

HTH.

Regards,
Bill
