Troubleshooting Translation Routes to VRU

m.shchekotilov · 02-05-2010

I've got a problem at a customer's IPCC Enterprise 7.2 site.

About 2.5% of calls in the welcome script fail at the Translation Route to VRU step (one of the first steps in the script), and a translation route timeout event appears in the Router window. CCM is the routing client, and there are 2 IP-IVRs configured for load balancing.

Any tips for troubleshooting the problem? The trouble is, I can't even identify the failing calls to look them up in the PG's OPC process output and check what they might have in common.

Mike

dchumbley · 02-05-2010

To help identify the failed calls, use a Set node on the failure path from the Trans Rte node to set a peripheral or call variable to something like "trans rte fail", some value that marks the call as having failed the translation route. Then you can query RCD (Route_Call_Detail) and TCD (Termination_Call_Detail) for those records to find the Call ID, RouterCallKey and similar values to search the logs with.
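A minimal Python sketch of the lookup step, assuming the RCD and TCD rows have already been exported (for example from the HDS) as dictionaries. The column names RouterCallKeyDay and RouterCallKey, and the choice of Variable1 as the call variable the Set node writes, are assumptions to check against your own schema and script; pairing rows on (RouterCallKeyDay, RouterCallKey) is the usual way RCD and TCD records are tied together, but verify it for your version.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed marker written by the Set node on the Trans Rte failure path.
FAIL_MARKER = "trans rte fail"

Row = Dict[str, object]


def failed_trans_route_calls(
    rcd_rows: Iterable[Row],
    tcd_rows: Iterable[Row],
    variable: str = "Variable1",  # hypothetical: whichever variable the Set node uses
) -> List[Tuple[Row, List[Row]]]:
    """Pair each RCD row carrying the failure marker with its TCD rows.

    Rows are matched on the (RouterCallKeyDay, RouterCallKey) pair.
    """
    # Index TCD rows by router call key so each lookup is a dict access.
    tcd_by_key: Dict[Tuple[object, object], List[Row]] = {}
    for row in tcd_rows:
        key = (row["RouterCallKeyDay"], row["RouterCallKey"])
        tcd_by_key.setdefault(key, []).append(row)

    matches = []
    for row in rcd_rows:
        if row.get(variable) == FAIL_MARKER:
            key = (row["RouterCallKeyDay"], row["RouterCallKey"])
            # A failed call may have no TCD rows at all; keep it with an empty list.
            matches.append((row, tcd_by_key.get(key, [])))
    return matches
```

Each returned pair gives you the router call key to grep for in the Router and OPC traces, plus any TCD rows that were written for the same call.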

geoff · 02-05-2010

You need to tie the trans route request from CUCM to what happens in the Call Router and what happens on the VRU. The key to this is the Dialog ID. This is a bit hard to explain and I may not have it exactly right, but the advice is free. ;-)

Use rttrace to turn up the tracing modules on the Call Router: route requests, trans routes, queuing, the usual suspects. Turn up tracing on the OPC process(es) on the CUCM PG and the IP IVR PG. I can't remember the exact settings.

What you need to see on the OPC is the trans route request with its Dialog ID, and the response on the other OPC. Then match that Dialog ID to the Dialog ID in the Call Router and look at what happens. If you see the request and the Dialog ID, and then about 10 seconds later the same Dialog ID times out and the pending trans route is removed, you have found a culprit.
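The matching described above can be sketched in Python, assuming you have already extracted (timestamp, Dialog ID, event kind) tuples from the OPC and Call Router traces, by hand or with your own parser. The event kinds "request", "response" and "timeout" are hypothetical labels for this sketch, not actual trace keywords.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (timestamp in seconds, Dialog ID, event kind); kinds are hypothetical labels.
Event = Tuple[float, str, str]


def timed_out_dialogs(events: Iterable[Event]) -> List[Tuple[str, float]]:
    """Return (dialog_id, seconds from request to timeout) for every dialog
    that timed out without ever getting a response."""
    requested: Dict[str, float] = {}
    answered: Set[str] = set()
    culprits: List[Tuple[str, float]] = []
    # Process events in time order so a response seen before the timeout counts.
    for ts, dialog_id, kind in sorted(events):
        if kind == "request":
            requested[dialog_id] = ts
        elif kind == "response":
            answered.add(dialog_id)
        elif kind == "timeout" and dialog_id in requested and dialog_id not in answered:
            culprits.append((dialog_id, ts - requested[dialog_id]))
    return culprits
```

A delay close to 10 seconds for a Dialog ID that never got a response matches the pattern Geoff describes.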

Now you need to think about how this works.

The CUCM PG wants to do the trans route, so it asks the destination PG (through the Router) to supply a number it can call, one of the trans route DNIS in the pool. Once it has this, the destination PG is told to expect a call on that number and is given the associated data to attach when the call arrives. A timer is set, and the transfer has to occur before it expires.
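As a rough illustration only, the reserve-then-wait behaviour can be modelled as a small Python class. This is a toy model under stated assumptions (a single FIFO pool of DNIS, a fixed 10-second timer, caller-supplied timestamps), not how the PG is actually implemented; the class and method names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class TranslationRoutePool:
    """Toy model of a destination PG's pending translation routes (hypothetical)."""

    def __init__(self, dnis_pool: Iterable[str], timeout_s: float = 10.0) -> None:
        self.free: List[str] = list(dnis_pool)
        # DNIS -> (deadline, call data waiting to be attached to the arriving call)
        self.pending: Dict[str, Tuple[float, dict]] = {}
        self.timeout_s = timeout_s

    def reserve(self, call_data: dict, now: float) -> Optional[str]:
        """Hand out a free DNIS and hold the call data until the deadline."""
        if not self.free:
            return None  # pool exhausted: the trans route cannot proceed
        dnis = self.free.pop(0)
        self.pending[dnis] = (now + self.timeout_s, call_data)
        return dnis

    def call_arrived(self, dnis: str, now: float) -> Optional[dict]:
        """Match an arriving call to its pending route; None if unknown or too late."""
        entry = self.pending.pop(dnis, None)
        if entry is None:
            return None
        deadline, call_data = entry
        self.free.append(dnis)  # the DNIS goes back into the pool
        return call_data if now <= deadline else None

    def expire(self, now: float) -> List[str]:
        """Release DNIS whose timer ran out; in this model they stay busy until then."""
        expired = [d for d, (deadline, _) in self.pending.items() if now > deadline]
        for dnis in expired:
            del self.pending[dnis]
            self.free.append(dnis)
        return expired
```

The point of the model is that a timed-out route keeps its DNIS out of the pool for the whole timer, while a successful one frees it almost immediately.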

If there is a holdup on the CUCM side and the timer expires, the call is lost. This is unlikely, though I have seen it happen with VRU peripherals.

Do you have enough DNIS (route points) in the pool? There are many guidelines on this, but what I do is create as many RPs as I have CTI ports at the trans route's final destination. Then you are guaranteed never to get into trouble. When a DNIS is used for a trans route, it is normally held for half a second or so. If a trans route times out, that DNIS is held for 10 seconds, and this can snowball if the pool is too small.
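These hold times allow a back-of-the-envelope check with Little's law: average DNIS in use equals arrival rate times average hold time. The sketch below assumes Geoff's figures of about 0.5 s for a normal trans route and 10 s for a timed-out one; any call rate you plug in is your own estimate. It gives an average only, so peaks need headroom on top of it.

```python
def mean_busy_dnis(
    calls_per_second: float,
    timeout_fraction: float,
    normal_hold_s: float = 0.5,    # assumed hold for a successful trans route
    timeout_hold_s: float = 10.0,  # assumed hold for a timed-out trans route
) -> float:
    """Average number of DNIS held at once, by Little's law (L = lambda * W)."""
    if not 0.0 <= timeout_fraction <= 1.0:
        raise ValueError("timeout_fraction must be between 0 and 1")
    # Mean hold time is the timeout-weighted mix of the two hold times.
    mean_hold = (1 - timeout_fraction) * normal_hold_s + timeout_fraction * timeout_hold_s
    return calls_per_second * mean_hold
```

With a hypothetical 2 calls per second, mean_busy_dnis(2.0, 0.0) is 1.0, mean_busy_dnis(2.0, 0.025) is about 1.5, and mean_busy_dnis(2.0, 0.2) is 4.8: once timeouts become common, the same load ties up several times as many DNIS.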

Tracing trans route errors is one of the hardest things to do.

Regards,

Geoff

m.shchekotilov · 02-08-2010

Thanks, Geoff

OK, I was able to find the calls in the OPC trace. It's a Generic PG for both CCM and IVR, so everything is in one OPC process. I'm attaching the OPC output for a bad call.

So, I'm not getting any response from the IVR on this call: the TR label for CCM was generated (8206), but nothing came back from the IVR. I guess I'll have to check the CCM CDRs for those calls.

Another thing that bothers me is that I constantly see the message "pg1A-opc Peripheral MODE changed from PRIMARY_PROCESSOR to BACKUP_PROCESSOR", followed by a switch back to primary. It flips back and forth every 3 to 4 minutes, but I couldn't find any corresponding reaction in other processes. That doesn't seem normal, does it?
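To check how regular the flapping is, a short Python sketch can measure the gap between successive PRIMARY_PROCESSOR to BACKUP_PROCESSOR messages. It assumes each trace line begins with an HH:MM:SS timestamp, which is a hypothetical format; adjust the parsing to match your actual OPC trace. It also ignores midnight rollover.

```python
from datetime import datetime
from typing import Iterable, List

# Message text as quoted in this post.
MARKER = "Peripheral MODE changed from PRIMARY_PROCESSOR to BACKUP_PROCESSOR"


def flap_intervals(lines: Iterable[str]) -> List[float]:
    """Seconds between successive PRIMARY -> BACKUP transitions.

    Assumes (hypothetically) that each line starts with an HH:MM:SS timestamp;
    times that cross midnight are not handled.
    """
    times = [
        datetime.strptime(line[:8], "%H:%M:%S")
        for line in lines
        if MARKER in line
    ]
    # Difference between each transition and the one before it.
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Intervals clustering tightly around the same value point to something periodic rather than random network trouble.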

m.shchekotilov · 02-09-2010

So, I've tracked down the failing calls, and the problem is with the route points: the calls fail because CCM drops them. The call never reaches the CTI port; it is dropped because the RP is busy. The same RPs are used successfully by other calls, so it's not an RP configuration problem.

Quoting Geoff: "Do you have enough DNIS (route points) in the pool? [...] what I do is make as many RPs as I have CTI ports for the trans route final destination."

Well, they have 2 IVRs with 20 ports each, and there are 40 RPs created for each IVR (80 in total). I thought that would be enough, so I'm now puzzled by those busy RPs.

m.shchekotilov · 02-08-2010

Ha, I did exactly that at the very beginning, but later became so frustrated with this problem that I completely overlooked the variable. Now I can find the calls. Thanks for the tip. :)

jpsweeney77 · 02-05-2010

One thing you may want to try first is to run a SQL query on TCD for your IVR peripherals and confirm that each configured DNIS in each peripheral's translation route pool is receiving calls. It could be as simple as a misconfigured route point in UCM; cross your fingers that it is. If not, as Geoff says, it's quite cumbersome to troubleshoot.
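For the TCD check, a minimal Python sketch might look like the following. It uses an in-memory SQLite table as a stand-in for the HDS database; the table and column names (Termination_Call_Detail, PeripheralID, DNIS) are assumed to match your schema, and the peripheral ID and DNIS values are hypothetical. Against a real HDS you would also restrict the query to a date range.

```python
import sqlite3
from typing import Dict, Iterable


def dnis_volume(conn: sqlite3.Connection, peripheral_id: int,
                pool: Iterable[str]) -> Dict[str, int]:
    """Count TCD rows per DNIS for one peripheral, reporting 0 for unused pool DNIS."""
    counts = dict.fromkeys(pool, 0)  # every pool DNIS starts at zero
    query = (
        "SELECT DNIS, COUNT(*) FROM Termination_Call_Detail "
        "WHERE PeripheralID = ? GROUP BY DNIS"
    )
    for dnis, n in conn.execute(query, (peripheral_id,)):
        if dnis in counts:  # ignore DNIS outside the trans route pool
            counts[dnis] = n
    return counts


if __name__ == "__main__":
    # Stand-in data; replace with a connection to your own HDS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Termination_Call_Detail (PeripheralID INTEGER, DNIS TEXT)")
    conn.executemany(
        "INSERT INTO Termination_Call_Detail VALUES (?, ?)",
        [(5001, "8206"), (5001, "8206"), (5001, "8207")],
    )
    volume = dnis_volume(conn, 5001, ["8206", "8207", "8208"])
    print(volume)
    print("DNIS with no calls:", [d for d, n in volume.items() if n == 0])
```

Any pool DNIS that reports 0 is a candidate for a misconfigured route point.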

m.shchekotilov · 02-08-2010

Well, I'm pretty sure the CM RPs are fine, but I'll look into it. As I understand it, I need to check whether there are connected calls for every DNIS in the TR pool? I'm not very good at SQL, but I'll see what I can do. Thanks!

m.shchekotilov · 02-11-2010

I've found the source of the problems: it is the second IVR. All the timeouts happen because it rejects some calls. The first IVR, on the other hand, doesn't do that.

Most of the rejects have a reason code/contact disposition of 7 (Trigger Timeout), but there are some with 5 (No Trigger), 10 (Remote Timeout), 12 (Trigger Max Session) and 13 (Trigger Failed). Does anybody have any idea why it could behave like that?
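To see which of these dispositions dominate over a period, the codes can be tallied in Python. The code-to-name mapping below is taken from this post; how you export the codes for the second IVR is left open, and any code not listed is reported as "Other".

```python
from collections import Counter
from typing import Dict, Iterable

# Reason code / contact disposition values as quoted in this post.
REJECT_REASONS: Dict[int, str] = {
    5: "No Trigger",
    7: "Trigger Timeout",
    10: "Remote Timeout",
    12: "Trigger Max Session",
    13: "Trigger Failed",
}


def tally_rejects(codes: Iterable[int]) -> Counter:
    """Count rejects by reason name; unknown codes are labelled 'Other (code)'."""
    return Counter(REJECT_REASONS.get(code, f"Other ({code})") for code in codes)
```

For example, tally_rejects([7, 7, 5, 13]) counts two Trigger Timeouts and one each of No Trigger and Trigger Failed; running it per hour or per route point can show whether the rejects cluster.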

Edward Umansky · 02-11-2010

Review the MIVR logs on the IP IVR server that is having the issue. Track down the specific calls that are failing, and you should see more information about what is going on. If you see the same route points causing errors over and over again, double-check those route points and make sure they are correctly configured. I would also do a quick sanity check of all your configuration: make sure every trans route number is accounted for as a trigger for the trans route IVR application, check that your JTAPI subsystem is properly synchronized, check for any subsystems in partial service, and so on.