
UCCX HA cluster failover issue - when UCCX pub is offline CAD login error about JTAPI results

Brett Hanson
Level 1

Hi,

Our system:

CUCM publisher running 9.1.2.10000-28 

CUCM subscriber running 9.1.2.10000-28

UCCX publisher running 9.0.2.11001-24 aka 9.0(2) SU1

UCCX subscriber running 9.0.2.11001-24 aka 9.0(2) SU1

 

We were performing some DR testing on the weekend and ran into an issue with CAD launch.

With the CUCM pub and sub up, the UCCX sub up, and the UCCX pub DOWN, launching CAD gives us the login prompt.

It accepts our login and starts going through the typical loading process but then stops with the error:

"Login failed due to a configuration error with your phone and JTAPI or Unified CM. Contact your administrator [Retry][Cancel]"

When we re-connect the UCCX pub to the network, we no longer receive that error and CAD launch/login works as expected.

 

Things that I tried:

  1. Waiting a long time in order to allow any failover processes and services to do what they had to - made no difference.
  2. Our SDAs use Extension Mobility, so in order to rule out EM I configured a phone and allocated it to a user directly (removing their device profile entirely from CUCM), then retried the DR scenario and experienced the same issue. It works when all servers are up and fails when UCCX pub is down.
  3. Tried "Cisco JTAPI Resync" and retried the DR scenario - no difference, the issue persists.
  4. Tried using different PCs - issue persists.
  5. Confirmed that all services on CUCM servers & UCCX sub were IN SERVICE or RUNNING when UCCX pub was offline - they were.
    1. Interestingly - UCCX sub took a very long time (5+ minutes) to let me view services status from either serviceability or CLI when pub was offline.
  6. Ran the Client Configuration Utility AFTER both servers were online, when the system was built. (This point was added in an edit after a reply reminded me about it.)
    1. I also re-ran it a few times on the night, between DR tests and from different machines, to see if it was related.
    2. It is worth noting that when UCCX pub was taken offline, it took about 5 minutes before login was allowed, and the login then failed with that error.
    3. During the failover process the login option wasn't available at all. After those 5 minutes the prompt came up and appeared to authenticate without issue before throwing the JTAPI error mentioned.

 

Does anyone have any thoughts or know why this would be?


Samuel Womack
Level 5

The first thing that comes to mind when I see this: did you run the Client Configuration Utility before or after you installed the 2nd node? If you ran it before installing the sub and not after, then you need to re-run that utility.

Hi, thanks for your comments.

In response, yes I ran the Client Configuration Utility after both servers were online.

I also re-ran it a few times on the night between DR tests and from different machines to see if it was related. (I'll edit my original post to include this).

It is worth noting that when UCCX pub was taken offline - it took about 5 minutes before it allowed login but then failed with that error.

During the failover process the login option wasn't available at all and after said 5 minutes the prompt came up and appeared to authenticate without issue before throwing the JTAPI error mentioned.

Do you think that the JTAPI error is indicative of an authentication failure? I got the impression authentication worked but then some device control side of things was letting us down.

I'll give you that much: failover times are slow. The only other thing I can think of (before heading off to sleep) is checking that the 2nd JTAPI user, jtapi_2, "looks" like user jtapi_1.

Thanks again.

As this testing was done over the weekend and won't be done again for a while, there's no urgency so enjoy your sleep :)

As for my findings on the UCCX_JTAPI_1/2 account comparison in CUCM administration - application users:

Both are in groups "Standard CTI Enabled" and roles "Standard CTI Enabled".

For "controlled devices" - both are identical with respect to the triggers they control.

They are different, however, with respect to the IVR ports (call control group CTI RP's)

E.g.

UCCX_JTAPI_1 has CTI_2200 through to CTI_2289

whereas...

UCCX_JTAPI_2 has CTI_2300 through to CTI_2389

These CTI ports are the "IVR ports" in my Call Control Group: one lot is configured in device pools registered with the CUCM publisher (CTI_2200-2289) and the other lot with the CUCM subscriber (CTI_2300-2389). I would've expected both UCCX_JTAPI_x users to control them all?

As both CUCM servers were up at the time of testing, do you think this would have any bearing?
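For anyone wanting to repeat this comparison without eyeballing two long device lists, here is a minimal Python sketch. It assumes you have copied the controlled-device names of each application user into two collections by hand; the trigger names `TRIGGER_A` and `TRIGGER_B` and both helper functions are hypothetical placeholders, and only the CTI port ranges come from the lists above.

```python
def port_names(prefix, first, last):
    """Build device names such as CTI_2200 .. CTI_2289 (inclusive)."""
    return {f"{prefix}{n}" for n in range(first, last + 1)}


def compare_controlled_devices(a, b):
    """Return (only_in_a, only_in_b, shared) as sorted lists."""
    a, b = set(a), set(b)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Hypothetical trigger names stand in for the real route points.
triggers = {"TRIGGER_A", "TRIGGER_B"}

# Port ranges as described for UCCX_JTAPI_1 and UCCX_JTAPI_2.
jtapi_1 = triggers | port_names("CTI_", 2200, 2289)
jtapi_2 = triggers | port_names("CTI_", 2300, 2389)

only_1, only_2, shared = compare_controlled_devices(jtapi_1, jtapi_2)
print("Only UCCX_JTAPI_1:", len(only_1), "devices,", only_1[0], "to", only_1[-1])
print("Only UCCX_JTAPI_2:", len(only_2), "devices,", only_2[0], "to", only_2[-1])
print("Shared:", shared)
```

With the ranges above, each user ends up with 90 ports of its own and only the triggers in common, which matches what I saw in CUCM administration.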

The JTAPI users aren't really relevant to this problem; they only matter if you are having issues with incoming calls after the failure. It's the RMJTAPI provider, which controls the agent extensions, that is relevant to the agent login process.

Michael Green
Cisco Employee

So basically all you did was shut down/disconnect the UCCX Primary as part of the test? Can you confirm what action you took to trigger the test?

As part of the login process, the CCX Engine checks in its memory whether it has control of the extension you are trying to log in with. If this check fails then, from memory, this is reported back to CAD, and this is the kind of error which could be observed. When the UCCX fails over from Primary->Secondary, the RMCM provider should be taking control of the configured agent extensions.

If you think everything is configured correctly, it might be worth checking the MIVR logs on the Secondary node to see if there are any interesting errors when the login fails. It may also be helpful to enable the JTAPI logs and capture them from before the failover occurs. Sometimes the issue can stem from the CTI on the UCM as well, so in summary it can be helpful to collect a full set of logs (UCM - CM/CTI and UCCX - MIVR/JTAPI) for a complete analysis.

You could try cleanly restarting the Secondary and repeating the test by taking out the Primary just to see if it was a transient issue or repeatable.

I know this may not be that helpful, but it's probably going to require some digging into logs to get to the bottom of this unless you can uncover any config errors with things other people have already suggested.
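Once the logs are collected, a first pass can be as simple as searching them for the agent extension and obvious error words. The Python sketch below is only a generic text search under stated assumptions: it presumes the logs have already been downloaded into a local folder (the name `collected_logs` is hypothetical), it makes no assumption about the MIVR or JTAPI log format, and the keywords, including the extension `1001`, are placeholders to replace with your own.

```python
from pathlib import Path


def find_matches(log_dir, keywords):
    """Yield (file name, line number, line) for lines containing any keyword.

    Matching is a plain case-insensitive substring test, so it works on any
    text log regardless of its layout.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return
    lowered = [k.lower() for k in keywords]
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                text = line.lower()
                if any(k in text for k in lowered):
                    yield path.name, number, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical folder and keywords; adjust to your own capture.
    for name, number, line in find_matches("collected_logs", ["1001", "exception"]):
        print(f"{name}:{number}: {line}")
```

This will not replace reading the logs around the failed login, but it helps narrow down which files and timestamps to look at first.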

Hi Michael,

Thanks for taking the time to reply.

In answer to your questions - yes the test was purely taking the UCCX publisher offline.

This was done by disconnecting it from the network (in this case, editing the settings of the VM in vSphere and un-ticking the "connected" box for the network options).

As part of the DR testing we did, I didn't end up restarting the UCCX sub (or the UCCX pub, for that matter) after running the client configuration tool.

I had restarted the boxes a few times prior to this DR test though (on the same weekend).

PS. The issue was repeated over several tries (at least 6 times).

I will see if I can grab any useful logs out of it, but I might have to do as you suggest and enable the additional logging when we can next get a test window. Unfortunately we had some unrelated issues which ended up consuming my time and left me little time for the UCCX testing and troubleshooting.

Cheers,

Brett

 

Not a problem.

It might be worth trying again and getting the full capture of logs (UCCX and UCM). Otherwise time can be wasted if you spend cycles analyzing UCCX logs and then need to reproduce the problem again to capture UCM logs, etc.

Ideally, give the Secondary a clean start before doing the test, then capture the logs from the restart (make sure you allow a reasonable amount of time for it to restart and all the services to settle in). If you're able to share the MIVR/JTAPI logs from the problem I can try and take a look if I have the time or others might be kind enough to jump in as well :-)
