Information Necessary to perform Root Cause Analysis on Unity Failover
The information needed to troubleshoot why a Unity server running version 4.x, 5.x, or 7.x failed over:
Mandatory: Should be included when opening a TAC SR.
Desired: Follow-up information that should be collected once the mandatory information has been collected.
Unity failover normally occurs for a few possible reasons:
- Call received on the secondary server
- The secondary server loses communication with the primary server (primary does not respond for 30 seconds)
- Failover is manually initiated using Failover Monitor
- Port Lockup
- SQL replication failures
- Slow MAPI interaction with Exchange
Mandatory information to get:
- GUSI Cab files from both servers. This will tell us why it failed over.
- For full root cause analysis, the CCM and SDL traces from all the nodes in the cluster are necessary (if Unity is integrated with CM). Traces should span at least 30 minutes prior to the failure. Refer to this link for CM trace collection -
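As a rough illustration of the 30-minute rule above, the following Python sketch computes the period the CM traces should cover from the noted failover time. The function name and the sample date are illustrative assumptions, not part of any Cisco tool.

```python
from datetime import datetime, timedelta


def trace_window(failover_time, lead_minutes=30):
    """Return (start, end) of the period the collected traces should cover.

    lead_minutes defaults to the 30 minutes before the failure that the
    traces should span at a minimum; pass a larger value to collect more.
    """
    start = failover_time - timedelta(minutes=lead_minutes)
    return start, failover_time


if __name__ == "__main__":
    # Hypothetical failover time, used only as an example.
    start, end = trace_window(datetime(2008, 3, 14, 10, 45))
    print("Collect CCM and SDL traces from", start, "to", end)
```

When in doubt, widen the window rather than trim it, since 30 minutes is the minimum.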
For a more detailed root cause analysis the following traces will be needed:
- Unity Diagnostic Tool Traces: AvCsMgr, AvCsNodeMgr, svchost from BOTH servers
The following traces need to be configured in advance; otherwise, the traces may not contain sufficient information.
Configuring the traces
On the Unity server, go into Cisco Unity Tools Depot.
- Expand the section "Diagnostic Tools"
- Double Click on "Unity Diagnostic Tool" This will open the Unity Diagnostic Tool.
- In the right pane, click on "Reset to Default Traces". This will launch a wizard.
- Check the box beside "Reset to Default Traces" and then click Finish.
You will return to the Unity Diagnostic Tool
- Click on "Configure Macro Traces" This will launch a wizard.
- Click Next. This will take you to the "Configure Macro Traces" screen where all of the components are listed. Place checkmarks beside the following:
- Call Flow Diagnostics
- Conversation State Traces
- Call Control (Miu) Traces
Click on Next, Finish.
Click on "Configure Micro Traces" This will launch a wizard.
- Click Next. This will take you to the "Configure Micro Traces" screen where all of the components are listed.
- Do not uncheck any traces that are already selected
Place checkmarks beside the following:
In the Micro page:
- CDE - all
- Conversations - all
- Doh - all
- Malex - all
- MiuCall - 10,11
- MiuGeneral - 12, 13, 14, 16
- MiuMethods - 13 through 15
- NodeMgr - 10-18
- Skinny - all but keep alive
- Click on Next, Finish.
- Close Unity Diagnostic Tool.
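Because the traces are needed from both servers, it can help to keep the selections above as a small checklist and tick it off on each server. The Python sketch below only records the trace names and levels listed in this document as data and prints them; it does not talk to the Unity Diagnostic Tool, and the variable and function names are illustrative assumptions.

```python
# Macro traces to check in "Configure Macro Traces".
MACRO_TRACES = [
    "Call Flow Diagnostics",
    "Conversation State Traces",
    "Call Control (Miu) Traces",
]

# Micro traces to check in "Configure Micro Traces" (in addition to
# whatever is already selected). Values are trace levels or a note.
MICRO_TRACES = {
    "CDE": "all",
    "Conversations": "all",
    "Doh": "all",
    "Malex": "all",
    "MiuCall": [10, 11],
    "MiuGeneral": [12, 13, 14, 16],
    "MiuMethods": [13, 14, 15],
    "NodeMgr": list(range(10, 19)),  # levels 10 through 18
    "Skinny": "all but keep alive",
}


def format_levels(levels):
    """Render either a note such as 'all' or a list of trace levels."""
    if isinstance(levels, str):
        return levels
    return ", ".join(str(level) for level in levels)


def checklist_lines():
    """Return one printable checklist line per macro and micro trace."""
    lines = [f"[macro] {name}" for name in MACRO_TRACES]
    lines += [
        f"[micro] {name}: {format_levels(levels)}"
        for name, levels in MICRO_TRACES.items()
    ]
    return lines


if __name__ == "__main__":
    for line in checklist_lines():
        print(line)
```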
Once the issue recurs, you will need to retrieve the trace files. First, make a note of the date and time that the error occurred.
Retrieving the traces
- On the Unity server, go into Cisco Unity Tools Depot.
- Expand the section "Diagnostic Tools"
- Double Click on "Unity Diagnostic Tool" This will open the Unity Diagnostic Tool.
- Click on "Gather Log Files". This will launch a wizard.
- Choose "Select Logs"
- Click on "Browse" and specify a location that will be easy to find.
- Click Next. This will take you to the "Select Logs to Gather" screen.
- Check/select the log files that will contain the time of the errors. The names of the files contain the date/time of the first timestamp in the log.
- Click Next. The tool will then place the log files into the folder you specified.
- Retrieve the files from the folder, zip them, and attach them to the service request.
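The selection and zipping steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the log names are hypothetical, the first timestamps are entered by hand rather than parsed from the file names (the exact name format is not given here), and each log for a component is assumed to end where the next one begins. Run the selection once per component (for example, once for the AvCsMgr logs).

```python
import zipfile
from datetime import datetime
from pathlib import Path


def logs_overlapping(first_timestamps, window_start, window_end):
    """Return the names of logs that overlap the time window.

    first_timestamps maps a log name to the first timestamp in that log.
    Assumption: a log ends where the next log begins, and the newest log
    is still being written to.
    """
    ordered = sorted(first_timestamps.items(), key=lambda item: item[1])
    selected = []
    for index, (name, begins) in enumerate(ordered):
        ends = ordered[index + 1][1] if index + 1 < len(ordered) else None
        if begins <= window_end and (ends is None or ends > window_start):
            selected.append(name)
    return selected


def zip_logs(folder, archive_path):
    """Zip every file in folder into archive_path and return that path."""
    folder = Path(folder)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            # Skip subfolders and the archive itself if it sits in folder.
            if path.is_file() and path.resolve() != archive_path.resolve():
                archive.write(path, arcname=path.name)
    return archive_path


if __name__ == "__main__":
    # Hypothetical log names and start times for one component.
    starts = {
        "log_a": datetime(2008, 3, 14, 9, 0),
        "log_b": datetime(2008, 3, 14, 10, 0),
        "log_c": datetime(2008, 3, 14, 11, 0),
    }
    print(logs_overlapping(starts,
                           datetime(2008, 3, 14, 9, 50),
                           datetime(2008, 3, 14, 10, 45)))
```

After the Unity Diagnostic Tool has copied the logs to the folder you chose, a call such as zip_logs(folder, "unity_traces.zip") (both arguments are placeholders) produces a single archive to attach to the service request.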
Look for certain defects that could trigger failovers -
CSCsc62081 - CCM receive SDL OOS and trigger the failover - CM issue.
CSCsi50517 - Failover: SQL replication jobs can fail and provide no warning - Unity SQL.
CSCsc62073 - Locations Out of Bandwidth causes unexpected Unity Failover - CM issue
CSCsb23638 - Unity port may ring-no-answer after not receiving StartMediaTransmit - TSP issue.
CSCse00439 - disconnect right before supervised transfer may lead to failover - TSP issue.
CSCsi65508 - Unity TSP port failback detection fails - CM/TSP.
CSCsh35344 - failed xfer initiate may cause delay in clearing port - TSP issue.
CSCse43664 - Supervised xfer cleanup may result in delay answering next call - TSP issue.
CSCsj13401 - Unity failover results from OpenSSL errors - TSP issue.
CSCsc91972 - Poor Exchange performance can cause Unity to stop answering calls.
Failover events reference -