I have a client who is getting random disconnect errors from his CUCM servers. They sometimes coincide with other network errors as well, but not always. I'm having a hard time telling whether it's CUCM or a general network issue.
I pulled his core logs, and at the time of the disconnect messages from RTMT I'm seeing this:
and lots of them, for a few minutes, and then it stops. When I search for solutions, I repeatedly see "contact TAC". Can anyone help me determine whether I have a CM problem or a network problem? He rebooted the cluster 12 days ago and it didn't stop the problems. The disconnect errors relate to all components of the phone system (CUCM, CUC, UCCX, CUPS) and have included:
- "SDL link to remote application is out of service"
- "OUT-OF-SERVICE AppID : Cisco UP Presence Engine ClusterID"
- "The Cisco UP Presence Engine service on the peer node of a subcluster has failed AppID : Cisco Syslog Agent ClusterID"
- "user 2 ntpRunningStatus.sh: Primary node NTP server, SVUCCX01, is currently inaccessible or down"
So it's as if all communication breaks and then comes back again. I'd like to learn as much as possible from this rather than just run to TAC. Any suggestions?
This needs to be tackled by concentrating on one specific error at a time. Using the timestamp of the error, check whether there was any impact on the performance of the server or device named in the error. Then collect the corresponding traces (detailed CUCM traces or others, depending on the error) covering a few minutes before the error and leading up to it. Once you have all these details for one event, please post them here so they can be looked at.
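As a rough aid for the per-event approach above, here is a minimal Python sketch (not a Cisco tool) that turns a list of alarm timestamps into trace-collection windows, merging windows that overlap so a burst of alarms yields a single collection period. The five-minute lead and one-minute tail are assumptions standing in for "a few minutes", and the timestamps are hypothetical; in practice they would be the RTMT alarm times you observed.

```python
from datetime import datetime, timedelta


def trace_windows(alarm_times, lead=timedelta(minutes=5), tail=timedelta(minutes=1)):
    """Return merged (start, end) windows covering `lead` before and `tail` after each alarm.

    `lead` and `tail` are assumed defaults, not Cisco-recommended values.
    """
    windows = []
    for t in sorted(alarm_times):
        start, end = t - lead, t + tail
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it rather than opening a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


# Hypothetical alarm times taken from RTMT; replace with your own.
alarms = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 3),
    datetime(2024, 1, 1, 11, 0),
]
for start, end in trace_windows(alarms):
    print(start, "->", end)
# 2024-01-01 09:55:00 -> 2024-01-01 10:04:00
# 2024-01-01 10:55:00 -> 2024-01-01 11:01:00
```

Each resulting window can then be used as the start/end range when collecting traces, so the collection covers the lead-up to every alarm in a burst without gaps.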
Couple of important links:
Set Up Cisco CallManager Traces for Cisco Technical Support
Please see the explanation for the error you are getting:
CCM_CALLMANAGER-CALLMANAGER-1-SDLLinkOOS : SDL link to remote application is out of service Remote Application IP Address [String] Unique Link ID [String] Local Node ID [UInt] Local Application ID [Enum]Remote Node ID [UInt] Remote Application ID [Enum]
Explanation This alarm indicates that the local Unified CM has lost communication with the remote Unified CM. This alarm usually indicates that a node has gone out of service (whether intentionally for maintenance or to install a new load for example; or unintentionally due to a service failure or connectivity failure).
Recommended Action In the Cisco Unified Reporting tool, run a CM Cluster Overview report and check to see if all servers can communicate with the Publisher. Also check for any alarms that might have indicated a CallManager failure and take appropriate action for the indicated failure. If the node was taken out of service intentionally, bring the node back into service.
Reason Code - Enum Definitions
Enum Definitions - LocalApplicationID
Enum Definitions - RemoteApplicationID
The most common causes of lost communication between CallManagers are a CallManager server hang, network problems, or high CPU.
You can use RTMT to monitor this.
CM servers keep a TCP connection to other servers in the cluster. When that TCP connection is broken due to network connectivity or lack of server resources, the above error is generated.
Also, the SDI traces you have provided won't be useful for analyzing the problem; please take detailed SDL traces instead.
If possible, take detailed SDL traces along with a sniffer capture filtered on TCP port 8002 to find out whether this is a network problem.
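To complement the sniffer capture, a minimal Python sketch like the one below could run on a separate monitoring host and log, with timestamps, each time a TCP connection to a cluster node starts or stops succeeding. If its outages line up with the SDLLinkOOS alarm times, that points more toward network connectivity than toward a resource problem on one server, though it is not conclusive on its own. The host names are hypothetical placeholders. Although 8002 is the port mentioned above, the servers may not accept connections on it from a host outside the cluster; in that case substitute a port you know the node listens on.

```python
import socket
import time
from datetime import datetime

# Hypothetical node names: replace with your own cluster addresses.
PEERS = ["cucm-pub.example.com", "cucm-sub1.example.com"]
SDL_PORT = 8002  # port mentioned in the thread; nodes may refuse it from non-cluster hosts


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


def probe_loop(peers, port=SDL_PORT, interval=10.0, rounds=None):
    """Probe each peer every `interval` seconds and print a line whenever its state changes.

    Runs forever when `rounds` is None; otherwise stops after that many rounds.
    """
    last = {}
    n = 0
    while rounds is None or n < rounds:
        for host in peers:
            ok = port_open(host, port)
            if last.get(host) != ok:
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {host}:{port} {'reachable' if ok else 'UNREACHABLE'}")
                last[host] = ok
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)


# Start monitoring (stop with Ctrl+C):
# probe_loop(PEERS)
```

Logging only state changes keeps the output short enough to line up against the RTMT alarm timestamps by eye.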
Sometimes a highly fragmented disk can cause heavy disk I/O that uses up all the CPU.
Was any upgrade or migration activity carried out on the network?
Please check that the server NIC, the switch port, and any other NICs in the path have the same speed/duplex settings.