06-02-2009 08:36 AM - edited 03-15-2019 05:08 AM
We have been dealing with this issue for a month, and TAC still hasn't been able to pinpoint it. We upgraded from 4.x to 6.x. The only other change was that we also upgraded IOS on all MGCP voice gateways to 12.4(21)a. Sporadically, users on a call get a Temp Fail message on their phone, and the call drops only if the user is a UCCX agent. If it is a regular user, the call does not drop. We have downgraded the IOS on the gateways and nothing changed. CUCM traces are not giving any info. We captured a sniffer trace (see attached).
06-02-2009 11:12 AM
From the trace logs, there is an error:
Error:
SDL Logs:
2009/06/01 10:36:28.481| 002| AlarmErr | | | | | | AlarmClass: CallManager, AlarmName: SDLLinkOOS, AlarmSeverity: Error AlarmMessage: , AlarmDescription: SDL link to remote application out of service., AlarmParameters: LocalNodeId:2, LocalApplicationID:100, RemoteIPAddress:172.16.2.8, RemoteNodeID:3, RemoteApplicationID:100, LinkID:2:100:3:100, AppID:Cisco CallManager, ClusterID:StandAloneCluster, NodeID:CORP-SUB1,
Let's break it down:
AlarmName: SDLLinkOOS (SDL link out of service)
RemoteIPAddress:172.16.2.8 (this might be a CallManager node?)
LocalApplicationID:100 (this is the CallManager service on the local node; in the alarm definition, 100 is CallManager and 200 is CTI Manager)
RemoteApplicationID:100 (this is the CallManager service on the remote node)
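If you need to do this breakdown across a lot of SDL log lines, the fields called out above can be pulled out with a small script. This is only a rough sketch: the function names `parse_alarm` and `split_link_id` are made up for illustration, and the reading of LinkID as local node, local application, remote node, remote application is inferred from how it lines up with the other fields in this one sample, not from any documented format.

```python
import re

# The alarm line quoted earlier in this thread, copied verbatim.
SAMPLE = (
    "2009/06/01 10:36:28.481| 002| AlarmErr | | | | | | "
    "AlarmClass: CallManager, AlarmName: SDLLinkOOS, AlarmSeverity: Error "
    "AlarmMessage: , AlarmDescription: SDL link to remote application out of service., "
    "AlarmParameters: LocalNodeId:2, LocalApplicationID:100, "
    "RemoteIPAddress:172.16.2.8, RemoteNodeID:3, RemoteApplicationID:100, "
    "LinkID:2:100:3:100, AppID:Cisco CallManager, ClusterID:StandAloneCluster, "
    "NodeID:CORP-SUB1,"
)

# Keys of interest; the spelling follows the sample line exactly.
KEYS = (
    "AlarmName", "LocalNodeId", "LocalApplicationID", "RemoteIPAddress",
    "RemoteNodeID", "RemoteApplicationID", "LinkID", "ClusterID", "NodeID",
)


def parse_alarm(line):
    """Return a dict of the selected key/value pairs from one alarm line."""
    # Everything after the last '|' is the alarm text; this skips the timestamp.
    body = line.rsplit("|", 1)[-1]
    fields = {}
    for key in KEYS:
        # \b keeps "NodeID" from matching inside "RemoteNodeID".
        match = re.search(r"\b" + re.escape(key) + r":\s*([^,]*)", body)
        if match:
            fields[key] = match.group(1).strip()
    return fields


def split_link_id(link_id):
    """Split a LinkID such as '2:100:3:100' (layout assumed from the sample)."""
    local_node, local_app, remote_node, remote_app = link_id.split(":")
    return {
        "local_node": local_node,
        "local_app": local_app,
        "remote_node": remote_node,
        "remote_app": remote_app,
    }


fields = parse_alarm(SAMPLE)
link = split_link_id(fields["LinkID"])
```

Running `parse_alarm` over every SDLLinkOOS line and grouping by RemoteIPAddress would show whether the alarms always point at the same remote node.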
This link will show you some of the output and recommendations:
http://partnerwiki.cisco.com/ViewWiki/index.php/CallManager_Event_Logs
Error Message: %CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: SDL link to the remote application is out of service. Remote IP address of remote application [String], Unique Link ID. [String], Local node ID [UInt], Local Application ID. [Enum], RemoteNodeID [UInt], Remote application ID.[Enum]
Explanation-This alarm indicates that the local Cisco CallManager has lost communication with the remote Cisco CallManager. This alarm usually indicates network errors or a nonrunning remote Cisco CallManager.
Recommended Action-Investigate why the remote Cisco CallManager does not run or whether a network problem exists.
The other section of this error worries me:
ClusterID:StandAloneCluster, NodeID:CORP-SUB1
This subscriber is in its own cluster???
Maybe something got changed from the upgrade from 4.x to 6.x with your cluster groups. Perhaps, CORP-SUB1 is orphaned outside the cluster for some reason. Might be worth looking at.
06-02-2009 11:24 AM
All 3 servers are part of the same CM group. Is there anything else I should be verifying? What should the ClusterID say?
06-02-2009 12:43 PM
I believe it should say the name of the CM Group, since that is how you cluster them.
It might be a good idea to check whether you see that error in any of the other servers' logs.
What device is 172.16.2.8?
06-02-2009 12:45 PM
That is Sub2, which is located away from the other 2, at the colo.
06-02-2009 12:49 PM
Okay that makes sense. Sub2 is not on the same LAN? Is it located over a WAN circuit then?
Sounds to me like it could be a networking issue then.
From SUB1, perform an extended ping with large payload (1500 bytes) and see if you get drops.
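If you can't easily run the ping from SUB1 itself, a rough way to run the same kind of test from a Linux host on SUB1's subnet is sketched below. Assumptions: the host has the standard Linux `ping` with `-c` (count) and `-s` (ICMP payload size), and the helper names are made up for illustration. Note that on Linux `-s` sets the payload only, so 1472 bytes of payload plus 8 bytes of ICMP header and 20 bytes of IP header makes a 1500-byte packet; a 1500-byte payload would be fragmented on a standard 1500-byte MTU path.

```python
import re
import subprocess


def build_ping_command(host, count=100, payload=1472):
    """Build a Linux ping command: -c is the packet count, -s the ICMP payload size."""
    return ["ping", "-c", str(count), "-s", str(payload), host]


def parse_loss(output):
    """Extract the packet-loss percentage from ping's summary line, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    return float(match.group(1)) if match else None


def ping_loss(host, count=100, payload=1472):
    """Run the ping and return the reported loss percentage (None if unparseable)."""
    result = subprocess.run(
        build_ping_command(host, count, payload),
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)


# Example call (not executed here): ping_loss("172.16.2.8", count=500)
```

Any non-zero loss, or loss that only shows up with the large payload, would point at the WAN path rather than at CallManager.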
06-02-2009 12:46 PM
06-02-2009 12:48 PM
Yes, I saw the above link, but it just isn't helping. Thanks for your reply.
06-02-2009 12:51 PM
It definitely sounds like a networking issue, because the CallManager service on SUB1 is losing its SDL link to the CallManager service on SUB2.
06-03-2009 06:37 PM
We had the same issue.
The Temp Fail messages occurred at the same time as the SDLLinkOOS alarms. We also have a multisite cluster.
The SDL link is used for intra-cluster signalling between the CallManager nodes.
We had issues on the link between the two data centers (high CPU and no QoS). After we resolved these, the SDLLinkOOS alarms disappeared, and so did the Temp Fail messages. I would focus my attention on that.