We have been dealing with this issue for a month and TAC still hasn't been able to pinpoint it. We upgraded from 4.x to 6.x. The only other thing that changed was that we also upgraded IOS on all MGCP voice gateways to 12.4(21)a. Sporadically, while on a call, users get a temp fail message on their phone. The call only drops if the user is a UCCX agent; a regular user's call does not drop. We have downgraded the IOS and nothing changed. CUCM traces are not giving any info. We captured a sniffer trace (see attached).
From the trace logs, there is an error:
2009/06/01 10:36:28.481| 002| AlarmErr | | | | | | AlarmClass: CallManager, AlarmName: SDLLinkOOS, AlarmSeverity: Error AlarmMessage: , AlarmDescription: SDL link to remote application out of service., AlarmParameters: LocalNodeId:2, LocalApplicationID:100, RemoteIPAddress:172.16.2.8, RemoteNodeID:3, RemoteApplicationID:100, LinkID:2:100:3:100, AppID:Cisco CallManager, ClusterID:StandAloneCluster, NodeID:CORP-SUB1,
Let's break it down:
AlarmName: SDLLinkOOS (SDL link out of service)
RemoteIPAddress:172.16.2.8 (this might be a CallManager node?)
LocalApplicationID:100 (this is the CallManager service on the local node; in the alarm's enum, 100 is CallManager and 200 is CTI Manager)
RemoteApplicationID:100 (this is the CallManager service on the remote node)
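To make the breakdown above repeatable on other trace lines, here is a minimal Python sketch that splits the AlarmParameters portion into key/value pairs. The function name `parse_alarm_parameters` is made up for illustration, and the sketch assumes the parameters are comma-separated `Key:Value` pairs exactly as in the line quoted above.

```python
def parse_alarm_parameters(trace_line: str) -> dict:
    """Return the AlarmParameters portion of a trace line as a dict."""
    _, found, tail = trace_line.partition("AlarmParameters:")
    if not found:
        return {}
    fields = {}
    for chunk in tail.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the first colon only, so LinkID:2:100:3:100 stays intact.
        key, sep, value = chunk.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# The parameter portion of the alarm quoted above.
SAMPLE_LINE = (
    "AlarmParameters: LocalNodeId:2, LocalApplicationID:100, "
    "RemoteIPAddress:172.16.2.8, RemoteNodeID:3, RemoteApplicationID:100, "
    "LinkID:2:100:3:100, AppID:Cisco CallManager, "
    "ClusterID:StandAloneCluster, NodeID:CORP-SUB1,"
)

params = parse_alarm_parameters(SAMPLE_LINE)
print(params["RemoteIPAddress"], params["LinkID"])
```

The LinkID value reads as local node, local application, remote node, remote application, which matches the other four fields in the same alarm.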
This link will show you some of the output and recommendations:
Error Message: %CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: SDL link to the remote application is out of service. Remote IP address of remote application [String], Unique Link ID. [String], Local node ID [UInt], Local Application ID. [Enum], RemoteNodeID [UInt], Remote application ID.[Enum]
Explanation-This alarm indicates that the local Cisco CallManager has lost communication with the remote Cisco CallManager. This alarm usually indicates network errors or a nonrunning remote Cisco CallManager.
Recommended Action-Investigate why the remote Cisco CallManager does not run or whether a network problem exists.
The other section of this error worries me:
ClusterID:StandAloneCluster
Is this subscriber in its own cluster?
Maybe something changed in your cluster groups during the upgrade from 4.x to 6.x. Perhaps CORP-SUB1 is orphaned outside the cluster for some reason. Might be worth looking at.
I believe it should say the name of the CM Group, since that is how you cluster them.
It might be a good idea to check whether you see that error in any of the other servers' logs.
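If you collect the trace files from each server, a rough way to compare them is to count SDLLinkOOS alarms per reporting node and remote IP. This is only a sketch: `count_link_alarms` is a made-up helper, and it assumes every alarm line carries `NodeID:` and `RemoteIPAddress:` fields in the same layout as the line quoted earlier.

```python
import re
from collections import Counter

REMOTE_IP = re.compile(r"RemoteIPAddress:([0-9.]+)")
# The lookbehind keeps this from matching the tail of "RemoteNodeID:".
NODE_ID = re.compile(r"(?<![A-Za-z])NodeID:([^,\s]+)")


def count_link_alarms(lines):
    """Count SDLLinkOOS alarms per (reporting node, remote IP) pair."""
    counts = Counter()
    for line in lines:
        if "AlarmName: SDLLinkOOS" not in line:
            continue
        node = NODE_ID.search(line)
        ip = REMOTE_IP.search(line)
        key = (
            node.group(1) if node else "unknown",
            ip.group(1) if ip else "unknown",
        )
        counts[key] += 1
    return counts


# Shortened copy of the alarm quoted earlier in the thread.
example_lines = [
    "AlarmName: SDLLinkOOS, AlarmParameters: RemoteIPAddress:172.16.2.8, "
    "RemoteNodeID:3, ClusterID:StandAloneCluster, NodeID:CORP-SUB1,",
]
for (node, remote_ip), total in count_link_alarms(example_lines).items():
    print(node, remote_ip, total)
```

If only one node reports the alarm, or every node reports it against the same remote IP, that narrows down which link to look at.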
What device is 172.16.2.8?
Okay, that makes sense. So SUB2 is not on the same LAN? Is it located across a WAN circuit, then?
Sounds to me like it could be a networking issue then.
From SUB1, perform an extended ping with a large payload (1500 bytes) and see if you get drops.
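If you save the ping output, the loss can be worked out from the summary line rather than eyeballed. The sketch below assumes a Linux-style summary ("N packets transmitted, M received"); the output on your platform may be worded differently, and `packet_loss_percent` is a made-up helper name.

```python
import re

# Matches "100 packets transmitted, 97 received" and the
# "5 packets transmitted, 5 packets received" variant.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss_percent(ping_output: str) -> float:
    """Compute the loss percentage from a ping summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("ping sent no packets")
    return 100.0 * (sent - received) / sent


# Hypothetical summary line, not taken from the poster's network.
print(packet_loss_percent("100 packets transmitted, 97 received, 3% packet loss"))
```

Running the test several times during business hours, and comparing against a small-payload ping, helps separate general loss from loss that only affects large packets.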
It sounds like it is definitely a networking issue, because SUB1 keeps losing its SDL link to the CallManager service on SUB2.
We had the same issue.
The temp fail occurred at the same time as the SDL Link OOS alarm. We also have a multisite cluster.
The SDL link is used for intra-cluster signalling between the nodes of a cluster.
We had issues on the link between the two data centers (high CPU and no QoS). After we resolved these, the SDL link OOS alarms disappeared, and so did the temp fail. I would focus my attention on that.
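To check whether the temp fail reports line up with the SDL link alarms in your own traces, something like the following Python sketch could help. It assumes the timestamp format shown in the alarm earlier in this thread (2009/06/01 10:36:28.481); the 30-second window and the user report times are made-up values for illustration.

```python
from datetime import datetime, timedelta

TRACE_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def parse_trace_time(text: str) -> datetime:
    """Parse a timestamp such as '2009/06/01 10:36:28.481'."""
    return datetime.strptime(text, TRACE_FORMAT)


def alarms_near(report_time, alarm_times, window_seconds=30):
    """Return the alarm times within window_seconds of a user report."""
    window = timedelta(seconds=window_seconds)
    return [t for t in alarm_times if abs(t - report_time) <= window]


# The alarm time comes from the trace quoted earlier in the thread.
alarm_times = [parse_trace_time("2009/06/01 10:36:28.481")]

# Hypothetical times at which users reported a temp fail.
reports = [
    parse_trace_time("2009/06/01 10:36:40.000"),
    parse_trace_time("2009/06/01 11:00:00.000"),
]
for report in reports:
    print(report, "matches", len(alarms_near(report, alarm_times)), "alarm(s)")
```

If most reports fall inside the window, that supports treating the link between the sites as the first suspect.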