Interested to hear any views on the following.
We are currently testing the CVP product with Cisco with a view to deploying it in our contact centre. Cisco have designed geographically separate CVP servers, CallManagers and IPCC hardware which, according to their design documentation, will provide call survivability in the event of a failure.
However when we test the scenario of bringing down one CVP server, what we find is that 4 minutes later all active calls in the system that were initially serviced by this CVP server are ceased.
Cisco are now telling us that this is how the system should actually work. To paraphrase, they are saying that:
If a CVP server fails then the existing connection from the caller to the agent will be dropped after a 4 minute timeout. They say it is possible to invoke survivability, but all this means is that the caller leg of the call is kept active, while the agent's side is dropped and the caller is routed back to the ICM script, i.e. back into the IVR and potentially holding in a queue for an available agent, who is unlikely to be the agent they were originally dealing with.
They have offered us this solution or the choice of making a business decision to wrap up all calls within 4 minutes of a CVP failure. They say that version 8 of CVP will rectify this.
Wrapping up all calls within 4 minutes of a CVP failure cannot be a viable option for many customers. Imagine it in an emergency services centre:
"The ambulance will be with you in five minutes Sir, please hang on the line and I'll talk you through what to do until then ...... oops, our CVP server has just failed so you're on your own for the last 60 seconds before the ambulance arrives."
Even the survivability option of routing the poor guy back into the start of the emergency services script ("emergency, which service?") is a non-starter.
The whole scenario just falls down at every level.
I can't find anything on the Cisco web site with regard to this 4 minute timeout. Indeed, all the documentation states nothing other than that calls in progress to agents will be maintained in the event of a CVP failure.
Any comments or shared experience appreciated
Thanks in advance
I've never tried to see what happens to calls with an agent when the Call Server is killed. I'll give it a shot.
In a fault-tolerant CVP design there are a number of Call Servers which have enough capacity to handle the failure of 1 (N+1 redundancy) or more (N+N redundancy) so new calls after a Call Server fails are OK. You obviously know this and are not concerned by new calls. You are concerned about the existing calls.
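For what it's worth, the N+1 / N+N sizing check for new calls boils down to simple arithmetic. Here is a rough Python sketch of it. The function name, the per-server capacity figure and the peak call count are all made up for illustration; they are not Cisco sizing numbers, so use the real figures from the SRND for your hardware.

```python
def survives_failures(num_servers, per_server_capacity, peak_calls, failures=1):
    """Return True if the servers left after `failures` can carry `peak_calls`.

    All inputs are hypothetical planning numbers, not vendor figures.
    """
    remaining = num_servers - failures
    if remaining <= 0:
        return False
    return remaining * per_server_capacity >= peak_calls


# N+1: three servers sized so that any two can carry the peak load.
print(survives_failures(3, 500, 1000, failures=1))
# Only two servers for the same load: losing one leaves too little capacity.
print(survives_failures(2, 500, 1000, failures=1))
# N+N: four servers, any two can fail.
print(survives_failures(4, 500, 1000, failures=2))
```

This only says whether new calls can be absorbed after a failure. It says nothing about calls already in progress, which is the real question in this thread.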
The Call Server comprises three parts - the SIP service, the IVR service, and the ICM service. From the SRND (page 4-2):
"SIP Service - Responsible for processing incoming and outgoing calls via SIP.
ICM Service - Responsible for the interface to ICM. The ICM Service communicates with the VRU PG using GED-125 to provide ICM with IVR control. The ICM Service was part of the Application Server in previous releases of Unified CVP, but now it is a separate component.
IVR Service - Responsible for the conversion of Unified CVP Microapplications to VoiceXML pages, and vice versa. The IVR Service was known as the Application Server in previous Unified CVP versions."
The SRND discusses what happens to calls in progress when each of these parts (in turn) stops. In all cases it says things like:
"If the Unified CVP SIP Service fails after the caller has been transferred (transfers include transfer to an IP phone, VoiceXML gateway, or other egress gateway), then the call continues normally until a subsequent transfer activity (if applicable) is required from the Unified CVP SIP Service."
so the 4 minute thing is news to me.
Any input appreciated. We're using H.323 instead of SIP, but that was the first thing I asked about, and I was told that SIP will give the same results.
No doubt Geoff will figure this out, but according to the SRND a call server crash is survivable for current calls, under certain configurations of course!
From the 7.0 SRND
"Configuring High Availability for Calls in Progress
In the event that a Unified CVP Call Server fails with calls in progress, it is possible to salvage all calls if certain gateway configuration steps have been taken. A Call Server can fail in one of several ways:
• The server can crash.
• The process can crash.
• The process can hang.
• There can be a network outage.
The configuration discussed in this section protects against all of these situations. However, the following two situations cannot be protected against:
• Someone stops the process with calls in progress. This situation occurs when a system administrator forgets to put the Call Server out-of-service first to allow calls in progress to finish before stopping the process.
• The Call Server exceeds the recommended call rate. Although there is a throttle for the absolute number of calls allowed in the Call Server, there is no throttle for call rate. In general, exceeding 5 calls per second (cps) for an extended period of time can cause erratic and unpredictable call behavior on certain components of the CVP solution if one of the components is not sized correctly or if the call load is not balanced according to the weight and sizing of each call processing component.
For call survivability, configure the originating gateways as described in the latest version of the Configuration and Administration Guide for Cisco Unified Customer Voice Portal (CVP), available at
The survivability.tcl script itself also contains some directions and useful information.
In the event of most downstream failures (including a Unified CVP Call Server failure), the call is default-routed by the originating gateway. Note that survivability is not applicable in the Unified CVP Standalone and NIC-routing models because there is no Unified CVP H.323 or SIP Service involved anywhere in those models.
There is also a mechanism for detection of calls that have been cleared without Unified CVP's knowledge:
• Unified CVP checks every 2 minutes for inbound calls that have a duration older than a configured time (the default is 10 minutes).
• For those calls, Unified CVP sends an UPDATE message. If the message receives a rejection or is undeliverable, then the call is cleared and the license released.
The CVP SIP Service can also add the Session Expires header on calls so that endpoints such as the originating gateway may perform session refreshing on their own. RFC 4028 (Session Timers in the Session Initiation Protocol) has more details on the usage of Session Expires with SIP calls."
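To make the stale-call detection described in that SRND extract a bit more concrete, here is a small Python sketch of the logic as I read it. It is only a model of the quoted description, not CVP's actual implementation: the `Call` class, the `sweep` function and the `probe` callback (which stands in for sending the SIP UPDATE) are hypothetical names, and the two constants are just the figures from the quote.

```python
from dataclasses import dataclass

SWEEP_INTERVAL_S = 120  # quoted: the check runs every 2 minutes
MAX_AGE_S = 600         # quoted: default configured age is 10 minutes


@dataclass
class Call:
    call_id: str
    started_at: float  # seconds, on the same clock as `now`


def sweep(calls, now, probe):
    """Return the calls that are kept after one sweep.

    probe(call) is a stand-in for sending UPDATE: it returns True if the
    far end accepted it, False if it was rejected or undeliverable.
    """
    kept = []
    for call in calls:
        too_old = (now - call.started_at) > MAX_AGE_S
        if too_old and not probe(call):
            continue  # call is cleared and its license released
        kept.append(call)
    return kept


calls = [Call("a", 0.0), Call("b", 0.0), Call("c", 500.0)]
# Pretend only call "a" still answers the UPDATE.
survivors = sweep(calls, now=700.0, probe=lambda c: c.call_id == "a")
print([c.call_id for c in survivors])
```

Note that this mechanism runs on the Call Server itself, so it is about cleaning up calls that ended without CVP noticing, not about what the endpoints do when the Call Server disappears.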
>However when we test the scenario of bringing down one CVP server, what we find is that 4 minutes later all active calls in the system that were initially serviced by this CVP server are ceased.
I'm currently testing CVP 7.0(2) with 2 x CVP Servers (Call Server / VXML Server / Media Server combo boxes), with ICM 7.5.6 (geographically separated), CUCM 7.1.2a (5 node cluster), CUPS (SIP Proxy) 7.0.5 (2 node cluster), and multiple 3845s with IOS 12.4(15)T10.
I've just tested this scenario by setting up a test call from the PSTN into my setup, then checking CVP Ops Console to determine which Call Server is being used.
I've then unplugged the network cable on the call server that my test call passed through.
I'm currently sitting on 20 minutes post cable unplug, and the voice part of the call is still up.
Having said that, I did lose CTI call control from CAD - I received a "Request Operation Failed" popup window, and while CAD shows Talking, there are no active calls in the call window.
The client-side debug shows a "The call has failed due to a network interruption" message.
The hard phone is also showing a "Temp Fail" error on the screen - much like SRST mode - although the voice call remained active.
At the 20 minute mark I dropped the call at both ends (the IP Phone did not recognise the customer side of the disconnect), and both CAD and the hardphone returned to normal behaviour, and the next call arrived without incident.
I put all of this down to the failure of one of the parties in the SIP Call.
Note: I would expect to see the call fail at around the 30 minute mark, when the default SIP MinSE timers kick in and clear it out.
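As a back-of-envelope check on that timing, here is a short Python sketch of the RFC 4028 session timer arithmetic. It assumes a 1800 second (30 minute) session interval, as in my note above, and uses the RFC's recommendations: the refresher sends its refresh at half the interval, and the other side sends BYE slightly before expiry if no refresh arrives, the margin being the smaller of 32 seconds and one third of the interval. The function name is my own, and the interval actually negotiated on your gateways may well differ.

```python
def session_timer_deadlines(session_expires_s=1800):
    """Return (refresh_at, bye_at) in seconds after the last refresh.

    refresh_at: when the refresher should send a re-INVITE/UPDATE.
    bye_at:     when the non-refreshing side gives up and sends BYE
                if no refresh was received.
    """
    refresh_at = session_expires_s / 2
    margin = min(32, session_expires_s / 3)
    bye_at = session_expires_s - margin
    return refresh_at, bye_at


refresh_at, bye_at = session_timer_deadlines(1800)
print(f"refresh due at {refresh_at:.0f}s, BYE at {bye_at:.0f}s if no refresh")
```

Passing a different interval shows how the clear-down point moves, so it is worth checking what Session-Expires value is actually being carried on the calls in question.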
What are the details of your system? How are you failing the Call Server (process kill, power cable unplug, network cable unplug)?
Thanks for the input. Interesting that your calls remain open.
we're using CVP 7.0(2), ICM 7.5(6), CUCM 7.1.2.
We're simulating the failure by disconnecting the network cable.
Can't understand why I'm being told by Cisco that calls being dropped are normal behavior, when everything in their documentation states otherwise, and indeed your tests confirm the opposite is true.
>we're using CVP 7.0(2), ICM 7.5(6), CUCM 7.1.2.
What IOS are you using on the Ingress GWs? Is the result different when the call is from IP Phone to IP Phone?
>Can't understand why I'm being told by Cisco that calls being dropped are normal behavior
Agreed - that's a really odd thing to be told. Mind you, I have seen some very odd behaviour when a SIP Carrier is in use, as opposed to an ISDN Carrier.