IPCC failover

Apr 13th, 2007

Hi, I'm seeing a strange problem. I have two IPCC servers, one primary and one secondary. Server A is the primary, but it keeps failing over, with server B becoming the primary for no apparent reason. We have to reboot server B to force server A to become the primary again.

I'm no expert on IPCC, so if anybody can give me any ideas, that would be great.

Aaron Harrison Fri, 04/13/2007 - 01:17


You'll have to go through the logs in c:\program files\wfavvid\logs to see the cause of the failover.

Search the logs for instances of the word 'exception' around the time of the failover. Some will be cryptic, but some will clue you in to the cause.
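The search described above could be scripted roughly as follows. This is only a sketch: it assumes the files in the log directory are plain text, it makes no assumption about the timestamp format (so it does not filter by time), and the helper names find_exceptions and scan_log_dir are made up for illustration.

```python
import re
from pathlib import Path

# Case-insensitive match, since the logs may spell it "Exception" or "EXCEPTION".
EXCEPTION_RE = re.compile(r"exception", re.IGNORECASE)


def find_exceptions(lines):
    """Return (line_number, text) pairs for lines that mention 'exception'."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if EXCEPTION_RE.search(line)
    ]


def scan_log_dir(log_dir):
    """Yield (file_name, line_number, text) for every match in log_dir.

    Assumption: every regular file in the directory is a readable text log.
    Undecodable bytes are replaced rather than raising an error.
    """
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for number, text in find_exceptions(handle):
                yield path.name, number, text
```

On the server, something like `for hit in scan_log_dir(r"c:\program files\wfavvid\logs"): print(hit)` would list the matches; you would still need to compare the timestamps on those lines with the time of the failover by eye.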



Please rate helpful posts...

Gergely Szabo Fri, 04/13/2007 - 08:44

We had a similar problem with one of our installations. Actually, it was due to failing DNS lookups (between the two IPCCX nodes).

Double-check that NetBIOS and DNS lookups are OK (can you ping server1 from server2 by 1. hostname and 2. FQDN?).
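If you want to repeat that lookup check regularly rather than by hand, a small script along these lines might help. It is a sketch, not a replacement for ping: it only asks the operating system's resolver for an address (which may be answered by the hosts file, DNS or NetBIOS, depending on the machine) and does not test reachability. The function name check_resolution and the example host names are placeholders.

```python
import socket


def check_resolution(names, resolve=socket.gethostbyname):
    """Map each name to its resolved IPv4 address, or None if the lookup fails.

    'resolve' is injectable so the function can be exercised without a network.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Run it on each node with the other node's short name and FQDN, for example `check_resolution(["server1", "server1.example.local"])`, and look for None values or for addresses that differ between the two forms.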

Travis Cassell Mon, 04/16/2007 - 12:07

I am encountering the same issue with a customer. There is no apparent reason for the failover, but if you look at the logs, the Master and Standby talk and figure out who is going to be the master and for which services.

In my case not all of the services fail over, just the majority. It is never the same set, but most of the time it's all but 2-3 of the SQL services.

When TAC was contacted, we searched through the logs and found that it was a "network error." So we monitored the switchports, and when the next failover occurred we found that the network connections were fine. We believe that the "network error" is a general error and is due to the server being too taxed to read the information off the wire.

Right now we are running 2 MCS-7825-H1s with 2GB of RAM, so we are going to max them out to 4GB. I will let you know how this works out for us. It might be a few weeks before we find out if it's going to help or not.

Hope this helps.


venkat.kt Mon, 04/16/2007 - 20:27


Make sure that the speed and duplex match on the server side and the switch side. They should be either Auto on both ends or hard-coded to 100Full on both ends.

All the best.



Aaron Harrison Mon, 04/16/2007 - 22:16

In my experience TAC are quick to diagnose a 'network error'... IPCCX is susceptible to transient network failures that might go unnoticed with other applications, but you would be advised to look at the logs yourself and make a judgement.


Travis Cassell Tue, 04/17/2007 - 06:58

Along with the speed and duplex, make sure the bindings are in the correct order. Also make sure all server entries are in the hosts file of each server.
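The hosts file part of that check can be sketched as below. The parsing follows the usual hosts file layout (an address followed by one or more names, with # starting a comment); the function names and the example host names are invented for illustration, and which names count as "all server entries" is for you to decide.

```python
def hosts_entries(text):
    """Parse hosts-file text into a {lowercased_name: address} mapping."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first address seen for a name, as a resolver would.
            entries.setdefault(name.lower(), address)
    return entries


def missing_hosts(text, required):
    """Return the required names that have no entry in the hosts-file text."""
    entries = hosts_entries(text)
    return [name for name in required if name.lower() not in entries]
```

On Windows the file is normally %SystemRoot%\System32\drivers\etc\hosts. Read it on each server, pass its contents in as text along with the short names and FQDNs of both nodes, and an empty list back means every name has an entry.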

andy.butler Wed, 04/18/2007 - 00:31

Thanks for the info, guys. I'm not an IPCC guy, but here is an event I think might be related. Any more ideas would be great.

Event Type: Warning
Event Source: Cisco AVVID Alarm Service
Event Category: None
Event ID: 82
Date: 26/03/2007
Time: 13:14:18
User: N/A
Computer: GBLPLIPCC01


The description for Event ID ( 82 ) in Source ( Cisco AVVID Alarm Service ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: 82: Mar 26 02:14:17.882 BST: %MCVD-GENERIC-1-ModuleStop: Module has stopped; Module Name=CRS Engine; Module Failure Cause=Other or Unspecified Failure.
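If there are many of these events to go through, the useful part of the description (the alarm tag and the key=value pairs after it) can be pulled out with a short parser. This is a sketch based only on the single message quoted above; other alarms may be laid out differently, and the name parse_alarm is made up.

```python
import re

# Matches the '%TAG: body' portion of the alarm text, as seen in the event above.
ALARM_RE = re.compile(r"%(?P<tag>[\w-]+): (?P<body>.*)")


def parse_alarm(message):
    """Return {'tag': ..., 'fields': {...}} or None if no alarm tag is found."""
    match = ALARM_RE.search(message)
    if match is None:
        return None
    fields = {}
    for part in match.group("body").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip().rstrip(".")
    return {"tag": match.group("tag"), "fields": fields}
```

For the event above this gives the tag MCVD-GENERIC-1-ModuleStop with Module Name set to CRS Engine and Module Failure Cause set to Other or Unspecified Failure, which makes it easier to line up several events and see whether it is always the same module stopping.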

johnnylingo Wed, 05/09/2007 - 20:48

When you say "network error", are you referring to the following message in Event Viewer?

"the server has encountered a network error"

Travis Cassell Thu, 05/10/2007 - 05:08

I can't say for sure if the pagefile fixed our problem but we have not had an issue since I upped it to 4GB. Again Cisco told me to make the pagefile 200% of your physical memory. As of now we have not had a failover since we changed the size of the pagefile.

Give this a shot and see how it works for you; it can't hurt to up it anyway. Hope this helps.


johnnylingo Mon, 06/04/2007 - 10:27

Just set this, will monitor the logs and post if I see anything change.

Travis Cassell Tue, 06/19/2007 - 11:37

Well, it looks like my problem is back. After initially setting the page file to 200% of the physical memory, we didn't have any issues for 2 1/2 months. All of a sudden the primary failed over twice in the same week. I opened a TAC case, and we can see where it loses heartbeats, but we do not know why.

I'll keep you posted.

johnnylingo Tue, 06/19/2007 - 12:50

Thanks for the note. Just FYI, upping my paging size didn't seem to do anything.

ajohn1976 Mon, 07/07/2008 - 09:36


Any word from Cisco TAC on this? I am getting the same issue with one of my customers. Please post your comments if TAC was able to provide you with a solution.

