
IPCC failover

andy.butler
Level 1

Hi, I'm seeing a strange problem. I have two IPCC servers, one primary and one secondary. Server A is the primary, but it keeps failing over, with Server B becoming the primary for no apparent reason. We have to reboot Server B to force Server A to become the primary again.

I'm no expert on IPCC, so if anybody can give me any ideas, that would be great.

13 Replies

Aaron Harrison
VIP Alumni

Hi

You'll have to go through the logs in c:\program files\wfavvid\logs to see the cause of the failover.

Search the logs for instances of the word 'exception' around the time of the failover. Some will be cryptic, but some will clue you in to the cause.
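As a rough illustration of that log search, here is a minimal Python sketch. It assumes the log lines carry a timestamp shaped like the one in the AVVID alarm quoted later in this thread ("Mar 26 02:14:17.882 BST"), with no year and English month names, and that the files of interest end in `.log`. The function names `find_exceptions` and `scan_log_dir` are hypothetical helpers, not part of any Cisco tooling. Continuation lines of a stack trace that carry no timestamp are skipped, so treat the output as a starting point only.

```python
from datetime import datetime, timedelta
from pathlib import Path
import re

# Assumed timestamp layout, e.g. "Mar 26 02:14:17" (no year, English month names).
TS_PATTERN = re.compile(r"[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}")


def find_exceptions(lines, failover_time, window_minutes=10):
    """Return lines that mention 'exception' within the window around failover_time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        if "exception" not in line.lower():
            continue
        match = TS_PATTERN.search(line)
        if match is None:
            continue  # no recognisable timestamp on this line
        try:
            # The log lines carry no year, so borrow it from the failover time.
            stamp = datetime.strptime(
                f"{failover_time.year} {match.group(0)}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # looked like a timestamp but was not a valid date
        if abs(stamp - failover_time) <= window:
            hits.append(line.rstrip("\n"))
    return hits


def scan_log_dir(log_dir, failover_time, window_minutes=10):
    """Scan every *.log file in log_dir and map file name to matching lines."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        with open(path, errors="replace") as handle:
            hits = find_exceptions(handle, failover_time, window_minutes)
        if hits:
            results[path.name] = hits
    return results
```

A call might look like `scan_log_dir(r"c:\program files\wfavvid\logs", datetime(2007, 3, 26, 2, 14, 17))`, with the failover time taken from your own event log.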

Regards

Aaron

Please rate helpful posts...

Aaron Please remember to rate helpful posts to identify useful responses, and mark 'Answered' if appropriate!

Gergely Szabo
VIP Alumni

We had a similar problem with one of our installations. Actually, it was due to failing DNS lookups (between the two IPCCX nodes).

Double-check whether NetBIOS and DNS lookups are OK (can you ping server1 from server2 by 1. hostname, 2. FQDN?).
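The same name-resolution check can be scripted so it can be run repeatedly from each node. This is only a sketch: `check_lookups` is a hypothetical helper, and the names passed to it are placeholders for your own short hostname and FQDN. It uses the operating system's resolver through `socket.gethostbyname`, so it reflects whatever lookup order the server is configured with rather than testing DNS in isolation.

```python
import socket


def check_lookups(names, resolve=socket.gethostbyname):
    """Try to resolve each name and report the address or the failure."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            results[name] = f"FAILED: {err}"
    return results


if __name__ == "__main__":
    # Placeholder names: substitute the other node's short name and FQDN.
    for name, outcome in check_lookups(["server1", "server1.example.local"]).items():
        print(f"{name}: {outcome}")
```

Running it on a schedule on both nodes and keeping the output would show whether lookups fail intermittently around the time of a failover.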

I am encountering the same issue with a customer. There is no apparent reason for the failover, but if you look at the logs, the Master and Standby talk and figure out who is going to be the master, and for which services.

In my case not all of the services fail over, just the majority. It is never the same set, but most of the time it's all but 2-3 of the SQL services.

When TAC was contacted, we searched through the logs and found that it was a "network error." So we monitored the switchports, and when the next failover occurred we found that the network connections were fine. We believe that the "network error" is a general error and that it is due to the server being too taxed to read the information off the wire.

Right now we are running 2 MCS-7825-H1s with 2GB of RAM, so we are going to max them out to 4GB. I will let you know how this works out for us. It might be a few weeks before we find out if it's going to help or not.

Hope this helps.

Travis

Hi,

Make sure that the speed and duplex settings match on the server side and the switch side. They should be either Auto on both ends or hard-coded to 100/Full on both ends.

All the best.

Regards,

Venkat

In my experience TAC are quick to diagnose a 'network error'... IPCCX is susceptible to transient network failures that might go unnoticed with other applications, but you would be advised to look at the logs yourself and make a judgement.

Aaron


Along with the speed and duplex, make sure the bindings are in the correct order. Also make sure all server entries are in the hosts file of each server.
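For the hosts file part of that advice, a small script can report which names are missing. This is a sketch under assumptions: `missing_host_entries` is a hypothetical helper, the server names are placeholders, and the file location shown in the usage note is the usual Windows one, which may differ on your installation.

```python
def missing_host_entries(hosts_text, required_names):
    """Return the required names that have no entry in the hosts file text."""
    present = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        # First field is the IP address; the rest are names and aliases.
        for name in fields[1:]:
            present.add(name.lower())
    return [name for name in required_names if name.lower() not in present]
```

On each server you could read the file (typically `%SystemRoot%\system32\drivers\etc\hosts`) and call `missing_host_entries(text, ["server1", "server2"])`; an empty list means every name has an entry.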

Thanks for the info, guys. I'm not an IPCC guy, and here is an event I think might be related. Any more ideas would be great.

Event Type: Warning
Event Source: Cisco AVVID Alarm Service
Event Category: None
Event ID: 82
Date: 26/03/2007
Time: 13:14:18
User: N/A
Computer: GBLPLIPCC01
Description:

The description for Event ID ( 82 ) in Source ( Cisco AVVID Alarm Service ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: 82: Mar 26 02:14:17.882 BST: %MCVD-GENERIC-1-ModuleStop: Module has stopped; Module Name=CRS Engine; Module Failure Cause=Other or Unspecified Failure.
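When many of these alarms need comparing, the trailing part of the description can be split into fields. The sketch below is based only on the single message quoted above; the group names (`facility`, `subfacility`, `severity`, `mnemonic`) are my own labels for the pieces of `%MCVD-GENERIC-1-ModuleStop`, not official Cisco terminology, and `parse_alarm` is a hypothetical helper.

```python
import re

# Pattern derived from the one alarm quoted in this thread; other alarms may differ.
ALARM = re.compile(
    r"%(?P<facility>[A-Z_]+)-(?P<subfacility>[A-Z_]+)-(?P<severity>\d)-"
    r"(?P<mnemonic>\w+): (?P<text>.*)"
)


def parse_alarm(message):
    """Split an alarm line into its header parts and its 'Key=Value' fields."""
    match = ALARM.search(message)
    if match is None:
        return None
    info = match.groupdict()
    fields = {}
    for part in info["text"].split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip().rstrip(".")
    info["fields"] = fields
    return info
```

For the event above, this yields the module name "CRS Engine" and the failure cause "Other or Unspecified Failure", which confirms that the engine stopped but not why. The surrounding log lines are still needed for that.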

When you say "network error", are you referring to the following message in Event Viewer?

"the server has encountered a network error"

I can't say for sure whether the pagefile change fixed our problem, but we have not had a failover since I raised it to 4GB. Again, Cisco told me to set the pagefile to 200% of physical memory.

Give this a shot and see how it works for you; it can't hurt to increase it anyway. Hope this helps.

Travis

Just set this. I will monitor the logs and post if I see anything change.

Well, it looks like my problem is back. After initially setting the page file to 200% of the physical memory, we didn't have any issues for 2 1/2 months. All of a sudden, the primary failed over twice in the same week. I opened a TAC case, and we can see where it loses heartbeats, but we do not know why.

I'll keep you posted.

Thanks for the note. Just FYI, upping my paging size didn't seem to do anything.

Travis,

Any word from Cisco TAC on this? I am getting the same issue with one of my customers. Please post your comments if TAC was able to provide you with a solution.
