Cisco Support Community
Community Member

virtual-router problem on backup CSS

All virtual routers on our backup CSS are flipping back-and-forth from backup to master within 1-to-5 seconds at different times of the day. Different services are also flipping back-and-forth from alive to down during the same time frames. The master CSS shows no symptoms and no users are complaining.

From our backup CSS log.sys file:

MAR 19 12:31:09 5/1 61114 VRRP-4: Virtual router 3: master on interface 172.20.2.93

MAR 19 12:31:10 5/1 61120 VRRP-4: Virtual router 3: backup on interface 172.20.2.93

MAR 19 12:31:10 5/1 61121 VRRP-4: Master is 172.20.2.92

MAR 19 12:31:10 5/1 61123 NETMAN-2: Enterprise:Service Transition:cwh-ott-nt-053-tourismpartners -> down

Our sniffer on the frontend (spanning the backup CSS port) confirms that when the issues occur, the master's VRRP multicast packets are reaching the backup CSS as expected. We also see the backup CSS begin to send conflicting VRRP multicast packets, confirming that it thinks it's the master router. The only other unusual event I can see during that time frame is that the backup CSS fails to reply to several pings from an upstream Foundry ServerIron load balancer (which pings every 400ms). This leads me to believe that our backup CSS has internal problems (i.e., it can't keep up with the VRRP and ping packets it receives).
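For anyone wanting to quantify the flapping from log.sys, here is a minimal Python sketch that pairs each "master" transition with the next "backup" transition for the same virtual router and reports how long the backup CSS held mastership. It assumes the line layout shown in the excerpt above (timestamp, slot/port, sequence number, subsystem-level, message). The names `find_flaps`, `parse_ts`, and the `max_seconds` threshold are made up for illustration, and because the log carries no year, a placeholder year is assumed.

```python
import re
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Assumed layout: "MON DD HH:MM:SS slot/port seq SUBSYS-level: message"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z]{3} +\d{1,2} \d{2}:\d{2}:\d{2}) +\S+ +\d+ +"
    r"(?P<subsys>[A-Z0-9]+)-\d+: (?P<msg>.*)$")
VRRP_RE = re.compile(
    r"Virtual router (?P<vr>\d+): (?P<state>master|backup) on interface \S+")


def parse_ts(text, year=2000):
    # The log has no year; `year` is a placeholder assumption.
    mon, day, hms = text.split()
    hour, minute, second = (int(x) for x in hms.split(":"))
    return datetime(year, MONTHS[mon], int(day), hour, minute, second)


def find_flaps(lines, max_seconds=5):
    """Return (virtual_router, seconds_as_master) for short-lived masterships."""
    became_master = {}  # virtual router id -> time it last went master
    flaps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("subsys") != "VRRP":
            continue
        v = VRRP_RE.search(m.group("msg"))
        if not v:
            continue  # e.g. "Master is 172.20.2.92"
        ts = parse_ts(m.group("ts"))
        vr = int(v.group("vr"))
        if v.group("state") == "master":
            became_master[vr] = ts
        elif vr in became_master:
            held = (ts - became_master.pop(vr)).total_seconds()
            if 0 <= held <= max_seconds:
                flaps.append((vr, held))
    return flaps
```

Running it over the full file (for example `find_flaps(open("log.sys"))`) should give a list that can be lined up against the sniffer trace and the missed pings.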

Any idea or suggestion on how to identify the root cause?

5 REPLIES
Cisco Employee

Re: virtual-router problem on backup CSS

Anything else in the log?

Like IMM Queue full error messages?

Software version?

How many ICMP keepalives?

What platform?

What is the CPU level? (show system-resources)

Thanks,

Gilles.

Community Member

Re: virtual-router problem on backup CSS

-There are two other suspicious log entries: services going up and down, which I'll investigate next because they are all http:get keepalives (on several different circuits/VLANs), and also these "duplicate IP" errors:

MAR 21 17:06:53 5/1 152326 IPV4-4: Duplicate IP address detected: 204.104.133.106 00-10-58-03-49-05

MAR 21 17:06:53 5/1 152327 IPV4-4: Incoming CE 0x3c01f00, incoming (0 based) SLP 0xf

Maybe the CSS has a problem with http:get keepalives? I've increased the logging level for vrrp to "debug" but have not

-I don't know about IMM Queue full error messages because we had disabled this trap a few months ago due to too many of these errors filling up our log files. I've re-enabled that trap for now to see if they coincide with the vrrp problems. I haven't seen any new ones yet.

-We're running version 5.03 Build 15.

-we have approx. 80 services defined with icmp keepalives and approx 10 with http:get keepalives.

-Our platform is CSS11153.

-CPU level on the backup CSS is between 10%-20%; on the master CSS, CPU is between 5%-15%. We're trying to set up historical reporting on the CPU utilization over time to see if it changes when the problems occur.

PS: the primary CSS still appears to be functioning properly.
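To see whether the "duplicate IP" errors quoted above always involve the same address and MAC, a small Python sketch like the following could tally them from log.sys. It assumes the message text matches the excerpt exactly (IP followed by a dash-separated MAC); `tally_duplicates` is a hypothetical helper name, not anything provided by the CSS.

```python
import re
from collections import Counter

# Assumed message format, taken from the log excerpt in this post.
DUP_RE = re.compile(
    r"IPV4-\d+: Duplicate IP address detected: "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<mac>(?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})")


def tally_duplicates(lines):
    """Count duplicate-IP reports per (ip, mac) pair."""
    counts = Counter()
    for line in lines:
        m = DUP_RE.search(line)
        if m:
            counts[(m.group("ip"), m.group("mac").lower())] += 1
    return counts
```

If one (IP, MAC) pair dominates the counts, that host or interface would be the first place to look.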

Community Member

Re: virtual-router problem on backup CSS

I have run into problems with TCP-based keepalives failing with similar symptoms. In my case the return packets from the web server would either be lost or be forwarded out another interface (even onto different VLANs).

You could take a trace and see if traffic sent to the CSS is being ignored.

The problem was related to a software bug associated with the Gig port that cropped up when the attached ethernet switch was rebooted.

Shutting down and bringing the CSS gig port back up would clear the problem.

It has been fixed in release 5.0B69.

I

Cisco Employee

Re: virtual-router problem on backup CSS

5.03(15) is definitely not a good idea.

I think you could be hitting this bug:

CSCdx55312 link failures causing problems in vip redundancy (active/backup)

I would suggest trying 5.03(33) to see if it makes any difference.

Gilles.

Community Member

Re: virtual-router problem on backup CSS

Thanks David & Gilles.

I'll see if I can debug any phy errors on our locked 100/FULL interface configuration.

Gilles: 5.03(33) is not yet listed in the software center... Any ETA available?
