CSS Bounces on a Server/Box Down Issue

d-lising1
Level 1

Hi,

We deployed a redundant pair of CSS11501s with the servers directly connected. Neither CSS is designated as the master.

When a server goes down (the box itself, which has two NICs), CSS-A detects a link failure and fails over to CSS-B. A minute after CSS-B becomes master, it also detects a link failure (on the server's secondary NIC) and switches back to CSS-A. This loops until the server is fixed. As a temporary workaround, we shut down the interfaces connecting to the failed server on both CSSs so that they no longer monitor the failed link.
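
The loop described above can be illustrated with a toy model. This is only a sketch of the behavior as reported in this post, not of the actual CSS redundancy logic: the `Css` class, the `simulate_mastership` function, and the rule "the master gives up mastership whenever its monitored server-facing link is down" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Css:
    name: str
    server_link_up: bool  # state of the monitored port facing the server


def simulate_mastership(css_a, css_b, rounds):
    """Return which unit is master in each evaluation round.

    Assumed rule: the current master hands over mastership whenever
    its own monitored link is down, without checking the peer's link.
    """
    master, backup = css_a, css_b
    history = []
    for _ in range(rounds):
        history.append(master.name)
        if not master.server_link_up:
            master, backup = backup, master
    return history


# The whole server is down, so both CSS units see a dead link.
both_down = simulate_mastership(Css("CSS-A", False), Css("CSS-B", False), 4)
print(both_down)  # ['CSS-A', 'CSS-B', 'CSS-A', 'CSS-B']
```

Under this assumed rule, mastership only settles if at least one unit has a healthy monitored link, which matches the flapping we see when the whole box is down.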

This requires manual intervention, which our customer doesn't want.

We are running 8.10 Standard Services on both CSS11501s.

Thanks and regards,

Dennis

4 Replies

smahbub
Level 6

Send us the messages that are logged during this time so that we can analyze exactly what is triggering the switchover.

Hi,

I no longer have the logs. I guess High Availability is not recommended for CSS.

The redundant CSS function appears to be limited to the case where a CSS box itself goes down. This is why designs place an L2 switch between the CSS and the server farm instead of connecting the servers directly.

I hope there will be a software revision that allows "ip redundancy master" and redundancy-phy to be used at the same time.

For now we have reconfigured the CSSs, but this still requires manual intervention.

CSS-A is manually forced as master.

Then we removed redundancy-phy from the interfaces connecting to the server farm, as well as redundancy on the server VLAN in CSS-B.

When a server failure occurs, it switches over to CSS-B without causing any loop or flap.

Once the server is fixed, we manually force CSS-A back to master.

Thanks to all who responded to my post.

Cheers,

Dennis

thumpercisco
Level 1

I have the same basic setup, using a CSS11503 and a server with 3 instances and 2 NICs. The server runs its own NIC failover software, which assigns an internal failover VIP that both NICs share; the server controls which NIC is active. The CSS is set up to monitor that server-side failover VIP, so it does not monitor the IP addresses of the individual NICs. This has nothing to do with the VIP addresses configured on the CSS for content rules and groups; it only concerns the services config.
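
To show why probing the shared VIP behaves differently from probing each NIC, here is a minimal sketch. It is a conceptual model only: the function names are hypothetical, and it assumes the teaming software keeps the VIP reachable as long as at least one NIC is up.

```python
def teaming_vip_alive(nic_states):
    """Health as seen by a probe aimed at the server's teaming VIP.

    Assumption: the VIP answers while at least one NIC is up.
    """
    return any(nic_states)


def all_nics_alive(nic_states):
    """Health as seen when every NIC address is probed individually."""
    return all(nic_states)


# One NIC has failed; the team has moved the VIP to the surviving NIC.
one_nic_failed = [False, True]
print(teaming_vip_alive(one_nic_failed))  # True
print(all_nics_alive(one_nic_failed))     # False
```

With the VIP as the probe target, a single NIC failure is absorbed by the server and never shows up as a service failure on the CSS.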

If you use the CSS in redundancy mode or active/active, I would suggest the following to avoid looping virtual-router transitions.

1/ Put an L2 switch on the server side, between the CSS and the server NICs. Configure an EtherChannel or trunk between the L2 switches.

2/ Configure NIC teaming on the server, make sure the team hosts a virtual IP address, and let the CSS talk to that virtual IP address and its ports, as thumpercisco was saying.

3/ Do not let the CSS monitor its own physical ports to trigger a failover event. I suggest using critical services to monitor your upstream (FW or router IP) and downstream (management IP on the L2 switch) IP addresses. You will then be safe, because you cannot afford to trigger a failover event on the CSS when a single server NIC fails or someone pulls out a cable (assuming no L2 switch is being used).

4/ Also use reporter services to monitor your VRID peers, and let them trigger a failover event as well.

To summarize, if you do not have a server-side L2 switch, do not monitor the physical ports, as false failover events will occur.
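
The difference between the two trigger policies can be sketched as follows. This is an illustration of the reasoning only, not CSS configuration or CSS internals; both function names and their parameters are hypothetical.

```python
def failover_on_port_state(server_ports_up):
    """Policy to avoid: fail over as soon as any monitored
    server-facing physical port is down."""
    return not all(server_ports_up)


def failover_on_critical_services(upstream_ok, downstream_ok):
    """Suggested policy: fail over only if the upstream gateway or the
    downstream L2 management address stops responding."""
    return not (upstream_ok and downstream_ok)


# Someone pulls a single server cable; the network path itself is healthy.
print(failover_on_port_state([True, False]))      # True  (false failover)
print(failover_on_critical_services(True, True))  # False (stays put)
```

In this model a single dead server port flips the first policy but not the second, which only reacts when the CSS really loses its upstream or downstream path.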

Thanks

Suresh Kumar

Installed the 201st CSS last week and still going!
