3130 Stack Master failover...

djbrightman
Level 1

Hi

I have a couple of 3130 switches in the back of a Dell M1000 blade chassis. They are stacked so that I can provide LACP channel groups to the servers within the blade centre.

The blade servers are SLES10, running Novell Cluster Services.

The bonds are set up with 802.3ad, and work fine with the channel groups I have configured.
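In outline, the setup looks like this (the interface names, channel-group number, and file locations here are placeholders rather than my exact config):

# Linux side: generic bonding module options for LACP
# (on SLES this normally lives in the ifcfg-bond0 sysconfig file)
options bonding mode=802.3ad miimon=100

! Cisco side: one LACP channel group spanning ports on both stack members
interface GigabitEthernet1/0/1
 channel-group 10 mode active
!
interface GigabitEthernet2/0/1
 channel-group 10 mode active
!
interface Port-channel10
 description LACP bond to blade server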

However, during failover testing I have discovered an issue...

The failure (simulated with a power cycle) of the member switch works fine, but the failure of the stack master causes a prolonged network outage, which makes the cluster nodes miss heartbeats and fail...

Looking at the bond (cat /proc/net/bonding/bond0) during the failover, I can see that the LACP status of both slaves temporarily goes down when the master fails, but the link to the 'member' switch comes back up very quickly, i.e. the channel appears to be OK. The problem is that it takes well over 30 seconds for the bond to be able to communicate again (tested with both outbound and inbound pings).
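For reference, during the test I simply keep these two running on one of the blades while the master is power-cycled (substitute your own gateway address):

watch -n1 cat /proc/net/bonding/bond0
ping <default-gateway-ip>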

Is there any way to speed up the member-to-master failover, or to speed up the process of the new master enabling communication on its ports?
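(Two switch-side settings that look relevant are sketched below, assuming this IOS release supports them; this is something to verify against the CBS3130 documentation rather than a known fix:)

! keep the stack MAC address across a master failover so neighbours
! do not have to re-learn a new source MAC for the stack
stack-mac persistent timer 0
!
! let the server-facing port-channel skip spanning-tree listening/learning
interface Port-channel10
 spanning-tree portfast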

I can provide any config details required - please let me know what you need!

Thanks in advance

David

2 Replies

hadbou
Level 5

When a cross-stack LACP EtherChannel has a maximum configuration, such as eight active and eight hot-standby ports, and there are multiple rapid sequential master failovers and stack rejoins that cause extreme stress, it is possible that the port channel will not function as expected. Some ports might not join the EtherChannel, and traffic might be lost. You can detect the condition by using the 'remote command all show etherchannel summary' privileged EXEC command.
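For example, from the master's console:

Switch# remote command all show etherchannel summary

This runs the command on every stack member, so you can compare which ports have actually bundled on each switch.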

Hi David,

We use IBM BladeCenters with 3110X switches. Each chassis contains 4 3110 switches, and each switch has one 10 Gig uplink. We are only uplinking 2 of the 4 switches and have the same setup as you, using EtherChannel and LACP to the server blades. The only way I have tested this is by pulling one of the 2 fiber links, and it fails over very fast. How do you simulate a power failure on one switch within a chassis? Do the Dell chassis switches have a separate power switch for each switch?

The IBM chassis does not even have a power switch for the entire chassis; when we pull the power cables, everything goes down. I will look into whether I can simulate the same thing and let you know.

Thanks,

Reza
