I have a couple of 3130 switches in the back of a Dell blade M1000 chassis that have been stacked, so that I can provide LACP channel groups to the servers within the blade centre.
The blade servers are SLES10, running Novell Cluster Services.
The bonds are set up with 802.3ad, and work fine with the channel groups I have configured.
However, during failover testing I have discovered an issue...
The failure (simulated with a power cycle) of the member switch works fine, but the failure of the stack master causes a prolonged network outage that makes the cluster nodes miss heartbeats and fail...
Looking at the bond (cat /proc/net/bonding/bond0) during the failover, I can see that the LACP status of both slaves temporarily goes down during the master failure, but the 'member' connection comes back up very quickly, i.e. the channel appears to be OK. The problem is that it takes well over 30 seconds before the bond can communicate again (tested with both outbound and inbound pings).
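In case it helps anyone reproduce the test, here is a minimal Python sketch for sampling the bond state once a second during a failover, rather than running cat by hand. It assumes Python 3 is available and that the file uses the usual "Slave Interface:", "MII Status:" and "Aggregator ID:" lines. The function names and the SAMPLE text are illustrative only, not output captured from these servers.

```python
import time


def parse_bond_status(text):
    """Return {slave_name: {"mii_status": ..., "aggregator_id": ...}}."""
    slaves = {}
    current = None  # dict for the slave section being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key == "MII Status":
            current["mii_status"] = value
        elif current is not None and key == "Aggregator ID":
            current["aggregator_id"] = value
    return slaves


# Illustrative input only (hypothetical values, not taken from the real servers).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: down
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Aggregator ID: 2
"""


def watch(path="/proc/net/bonding/bond0", interval=1.0):
    """Print a timestamped snapshot of every slave until interrupted."""
    while True:
        with open(path) as f:
            print(time.strftime("%H:%M:%S"), parse_bond_status(f.read()))
        time.sleep(interval)
```

Calling watch() on a node while power cycling the stack master, alongside a continuous ping, should show how long the gap is between the slave coming back up and traffic actually passing.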
Is there any way to speed up the member-to-master transition, or to speed up the process of the new 'master' enabling communication on its ports?
I can provide any config details required - please let me know what you need!
Thanks in advance