I am preparing a group of switches for production and noticed some interesting behavior. I am interested in getting some feedback on it.
All right, this is the environment. I have two 3750G switches stacked via the stacking cable. I have two 2960 switches connected to the stack via redundant connections configured as EtherChannels. Here is a diagram:
So, I have it connected and I am concurrently pinging everything. Each PC is pinging:
- The other PC
- Its own access switch
- The other access switch
- The core switch stack
These are all continuous pings, and all is well. I also checked the device manager to verify that traffic is being spread across both links in the EtherChannel. However, if I pull power from one of the 3750 switches, it takes approximately 30 seconds for the pings to succeed again (the ping from each PC to its own access switch doesn't miss a beat). Once all the pings are coming back, I plug the switch back in. Once the switch that went down has fully booted, I drop a few pings to the stack, and then everything is fine.
I have tested this several times, and the results vary somewhat. Some devices come back faster than others. It seems that killing the master causes a harder hit than killing a member. Generally speaking, though, I can't expect a full recovery in less than 30 seconds.
So, my questions are:
- Is this normal?
- Should I expect faster recovery from a stack like this?
- Is there any way to improve this?
- I did not configure a persistent MAC address in the stack configuration. Could that impact this?
If it's not indicative of a problem or misconfiguration, I am inclined to let it be. I think a 30 second automated recovery is within the scope of the availability expectations. However, if it means that I might have done it wrong, I'd like to get it corrected.
If the stack is the root bridge for the VLAN, take into account the following notes about STP and stacks:
• If the stack master fails or leaves the stack, the stack members elect a new stack master, and all stack members change their bridge IDs of the spanning trees to the new master bridge ID.
• If the switch stack is the spanning-tree root and the stack master fails or leaves the stack, the stack members elect a new stack master, and a spanning-tree reconvergence occurs.
So the fact that things are worse when you switch off the stack master is explained by STP activity.
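A 30-second outage is compatible with BackboneFast. With the default STP forward delay of 15 seconds, a port spends 15 seconds listening and 15 seconds learning before it forwards; BackboneFast skips the 20-second max-age wait but not those two forward-delay intervals.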
You can check spanning tree using commands like:
show spanning-tree summary
show spanning-tree vlan <vlan-id>
You should see a change in the bridge ID after the stack master fails.
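To make that comparison concrete, one possible sequence is to capture the same output before pulling power and again after the pings recover. This is only a sketch that I have not run against your stack, and VLAN 10 is a placeholder for whichever VLAN your PCs are in.

```
! Run once before the failover test and once after recovery, then compare.
! Shows which member is the master and the current stack MAC address.
show switch
! Shows this stack's own bridge ID for the VLAN (VLAN 10 is an assumed example).
show spanning-tree vlan 10 bridge
! Shows the root bridge ID as seen from the switch where you run it.
show spanning-tree vlan 10 root
```

Running the last command on the 2960s as well should show whether the access switches saw the root ID change when the master went down.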
I wonder if MAC address persistence could provide a better result by hiding this change of master.
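If you want to test that idea, a minimal sketch of the change on the stack would be something like the following. With a value of 0, the stack keeps using the previous master's MAC address indefinitely after a new master is elected. Whether that actually shortens your outage is my assumption, not something I have measured, so try it in the lab first.

```
configure terminal
! Keep the current stack MAC address after a new master is elected.
! 0 means keep it indefinitely; a value in minutes sets a delay before the MAC changes.
stack-mac persistent timer 0
end
! Confirm the stack MAC address and the persistency setting.
show switch
```

One caveat: if the old master is removed for good and reused elsewhere in the same network, its MAC address would then be in use in two places, so keep track of where that switch ends up.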
Hope this helps.