
3750x Stackwise Plus Master Election

Rob Pettigrew
Level 1

I want to verify the functionality of the master election in the event of a master failure.  I have two 3750X switches in a stack.  If I lose or reboot the master will my secondary switch reboot itself to become the new master?  If so, I assume this will create ~2 minutes of downtime as the secondary switch reboots and becomes functional again.  Can anyone confirm?  Thanks.

Rob

Edit:  Also can anyone confirm that this functionality is different on the 3850 line of switches?  I believe there would be little to no downtime in a 2 switch 3850 stack with Active failure since the standby would just assume the role.

3 Accepted Solutions


Reza Sharifi
Hall of Fame

"If I lose or reboot the master will my secondary switch reboot itself to become the new master?"

If you reboot the master, the secondary will take over right away without rebooting. In other words, the second switch will not reboot itself.

Remember, if you want to reboot only the master, you need to use the slot number in your command:

reload slot 1

In this case, I am assuming your master switch is switch 1.

HTH


Leo Laohoo
Hall of Fame
"If I lose or reboot the master will my secondary switch reboot itself to become the new master?"

This is correct.

"If so, I assume this will create ~2 minutes of downtime as the secondary switch reboots and becomes functional again."

This is incorrect. The "downtime" will depend entirely on how the clients are connected to the redundant stack and how those links are configured. The ARP table is shared among all the switches.

In our case, if I continuously ping the management IP address of the switch stack and power down the stack master, I lose a few pings.

The same theory applies to the 3850. But I am thinking the 3850, being multi-core, would react a lot faster than a single-core 3750X.
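
The ping test described above can be turned into a rough number. Here is a minimal Python sketch, not anything from Cisco: the function name `longest_outage` and the one-second probe interval are assumptions for illustration. It takes a list of ping results recorded while the master is powered down (True for a reply, False for a timeout) and reports the longest run of consecutive losses.

```python
def longest_outage(replies, interval_s=1.0):
    """Return the longest run of consecutive lost pings, in seconds.

    replies    -- iterable of booleans, True if that ping got a reply
    interval_s -- assumed time between pings (1 second is the usual default)
    """
    longest = 0
    current = 0
    for got_reply in replies:
        # A reply ends the current outage; a timeout extends it.
        current = 0 if got_reply else current + 1
        longest = max(longest, current)
    return longest * interval_s


if __name__ == "__main__":
    # Hypothetical log: three pings lost around the failover.
    log = [True, True, False, False, False, True, True]
    print(longest_outage(log), "seconds without a reply")
```

This only measures reachability of whatever address you ping. Pinging the stack's management IP and pinging a host behind the stack can give different results, so it is worth running both.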


Joseph W. Doherty
Hall of Fame

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

As also noted by Leo, losing the stack master doesn't create much downtime (if any).  Another switch takes over as master without needing to reboot.

On the surviving switches - normally L2 forwarding should be almost hitless.  L3 too, if you enable NSF in your routing protocol.

Some gotchas: If your backup doesn't have the same feature set as the original master, you may lose active features. Normally a new master uses its own MAC for any gateways. It will send a gratuitous ARP, but if a host doesn't recognize it, the host will keep sending its traffic to the MAC in its local ARP table. There is a command to continue to use the original master's MAC, or you can use HSRP, which uses a virtual MAC.
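
To make the gateway MAC gotcha concrete, here is a small Python model of a host's ARP cache during a master failover. It is only an illustration: the function name, the IP address and the MAC values are made up, and real hosts differ in how they treat gratuitous ARP.

```python
def gateway_mac_after_failover(arp_cache, gateway_ip, new_gateway_mac,
                               host_accepts_garp):
    """Return the MAC the host will use for its gateway after the failover.

    arp_cache maps IP address -> MAC address as the host learned it
    before the old master went away.
    """
    cache = dict(arp_cache)  # work on a copy, leave the caller's dict alone
    if host_accepts_garp:
        # The gratuitous ARP from the new master overwrites the old entry.
        cache[gateway_ip] = new_gateway_mac
    # Otherwise the stale entry stays until it ages out or is refreshed.
    return cache[gateway_ip]


if __name__ == "__main__":
    old_mac, new_mac = "0011.2233.4400", "0011.2233.5500"  # hypothetical values
    cache = {"10.0.0.1": old_mac}
    print(gateway_mac_after_failover(cache, "10.0.0.1", new_mac, True))
    print(gateway_mac_after_failover(cache, "10.0.0.1", new_mac, False))
```

If the gateway MAC does not change at all (a persistent stack MAC, which on the 3750 I believe is set with `stack-mac persistent timer`, or an HSRP virtual MAC), the old and new values are identical and both kinds of host keep working. Check the command reference for your IOS release before relying on that command name.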

PS:

Original series 3750 used StackWise and E/X series use StackWise Plus, which is better in several ways, although it will work with original StackWise (giving up some of its improvements). The new 3850s use a new stacking technology that will not work with any 3750.


14 Replies

This is a little contradictory to information here http://www.cisco.com/en/US/products/hw/switches/ps5023/products_configuration_example09186a00807811ad.shtml#election

When is the stack master elected?

  • When the whole switch stack is reset
  • When the stack master is reset or powered off (Note: If you reset the stack master, it would reset the whole stack.)
  • When the stack master is removed from the stack
  • When the stack master switch has failed

I understand this is for the 3750, not the 3750X, but I was under the impression there was no functional difference between the two stacking technologies in this case either.

I have not tried this before, but I think the documentation is correct. If you reload the stack master, it will reboot the whole stack, but if you want to reload only a member, you can use the slot number. So using the slot number may not work for the stack master.

HTH

Reza Sharifi
Hall of Fame

"Edit: Also can anyone confirm that this functionality is different on the 3850 line of switches? I believe there would be little to no downtime in a 2 switch 3850 stack with Active failure since the standby would just assume the role."

When it comes to how the stack functions, there is no difference between a 3750 stack and a 3850.

HTH

Leo, Joe, Reza

Sorry to resurrect such an old thread but I am refreshing my 3750 knowledge and the information given here seems slightly contradictory.

So Leo, when you say it is correct that the secondary would reboot itself but the downtime depends on how the clients are connected, I don't follow that, i.e. if the master goes down and the secondary reboots, it doesn't matter about ARP tables or anything else: there are no physical switches up, so you have to wait for the secondary to reboot.

The guides suggest that if you reset the master switch then the whole stack is reset, which obviously means no forwarding of anything until the switches have reloaded.

Whereas if the stack master becomes unavailable, it does not require any of the other switches in the stack to reboot; one of the other members simply becomes the stack master, and during the election traffic is forwarded normally.

So is that how it works, i.e. a reboot command on the master will result in a complete reload of all the switches, during which time there is obviously no forwarding of data?

Whereas if the master simply stops working, i.e. crashes or has a failed power supply for example, then a new master is elected with little or no impact on traffic forwarding.

Sorry for the basic questions but it's been a while :-)

Jon

 

"Whereas if the master simply stops working ie. crashes for example then a new master is elected with little or no impact on traffic forwarding."

That's an "it depends".

If the stack is running just L2, I understand the remaining switches will continue to forward while the new stack master takes over. You could have an issue if there are L2 topology changes before the stack master has fully recovered.

If the stack is running L3, routing protocol neighbors may drop the node until the recovered stack master resumes the routing protocols. However, if NSF is enabled between the stack and its neighbors, the remaining stack members will forward traffic, and routing neighbors will "route" to/from the stack, while L3 is off-line. (Of course, if there's a routing topology change, "no one is home" on the stack to process that until the stack master is fully functional.)

As noted in my original post, if the stack's MAC changes (the default) that can cause some issues too.

Regarding resetting just the stack master, I thought behavior would be like a power cycle on that stack member, but don't quote me on that.

Joe

Thanks for that.

Yes L3 could be an issue although as you say NSF can help here.

I was more concerned with how the rest of the stack behaves when the master is reset as opposed to just fails.

It sounds like a reset actually resets the entire stack but then I was confused with Leo's response when he said it would but you wouldn't get much downtime.

I suspect I just misinterpreted what he meant.

Jon

Hey Jon, 

 

OK, let's presume a stack of four switches. The first switch is the master, and all your clients are dual-homed to different members of the stack. Switch 2 is the next stack master candidate, switch 3 is the candidate after that (if switch 2 fails), and so on. All uplinks are multi-homed (including EtherChannel).

Say switch 1 fails: switch 2 takes over immediately. You'll lose a few pings when this happens, and if you're connected to the logical switch over Telnet/SSH, you'll experience a lag of 1 or 2 seconds (depending on the IOS level and the size and shape of the network).

"So Leo when you say it is correct that the secondary would reboot itself but the downtime depends on how the clients are connected I don't follow that ie. if the master goes down and the secondary reboots it doesn't matter about arp tables or anything else, there are no physical switches up so you have to wait for the secondary to reboot."

What I meant to say is that if you've got a single-homed client and that physical switch fails, then you'll have downtime. There are also "arguments" regarding dual- or multi-homed clients and how their EtherChannel is configured, but I'm just being pedantic.

"I was more concerned with how the rest of the stack behaves when the master is reset as opposed to just fails."

There are a lot of scenarios here. The first is when a faulty stacking cable (or a faulty pair of them) sends the stack into a rarely seen "split brain". There is no way to recover from this except to manually pull the plug, literally.

This is probably one scenario you'll be interested in. Another scenario is the master switch going "nuts", caused either by an IOS bug (triggered by faulty configuration or network instability) or by mixing different stack member models. It's very hard to diagnose, because you won't know which stack member you're actually on until you issue the commands "sh switch detail" and/or "sh version". "sh switch detail" shows which stack member is currently master, and "sh version" will tell you (subtly) whether the master has failed over due to something sinister.

"It sounds like a reset actually resets the entire stack but then I was confused with Leo's response when he said it would but you wouldn't get much downtime."

Depends on the command. Switch stacking takes a page from the 6500: you can reload each individual stack member (including the stack master). When you reload only the stack master, you'll see a few seconds of "lag" as the next stack master candidate kicks in and takes over the stack. And again, when this happens and your clients are dual- or multi-homed, you shouldn't get a lot of downtime: a few lost pings and that is about it. However, I can't say the same for clients running an OS or application that is sensitive to a hit like this.

 

A good example: I've been hitting a "bug" (of sorts) which affects my stack of 2960S switches. The bug only hits the 2nd member of a stack and never shows itself when there is only one stack member. When it happens, the 2nd member of the stack crashes and recovers. The stack won't see it. Clients connected only to stack member 2 will see it, but clients on the other stack members won't see the "crash".

Hi Leo

Thanks for getting back to me, was just about to log off.

So I may have more questions later when I reread your post, but when you say it depends on the command and you can reload the stack master without it reloading all the other switches in the stack, what exactly do you mean? I.e. the docs say if you reset the master that also resets all the members, but obviously it is not as simple as that.

So let's say I wanted to reload the master switch for some reason: is there a specific command or something extra I need to do to make sure other member switches in the stack do not reset as well?

I am assuming reboot, reload and reset are in effect the same thing here, is that correct?

Jon

"So lets say i wanted to reload the master switch for some reason, is there a specific command or something extra I need to do to make sure other member switches in the stack do not reset as well."

Before reloading the stack master, one must first identify WHO is the stack master. The command "sh switch detail" will identify the stack master. Then, to reload the stack master, just issue the command "reload slot <NUMBER>". Again, not trying to be difficult here, but I will presume the stack has multiple (and working) uplinks.

 

When the stack master reboots, the next stack master candidate (determined during the stack bootup) will immediately take over. The old stack master will rejoin the stack but, in order to keep harmony in the network, will not "take over" as stack master. Who becomes the next candidate stack master is determined by two factors: the MAC address and (if you're smart) the switch priority value.
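
A minimal Python sketch of the two factors mentioned above, assuming the commonly documented rule that the highest priority value wins and the lowest MAC address breaks a tie. The `Member` class and `elect_master` function are hypothetical names for illustration only, and the real election on the switch considers more than these two factors, so treat this as a simplified model rather than the actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    number: int    # stack member number
    priority: int  # switch priority value (higher is preferred)
    mac: str       # base MAC address, e.g. "0011.2233.4455"


def mac_key(mac):
    """Convert a MAC string (dotted, colon or dash notation) to an integer."""
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    return int(digits, 16)


def elect_master(members):
    """Pick the member with the highest priority; lowest MAC breaks ties."""
    if not members:
        raise ValueError("cannot elect a master from an empty stack")
    return min(members, key=lambda m: (-m.priority, mac_key(m.mac)))


if __name__ == "__main__":
    # Hypothetical three-member stack with explicit priorities.
    stack = [
        Member(1, 15, "0011.2233.4401"),
        Member(2, 14, "0011.2233.4400"),
        Member(3, 1, "0011.2233.43ff"),
    ]
    master = elect_master(stack)
    print("master:", master.number)
    # Simulate losing the master: elect among the survivors only.
    survivors = [m for m in stack if m.number != master.number]
    print("new master:", elect_master(survivors).number)
```

Setting the priorities explicitly, as in the example stack, is what makes the failover order predictable. If every member is left at the same priority, the outcome in this model falls through to the MAC address, which is rarely what you would have chosen.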

"I am assuming reboot, reload and reset are in effect the same thing here, is that correct ?"

Yes.  

Thanks all for the info, but I think I am going to have to re-test this scenario. I vaguely remember testing this but seem to recall doing a reset on the master switch because they were also in a power stack. Would a reset on the master switch cause the entire stack to reset, leading to a few minutes of downtime? I may have to break the power stack to simulate a switch failure.

Rob

If you lose the stack master, you shouldn't be seeing minutes of downtime. Again, that's for traffic forwarding. Access to the management IP of the stack itself might take a little longer to come back.

The gotcha for a PowerStack is that if you lose enough power, another stack member might go down too.
