Re: RSTP Root bridge change

patrick.guerin · ‎10-11-2006

With rapid spanning tree, once the root bridge is elected (either manually or automatically), the effect of topology changes on stub bridges will be limited: about 3 secs to move from blocking to forwarding, etc.

Just wondering what would happen if the root bridge is lost (e.g. primary distribution rebooted) and a new root bridge needs to be elected. Do the timers revert to normal STP (30-40 secs) while the network is trying to stabilise and elect a new root bridge? The same question applies when the original root bridge comes back on the network after the reboot. If the secondary root is manually prioritised, then the failover and taking over as root bridge should be seamless. However, I am unsure what happens when the original switch comes back online.

Thanks

Pat

mheusinger · ‎10-11-2006

Hi,

one of the big advantages of RSTP is its fast convergence. Among other things, this is mainly achieved by using BPDUs locally between two switches to negotiate port states.

So when a new root bridge is introduced into an STP domain, it will first negotiate with the directly attached switch that it is root and has a designated port. The neighboring switch will choose its port to be the root port. Once this is happening, all other STP ports on the neighboring switch go into sync, i.e. they are discarding and negotiating with their respective neighbor switches.
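
As a rough illustration of the negotiation described above, here is a minimal Python sketch of the proposal/agreement exchange moving outward from the root one link at a time. The function name, the switch names and the loop-free topology are assumptions made for the example; it only shows the ordering of events, not a real RSTP state machine.

```python
from collections import deque

def rapid_handshake(links, root):
    """Return the ordered handshake events, starting at the root.

    links maps each switch to the switches attached below it.
    A loop-free tree is assumed to keep the sketch short.
    """
    events = []
    queue = deque([root])
    while queue:
        upstream = queue.popleft()
        for downstream in links.get(upstream, []):
            events.append(f"{upstream} -> {downstream}: proposal")
            # Sync: the downstream switch blocks its other non-edge ports before agreeing
            events.append(f"{downstream}: sync, non-edge designated ports discarding")
            events.append(f"{downstream} -> {upstream}: agreement")
            events.append(f"{upstream}: designated port to {downstream} forwarding")
            # The downstream switch now repeats the exchange with its own neighbors
            queue.append(downstream)
    return events

# Hypothetical topology: DistA is the new root, with two switches chained below it
topology = {"DistA": ["Access1"], "Access1": ["Access2"]}
for line in rapid_handshake(topology, "DistA"):
    print(line)
```

Each link is settled by a local exchange between two neighbors, which is why no timer has to expire before ports start forwarding.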

The whole process is much faster than classical spanning tree, though, depending on your topology, brief outages for some traffic might occur.

For a nice and more detailed explanation, have a look at:

"Understanding Rapid Spanning Tree Protocol"

http://www.cisco.com/warp/public/473/146.html

Hope this helps! Please rate all posts.

Regards, Martin

patrick.guerin · ‎02-19-2007

Hi all,

I am resurrecting this post as we had another impact on another network as a result of losing the root bridge.

Scenario (see attachment to help)

We have RPVST throughout, and we elect through configuration a root bridge and a secondary root per VLAN (Distribution A and Distribution B).

Root bridge (Dist A) reboots as a result of an exception

- new root is elected as per configuration (Dist B). All this is seamless and non-impacting to the customer, as the transition is very quick as per RPVST timers.

The problem arises when the pre-configured root bridge (Dist A) returns to the network topology after its reboot. Now we see a significant impact: the whole network takes a long time to converge, and this has a knock-on effect on dependent systems.

Question is: once Dist A comes back online, is there a re-election of the root bridge using normal STP timers, hence the impact?

This scenario does not seem to come up in the CCO documentation.

If that is the case, then it would be better for Dist A to stay down until we get a maintenance window to reintroduce it to the network.

Any thoughts on this? Is this what is expected, and are there mitigation steps to stop Dist A coming back on the network after a reboot?

Thanks

Pat

Francois Tallet · ‎02-19-2007

Hi Pat,

Do you mean that the link between the resurrected root and the access bridge is doing a slow transition to forwarding? This is not normal. According to the RSTP rules of operation, the root should propose, the access bridge should block its previous root port and then send an agreement, which should unblock the designated port on the root bridge. All this should be achieved with no timer intervention.

Check that your link between the root and the access bridges is really seen as point-to-point by STP (you'll see p2p on the show spanning-tree output) and that, of course, Rapid-PVST is running on both boxes.
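
To make the p2p check concrete, here is a small Python sketch of the default rule: the link type follows the duplex setting, and rapid transitions are only possible on edge (portfast) ports and point-to-point links. The `Port` class and function names are made up for illustration, and the sketch ignores any manual link-type override.

```python
from dataclasses import dataclass

@dataclass
class Port:
    name: str
    full_duplex: bool
    portfast: bool = False  # edge port, set by the administrator

def link_type(port):
    # By default, full duplex is treated as point-to-point and half duplex as shared
    return "p2p" if port.full_duplex else "shared"

def rapid_transition_possible(port):
    # Edge ports and p2p links can move to forwarding without waiting on timers
    return port.portfast or link_type(port) == "p2p"

# Hypothetical uplink that negotiated half duplex: it falls back to timer-based transitions
uplink = Port("Gi0/1", full_duplex=False)
print(link_type(uplink), rapid_transition_possible(uplink))
```

A duplex mismatch on an uplink is therefore one plausible way to end up with slow, timer-driven convergence on an otherwise correct Rapid-PVST design.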

If you did not observe the state of this particular link during the problem and are just reporting slow convergence from a host perspective, you should also check that the access-switch ports to which L3 devices are attached are configured for portfast.

Regards,

Francois

patrick.guerin · ‎02-19-2007

No, the link from the new root (the secondary) to the access switch forwards straight away, so losing the primary doesn't have an impact. All is configured as expected: the root has all designated ports and P2P links to the access. The secondary root is configured and ready to assume the root bridge role.

The problem only arises when the original primary distribution comes back online after a reboot. It is configured as the root primary, so the redundant switch hands control back to the primary. This operation is what is causing the impact. It appears that it is operating back on normal STP timers (listening, learning, forwarding), i.e. not very rapid!!

I will go ahead and test in the lab, but I was wondering if this is expected behaviour. If it is, we would not want that primary coming back on the network. It would be better that it stayed down, as we have some very sensitive clients that do not like an extended disconnect from the servers.

Section 3 of the diagram is where the extended impact arises.

Francois Tallet · ‎02-19-2007

I was commenting about step 3 when saying that it was not normal. The ports between the new root (old root re-inserted into the network) and the access bridge should go back to forwarding in the steps I described. Please get some information on those two ports (some show spanning-tree detail on both sides during the problem), in order to get a better picture of the problem.

There is no way of preventing a newly inserted root bridge from triggering a recomputation of the tree.
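
The reason is that the root is simply the bridge with the lowest bridge ID, so a returning bridge with a better ID always wins. A minimal Python sketch of that comparison follows; the priorities and MAC addresses are hypothetical values chosen for illustration, and the per-VLAN system ID extension is left out.

```python
def bridge_id(priority, mac):
    # Lower tuple wins: priority is compared first, the MAC address breaks ties
    return (priority, mac.lower())

def elect_root(bridges):
    # The root is the bridge with the numerically lowest bridge ID
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))

# Hypothetical values, for illustration only
bridges = {
    "DistB": (28672, "00:00:00:00:00:0b"),
    "Access1": (32768, "00:00:00:00:00:01"),
}
print(elect_root(bridges))  # DistB while DistA is down

bridges["DistA"] = (24576, "00:00:00:00:00:0a")
print(elect_root(bridges))  # DistA as soon as it is back
```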

Regards,

Francois

patrick.guerin · ‎02-19-2007

Thank you Francois.

I will need to test it all out in the lab.

It happens across multiple access switches all running rapid spanning tree which are P2P to the root and it has happened on two different (but architecturally identical) networks in recent months.

As you say, I would have hoped that the new root (old root re-inserted into the network) would not have reinstated normal STP timers.

When you say "a newly inserted root bridge triggering a recomputation of the tree", what timelines are we talking about, roughly, and is this not the scenario we are looking at in step 3? We saw a 2-3 minute impact.

The other variable is that when the original distribution switch (A) comes back online, the access switch will receive a lower BPDU from the root and so will transition the newly forwarding port back to blocking. If the root switch is not fully back online, there may be an extended impact until things stabilise?

Francois Tallet · ‎02-19-2007

Hi Pat,

Yes, re-inserting a root bridge is exactly step 3. In fact, RSTP should converge faster when inserting a new root than when removing the current root bridge ;-) In terms of convergence time, even in a properly configured RSTP network, removing the root bridge is the only event that could lead to non-optimal reconvergence time. In the particular case of your design (which is a very common case hopefully), removing the root bridge does not have much impact because the root bridge was part of the physical loop (distribution A, distribution B, access switch).

Anyway, introducing a root bridge should not cause a 3-minute outage. In your case, I would expect sub-second recovery. From the access bridge point of view, it's a switchover between an old root port and a new one. There is no timer involved. I don't understand your statement about the access port receiving a lower BPDU and transitioning the newly forwarding port back to blocking. The access switch should start receiving better BPDUs on the link that is coming up to the new root. It should elect this port as root. This new root port will go forwarding as soon as the old root port is discarding. Then an agreement is sent to the root bridge so that its designated port goes to forwarding immediately.
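
The switchover on the access switch can be sketched in a few lines of Python. This is only an illustration of the order of events described above: the port names are hypothetical, and each received BPDU is reduced to a (root priority, root path cost) pair, which is a simplification of the full priority vector.

```python
def reselect_root_port(bpdus, current_root_port):
    """bpdus maps a port name to the (root priority, root path cost) it receives.

    Lower is better. Returns the ordered events, or an empty list if nothing changes.
    """
    best = min(bpdus, key=bpdus.get)
    if best == current_root_port:
        return []
    return [
        f"{current_root_port}: discarding (old root port)",
        f"{best}: forwarding (new root port)",
        f"{best}: send agreement upstream",
    ]

# Hypothetical access switch: Gi0/2 faces Dist B (current root), Gi0/1 faces the returning Dist A
received = {"Gi0/1": (24576, 0), "Gi0/2": (28672, 0)}
for event in reselect_root_port(received, "Gi0/2"):
    print(event)
```

None of these steps waits on forward-delay, which is why a multi-minute outage points to something else, such as a link not seen as point-to-point.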

Regards,

Francois

patrick.guerin · ‎02-19-2007

The way you describe above is what I was getting at.

At least I now have confirmation on the way things should happen.

Hopefully I can replicate in the lab and find out what is going on.

Thanks for your help.