Attached is a JPG of our STP topology for VLAN6, our NFS VLAN. We are running 2 6500 switches as core devices in each of our 2 Data Centers running PVST. We have noticed saturation on the trunk (labeled "A" in the diagram) between the two Core Switches in the Data Center labeled "Data Center 1". We have multiple VLANs trunked between the switches, but we are focusing on VLAN6 in particular because we have noted that NFS VLAN6 traffic constitutes over 50% of total traffic on this trunk. This NFS VLAN 6 is strictly an unrouted Layer2 VLAN (one large broadcast domain).
Our plan is to run another trunk "B" between the two core switches in DC1, and only pass VLAN 6 tagged frames over this new trunk. To accomplish this we are planning the following:
1) Configure new 802.1Q trunk between the two EDC 6513's and place each side in shutdown mode
2) Add VLAN 6 tag to the new trunk while leaving each side still shutdown
3) Simultaneously remove VLAN 6 tag from the existing trunk while 'no shutting' the new trunk link.
Since this NFS traffic is integral to our business functions, we need to reduce/eliminate the STP re-convergence ripple effect as much as possible. I am worried that once the TCNs are sent out, they will cause a 50-second outage for VLAN 6 tagged frames throughout our whole campus.
My questions for you (many thanks if you have read this far BTW):
1) From the included diagram, are you able to identify the scope of the STP reconvergence once the VLAN 6 tag is removed from the existing link? Which switches would be affected by this re-convergence?
2) Would uplinkfast/backbonefast reduce the STP reconvergence, given that the new trunk will be forwarding before the 20-second max age timer expires?
3) Is there a better way to complete the configuration steps 1-3 that I have noted above that would limit the STP reconvergence?
Thanks in advance!!!
*** EDIT ***
6509-4's root port is incorrectly labeled. The correct root port is Gi1/21.
1. You are talking here about 6513s, but I see only 6509s in this diagram.
2. PVSTP+ is PVST+ running RSTP; RSTP doesn't have the 50s convergence time anymore.
3. Why can't you simply add the third interface to the portchannel that you have configured already?
4. I am confused by "STP extensions disabled". What does this mean?
5. AFAIK uplinkfast and backbonefast are built-in features of RSTP (RSTP+ in your case). You do not need to enable them.
6. I might have wrongly assumed that link A is a portchannel. Are those just trunks? If yes, why can't you just configure the blocked trunk with a higher STP priority for VLAN 6, so that it becomes active only for VLAN 6?
7. Playing with port priorities will work for the third link (B) you are trying to add. You can simply force the traffic for VLAN 6 to go over B by putting a better cost/priority on that port. No need to remove VLAN 6 from A (you will have redundancy this way: B will be active and A the backup port for the root switch).
I hope :-) all the above are correct and will help you
On a second reading I see that I was completely wrong. I had RPVST+ in mind, while your picture says PVST+.
I am sorry for that.
That's an interesting challenge;-) The goal is to add a trunk with minimal impact on
First, regarding step 3: I would not remove vlan 6 from the existing channel. Unless you have
tuned the cost, the channel is the preferred path between 6509-1 and 6509-2. Just
bringing up a redundant link between the two will not do any harm:
- the newly added trunk will block on 6509-1
- there will be a TC, but it will have no impact on the network - a topology change
has an impact on the network when the topology is changing;-) Here, it's only going
to reduce the aging time temporarily, which is fine.
From there, you could just remove vlan 6 from the channel on 6509-1. The blocked
port on 6509-1 will take 30 seconds to go back to forwarding. That is long, but it's a
conservative way of doing it.
Here are two other ways I can quickly think of:
-1- Enable uplinkfast on 6509-1. Before you do so, be aware that uplinkfast will
increase the bridge priority and the port path cost for all vlans. Considering the
topology I see for vlan 6, that should not have any impact, but you have to be
careful about the impact on all the other vlans. (btw, uplinkfast is not enabled on
vlans that have their bridge priority manually set, as far as I can remember).
-2- pay attention to where the blocked port is. Right now, the cost from 6509-2 to
6509-1 is X. When you replace the channel with the trunk between those two switches,
the cost that 6509-1 advertises downstream is going to increase. Make sure that this
will not change the location of the blocked port, as that would be a slow transition.
-3- do the switchover by removing vlan 6 from the allowed list of the channel on
6509-1. This way, 6509-1 will see its root port go away and immediately do a
transition to the trunk.
I'm assuming that I've not missed anything, so you might want to try this out
quickly in a lab beforehand;-) This should lead to sub-second convergence.
I have another idea that is a simple variation of your initial plan. Keep the trunk
down on the 6509-1 side and configure portfast (yes portfast;-) on both sides of the
trunk. Do the check specified in step -2- above. Switch over quickly between the
trunk and the channel:
-remove vlan 6 from the channel on the 6509-2
-enable the trunk on the 6509-1
-remove vlan 6 from the channel on the 6509-1
You need to remove vlan 6 quickly from both ends of the channel because we cannot
rely on the topology change mechanism to clear the cam table on the channel.
Hopefully no other cam table update is needed because the topology of the network is
not changing in fact. Also, be careful that "portfast trunk" is required, not just
Again, I don't guarantee anything, I've typed this email in 5 minutes so I recommend
you try this out in the lab first and let us know;-)
Thanks a bunch for the very informative post! You have outlined some things that we hadn't thought of, most notably 'portfast trunk'. I'm going to do some research on this ASAP.
We are going to model this out by creating a Layer2 test VLAN and see which option will work best for us. I'll be sure to post and let you know how we ended up doing this.
I wanted to thank you again for these suggestions you made in regards to this issue. I just finished modeling this out in a lab environment and found that a combination of 'portfast trunk' and 'spanning-tree port-priority' did the job of introducing a new preferred L2 trunk to the STP topo with NO CONVERGENCE (or very very minimal convergence, I should say).
The way that ended up working was to leave both sides of the new trunk shut down, configure the spanning-tree port-priority of each side of the new trunk to be less than the default of 128 (in multiples of 16), add spanning-tree portfast trunk to each side, and finally no shut each side. This resulted in very minimal convergence time (no dropped pings, and a Remote Desktop session stayed active) and seems to me like a great way to get RSTP+-like convergence times if you can't make the global jump to RSTP+ for some reason. I will definitely be using it in the future.
As far as the 'live cut' of this goes, unfortunately two of the 6513 core switches are CatOS. I don't see a portfast trunk option on CatOS, so I'm wondering if you think plain old 'portfast' might be OK? I plan on testing this out as well, but wanted to see if anyone else knew off the bat.
Remembering your problem, I am concerned by several steps you are describing here:
-1- Your test network might not be exactly equivalent to your live network.
I expect the channel to have a lower cost than the trunk you are adding. The priority will only be used by STP if the costs of the two links are equal. Tuning the priority is a little bit more subtle, but it is better because it will prevent 6509-1 from advertising a different cost after you switch over your link.
So in your live network, make sure you configure the cost of your trunk to be the same as that of your channel. The priority only needs to be lowered on 6509-2 (changing it on 6509-1 as well will have no effect).
-2- It seems that you switched over between the trunk and the channel by just doing a no shut on the trunk. You are exposing yourself to a short bridging loop. Basically, as soon as the trunk is operational, flooded traffic might loop between the trunk and the channel. The loop will end as soon as 6509-2 has sent a BPDU to 6509-1. That should be very quick, but you never know how your applications might react to duplicate frames for instance, so I would not recommend doing that. That's something that you would probably not see by just running a ping.
-3- you need to find a way of clearing the cam entries left on the channel. If you are not removing the vlan from the allowed list on the channel, some traffic could be black-holed. For example, in your lab setup, suppose that you have traffic from device 2 off 6509-2 going to device 1, connected to 6509-1. There is a cam entry on the channel on 6509-2 for device 1. If you do the switchover the way you described it to me, this cam entry will not be flushed when the channel gets blocked by STP on 6509-1. As a result, traffic from device 2 to device 1 is still sent on the channel and black-holed. You should remove the vlan from both ends of the channel. Note that you might not see the problem in your test setup because device 1 is also sending traffic to device 2. This traffic will have updated the cam table on 6509-2 so that an entry for device 1 appears on the trunk. But if you have some applications that don't run permanent bidirectional traffic, they might be affected by the switchover until the cam table is cleared or re-learnt on the correct port.
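To make the black-hole scenario in -3- concrete, here is a minimal Python sketch of the cam table on 6509-2. It is a toy model only: the port labels ("channel", "trunk"), the MAC label "device1" and the helper functions are all hypothetical, and real switches also age entries out and shorten the aging time on a topology change, which is not modeled here.

```python
# Toy model of the cam (MAC address) table on 6509-2 during the switchover.
# Port labels, MAC labels and helper names are hypothetical.

def forward(cam, dst_mac, delivering_ports):
    """Return the egress port for dst_mac, "flood" when the destination is
    unknown, or None when the learned port no longer delivers frames
    (the frame is black-holed)."""
    port = cam.get(dst_mac)
    if port is None:
        return "flood"
    return port if port in delivering_ports else None

def learn(cam, src_mac, ingress_port):
    """Record (or move) a source MAC on the port it was last seen on."""
    cam[src_mac] = ingress_port

def flush_port(cam, port):
    """Drop every entry learned on a port, as happens when the vlan is
    removed from that port's allowed list."""
    for mac in [m for m, p in cam.items() if p == port]:
        del cam[mac]

# After the switchover, STP blocks the channel on the 6509-1 side, so only
# the new trunk actually delivers frames between the two switches.
delivering = {"trunk"}

# Case 1: the stale entry for device 1 still points at the channel.
cam_stale = {"device1": "channel"}
stale_result = forward(cam_stale, "device1", delivering)

# Case 2: device 1 sends a frame, so 6509-2 re-learns it on the trunk.
learn(cam_stale, "device1", "trunk")
relearned_result = forward(cam_stale, "device1", delivering)

# Case 3: the vlan was removed from the channel on 6509-2, flushing the
# entry, so the frame is flooded and reaches device 1 over the trunk.
cam_flushed = {"device1": "channel"}
flush_port(cam_flushed, "channel")
flushed_result = forward(cam_flushed, "device1", delivering)

print(stale_result, relearned_result, flushed_result)
```

Case 1 is the one a bidirectional ping test tends to hide, because the replies from device 1 trigger the re-learning shown in case 2 almost immediately.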
Finally, CatOS supports portfast trunk, but only after a certain release (which I don't know off the top of my head). If your release doesn't support it, maybe you can add an access link instead of a trunk between your two bridges. That would not make any difference if you only plan to run a single vlan on it.
You are correct in assuming that the test environment is different from the live environment where this cutover will take place. I'll try to post an updated diagram, as this can't be easy to follow with text. I labbed this topology out using 2 6509's with 2 trunks between them (not EtherChanneled) carrying all VLAN tags. Both 6509's had a trunk to a single 3750 which only carried my test VLAN traffic, VLAN 606 in this example. I brought up a third trunk between the 6509's which would only carry VLAN 606 tagged frames. The first thing I noticed was that when the new trunk (copper interface Gi9/48) came up, it went to blocking. I am assuming this is because its Priority Number was higher than that of the existing trunk (multi-mode fiber SFP Gi5/1). I also tested this out using an SFP interface in Module 1 as the new trunk. This time, the new trunk immediately became the root port for VLAN 606, and I assume this had to do with the lower Priority Number (this value shows up as Prio.Nbr - 128.xxx in a 'sho spanning-tree vlan 606'). Am I correct that this is how the root port is chosen in the event of a tie in path cost?
2 - Good catch on the bridging loop. Although the plan is to immediately strip the tag off the old trunk once the new one is up, I too am not sure how the apps will behave. Like you said, the ping and Remote Desktop session probably isn't the best way to assess the impact of this change.
3 - The more I think about it, the more I like the idea of using an access port for this. I'm just not sure my Manager will go for it, but I'm going to pitch it since we are only going to be running one tag over the new trunk. Both of these trunks are going to be replaced w/ 10GB when we forklift these CatOS 6513's for 6509E's this summer, so it will be a short term fix.
Anyway, once again I appreciate your expertise on this matter!
I am just curious: once you have implemented the "portfast trunk", is this considered permanent or just temporary?
your reply will be highly appreciated.
I planned on only using it temporarily until the new trunk is forwarding, then remove it. I tested removing it on a link and as expected, it didn't force a re-convergence.
Hi Chris, Jack,
Yes, the priority is only used as a tie breaker when the root bridge ID, root path cost and sender bridge ID are equal in the BPDU. In your scenario, that's the case if both costs are equal.
About the portfast configuration: yes, that's a temporary thing. The problem with portfast is that you can get temporary loops (just like the one described in the previous post).
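The tie-break order above can be sketched in a few lines of Python. This is only an illustration of the comparison, not how any switch implements it: the `Candidate` type, the bridge IDs, the port numbers and the costs (3 for the channel, 4 for a single gigabit trunk) are all assumed values, not taken from the diagram. Lower wins at every step, and the port ID (priority, then port number) is the one sent by the upstream bridge.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    """What 6509-1 knows about one uplink (all values hypothetical)."""
    root_bridge_id: tuple    # (priority, MAC) of the root bridge
    root_path_cost: int      # cost in the received BPDU + local port cost
    sender_bridge_id: tuple  # (priority, MAC) of the upstream bridge
    sender_port_id: tuple    # (port priority, port number) of the sending port
    local_port: str          # label of the local port

def elect_root_port(candidates):
    """Pick the port with the lowest (root ID, cost, sender ID, port ID)."""
    return min(candidates, key=lambda c: c[:4]).local_port

ROOT = (8192, "0000.0000.0002")  # assumed: 6509-2 is the root for the vlan

def uplink(cost, port_priority, port_number, label):
    return Candidate(ROOT, cost, ROOT, (port_priority, port_number), label)

# Unequal costs: the cheaper channel wins, whatever the port priority.
unequal = elect_root_port([
    uplink(3, 128, 257, "channel"),
    uplink(4, 64, 2352, "new trunk"),
])

# Equal costs: the lower port priority sent by 6509-2 decides.
equal_cost = elect_root_port([
    uplink(4, 128, 257, "channel"),
    uplink(4, 64, 2352, "new trunk"),
])

# Equal costs and default priorities: the lower port number decides.
defaults = elect_root_port([
    uplink(4, 128, 257, "channel"),
    uplink(4, 128, 2352, "new trunk"),
])

print(unequal, equal_cost, defaults)
```

The last case corresponds to the lab observation that a port in a lower-numbered module became the root port while Gi9/48 went to blocking.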
That's why the rule is to not configure portfast on links going to other bridges.
Generally speaking, by using STP for the redundancy in your data center, you've "signed up" for a reconvergence in the 30-50 second range. If you need faster convergence, you really need to consider rapid-pvst or MST (that can probably provide sub-second reconvergence in your case).
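For reference, the 30-50 second range follows from the default 802.1D timers (max age 20 s, forward delay 15 s). A short Python sketch of the arithmetic, assuming those defaults have not been tuned on the root bridge (the function names are just for illustration):

```python
# Default 802.1D timers in seconds; assumed untuned on the root bridge.
MAX_AGE = 20
FORWARD_DELAY = 15

def direct_failure_outage(forward_delay=FORWARD_DELAY):
    """A blocked port that takes over after a local link change goes
    through listening and learning: 2 x forward delay."""
    return 2 * forward_delay

def indirect_failure_outage(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """After an indirect failure the stored BPDU must first age out
    (max age), then the port goes through listening and learning."""
    return max_age + 2 * forward_delay

print(direct_failure_outage(), indirect_failure_outage())
```

With the defaults this gives 30 and 50 seconds, which is where the two ends of the range come from.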