STP TCN avoidance mechanisms

briangormley · ‎10-23-2014

I have an environment with a typical hub-and-spoke LAN topology, i.e. multiple access switches all connecting back to the collapsed backbone/core. The core is the Layer 3 gateway for the various VLANs, and all access switches are purely Layer 2, extending the various VLANs.

The core comprises two physically redundant third-party switches (HP Comware 7500 chassis) in a logical stack, similar to the Cisco StackWise, VSS or vPC technologies.

Each access switch likewise comprises two physically redundant switches using the same technology. There are Cisco IE2000 and Catalyst 2960S stacks, all connecting back to the core.

The physical connection between each access stack and the core stack is two fiber runs. This allows me to run LACP across the logical access switch as well as the logical core switch. So in summary it is a very simple hub-and-spoke topology.

All devices are running MST - this is due to interop otherwise I would have gone PVST.

What I have observed is that when a typical endpoint is plugged into an access switch, an STP topology change notification (TCN) is generated. The TCN then behaves as expected: the access switch notifies the root bridge (the core), the root bridge notifies all the other bridges, and that's when the packet loss starts.

Based on my understanding of STP and TCs, when a switch receives a TCN from the root it ages out its forwarding table, after which it needs to learn the addresses again. This process results in packet loss - more than I would deem normal/acceptable, and nothing like what I have seen before.

There are a lot of multicast streams running, and snooping and querying have been set up correctly, but because of the nature of the streams this packet loss has an impact.

I also understand why STP TCNs exist and why they are needed; however, I need a mechanism to reduce these TCNs when access devices are connected/disconnected etc. So far:

Cisco devices: enabled portfast - I have enabled this where I can.

HP Comware devices: stp edged-port enable (equivalent to portfast) is enabled where I can.

Question 1: Am I correct in understanding that an access port configured with portfast/edged-port will not generate a TCN at all when a device on it is plugged in, unplugged, or turned on or off? Or will a TCN still occur? I have a suspicion that Cisco doesn't generate a TCN in this case but the HP Comware switches still do.

Question 2: Based on this topology, what if I were to disable STP on the physical uplink interfaces on my access switches? They are configured in an LACP port-channel/bridge-aggr. I know removing STP anywhere is not a very good idea, but given the hub-and-spoke design, would it be feasible? My thinking is that I could still run STP on the local switch to avoid loops but turn it off on the uplinks. This may cause other issues, though, as each standalone switch would think it is the root for a VLAN while the others think the same - I can see inferior and superior root issues in my future :). Or possibly there is a way to stop STP from propagating across the uplinks entirely.

I'm looking for any thoughts or feedback that may help me get to a suitable resolution and a stable network. This matters especially for my devices connected to the Cisco IE2000s (only 100 Mbps interfaces): when TCNs occur and tables are flushed, there is severe loss on those endpoints. Yee-ouch.

houtan haddadlarijani · ‎10-23-2014

Hi

Q1: Enabling PortFast on an interface does not disable STP on it. In fact, PortFast immediately transitions the port into the STP forwarding state upon link-up, and the port still participates in STP. So if the port turns out to be part of a loop, it eventually transitions into the STP blocking state. But you are right, a PortFast-enabled interface does not generate a TCN.
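To make the Q1 rule concrete, here is a minimal Python sketch of when a port event counts as a topology change. It is a simplified model, not vendor code: the `PortEvent` type, the `triggers_topology_change` function and the `legacy_8021d` switch are hypothetical names made up for illustration. It assumes RSTP/MST-style detection by default (only a non-edge port moving to forwarding counts), with the classic 802.1D behaviour (a non-edge port going down also counts) as an option.

```python
from dataclasses import dataclass


@dataclass
class PortEvent:
    is_edge: bool    # True if PortFast / edged-port is configured on the port
    new_state: str   # "forwarding" (link came up) or "down" (link lost)


def triggers_topology_change(event: PortEvent, legacy_8021d: bool = False) -> bool:
    """Return True if this port event would start a topology change.

    Simplified assumption: an edge port never triggers one, as long as it
    keeps its edge status (i.e. it does not receive BPDUs).
    """
    if event.is_edge:
        return False
    if event.new_state == "forwarding":
        return True
    # Classic 802.1D also signals a change when a non-edge port goes down;
    # RSTP/MST do not treat loss of connectivity as a topology change.
    return legacy_8021d and event.new_state == "down"


# An endpoint plugged into a PortFast port: no topology change.
print(triggers_topology_change(PortEvent(is_edge=True, new_state="forwarding")))
# The same endpoint on a port without PortFast: topology change.
print(triggers_topology_change(PortEvent(is_edge=False, new_state="forwarding")))
```

Under these assumptions, every access port left without portfast/edged-port remains a possible source of topology changes each time an endpoint is connected.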

Q2: Turning off STP has never been a good idea, especially when you are using port channels. Disabling STP on this kind of uplink can be dangerous. Suppose that, because of some operational problem, the port-channel members fall out of the bundle and one of them operates as a unidirectional link. With STP disabled at the same time, a bridging loop will occur. You can use RSTP instead to achieve fast convergence.

Cisco says: "A port that directly transitions to forwarding state on a redundant link can cause temporary bridging loops"

http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/12013-17.html#portfastcommand

HTH

Houtan

Daniel90 · ‎01-02-2023

--TCN Overview--

I know this post is a little old, but I would like to add to the discussion for anyone joining. To my understanding, once the root switch receives a TCN, it creates a new configuration BPDU with the topology change flag set and floods it out to all the switches. When a switch gets this BPDU with the topology change flag, it flushes the MAC addresses of devices that have not communicated within the last 15 seconds but keeps the MAC addresses of devices that are actively communicating. So instead of MAC addresses being kept for 5 minutes, they are kept for the forward delay time (which is generally 15 seconds). So I don't know whether TCNs are what is affecting your overall performance. You can check this with "show spanning-tree vlan 'number' detail": look at the "Number of topology changes" and "last change occurred" fields.
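The shortened aging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 300-second default aging time and 15-second forward delay are the usual defaults, and the function name, MAC labels and timestamps are invented for the example. It models the classic 802.1D mechanism; RSTP and MST instead flush learned entries immediately on a topology change, which this sketch does not model.

```python
DEFAULT_AGING = 300   # seconds, normal MAC aging time (assumed default)
FORWARD_DELAY = 15    # seconds, aging time used while the TC flag is set


def surviving_entries(last_seen, now, aging_time):
    """Return the MAC addresses whose last frame is within the aging time."""
    return {mac for mac, seen_at in last_seen.items() if now - seen_at <= aging_time}


# Hypothetical table: MAC label -> time (in seconds) the last frame was seen.
last_seen = {"host-a": 0, "host-b": 90, "host-c": 98}
now = 100

normal = surviving_entries(last_seen, now, DEFAULT_AGING)
during_tc = surviving_entries(last_seen, now, FORWARD_DELAY)

print(sorted(normal))      # ['host-a', 'host-b', 'host-c']
print(sorted(during_tc))   # ['host-b', 'host-c']
```

In this model only the quiet host is dropped. Traffic towards a dropped address is flooded as unknown unicast until that host sends a frame again, which may matter more on the 100 Mbps access links.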

--Possible Solution--

This is a bit of an out-there solution. If the problem is focused on access ports that don't support portfast, why not focus on those specific ports? For example, instead of disabling spanning tree on the uplinks, why not disable it on the edge ports? As you mentioned, this can cause problems if someone connects a switch and forms a loop, but that can be avoided by using port security on those access ports. I would find it hard to imagine that the Dell switches don't support port security. Is this an efficient option? No, it's not, but it does ensure your uplinks still have spanning tree enabled :)!