adding a VLAN to MST would cause spanning-tree loop

Unanswered Question
Feb 15th, 2008

Hi All,

I have a simple topology: a couple of Cat6500 switches (native IOS) connected to each other and to a Cat3750 (stack of two), all running MST. When I add a new VLAN, it is propagated to all the switches via VTP. When I then assign it to an MST instance on the first switch (on the other switches it is still assigned to instance 0), a spanning-tree loop occurs and connectivity to the switch is lost. The only way to recover is to break the loop manually by disconnecting a cable. After breaking the loop and assigning the new VLAN to the correct MST instance, the cable is put back in place and everything works fine. Any idea why this would happen?

Many thanks in advance.

Cheers.

Fausta.

Francois Tallet Fri, 02/15/2008 - 13:50

Hi Fausta,

There are two separate operations here. The first is adding the VLAN to the network. This should not cause any change to the network (MST does not see the event): the VLAN is already assigned to an instance (instance 0 in your case, it seems), so a state has already been computed for it on every port in the network. If there is a loop at that stage, it is very likely a stupid bug on a switch that was supposed to block a port for instance 0 and failed to put the new VLAN in the correct state on that blocked port.

When you change the VLAN-to-instance mapping (by moving the VLAN from instance 0 to instance X), you change the topology of your network, because you are moving this switch to a different region. The switch should block all its ports and restart MST with the new configuration. New boundary ports, potentially new CIST information and so on will be created. This should disrupt traffic for a few seconds, but it should definitely not introduce a bridging loop. It seems that the neighboring switches reacted in the wrong way to the changes on this particular switch. Of course, I cannot tell you what happened with so little information, but if you have a relatively small network (these Cat6k switches and the 3750 stack), it should be possible to reproduce the problem. I'm not supposed to recommend this, but if you are not running an outrageously old IOS release, I would suggest that you open a case and have a bug filed for this issue. It's very important for us to make sure this kind of problem does not happen. I apologize for the trouble it has caused you.
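To illustrate why a single mapping change moves a switch to a different region, here is a minimal Python sketch. Two bridges are in the same MST region only if their region name, revision number and VLAN-to-instance table all match. The fingerprint below is a simplified stand-in for the real configuration digest carried in BPDUs, not the actual algorithm, and the region name "CAMPUS", VLAN 100 and instance 1 are made-up values.

```python
import hashlib

def region_fingerprint(name, revision, vlan_to_instance):
    """Simplified stand-in for the MST configuration identifier.

    NOT the real digest: it only shows that name, revision and the
    complete VLAN-to-instance table all feed into the comparison.
    """
    # One 2-byte entry per VLAN ID 1..4094; unmapped VLANs stay in instance 0.
    table = b"".join(
        vlan_to_instance.get(vid, 0).to_bytes(2, "big") for vid in range(1, 4095)
    )
    h = hashlib.md5()
    h.update(name.encode("ascii"))
    h.update(revision.to_bytes(2, "big"))
    h.update(table)
    return h.hexdigest()

def same_region(a, b):
    """a and b are (name, revision, vlan_to_instance) tuples."""
    return region_fingerprint(*a) == region_fingerprint(*b)

# Hypothetical values: VLAN 100 has been moved to instance 1 on one switch only.
switch_a = ("CAMPUS", 1, {100: 1})
switch_b = ("CAMPUS", 1, {})  # VLAN 100 still in instance 0

print(same_region(switch_a, switch_b))  # False
```

As soon as the mapping differs for even one VLAN, the two switches no longer agree on the region, and the link between them becomes a region boundary.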

Regards,

Francois

fausta.t Mon, 02/18/2008 - 03:37

Hi François,

thanks for your answer. The IOS versions we are running are not that old: s72033-adventerprisek9_wan-vz.122-33.SXH.bin on the two Cat6k and c3750e-universalk9-mz.122-35.SE2.bin on the Cat3750E.

We have scheduled a maintenance window for tomorrow night (Feb 19, 8 p.m. CET) and we will investigate the issue further. I hope to be able to gather more information then. Cheers. Fausta.

Francois Tallet Mon, 02/18/2008 - 10:16

Thanks Fausta,

During your test, if you can, check the state of the instances to which the VLAN is associated. In your setup there should, hopefully, not be many physical loops for STP to break. If you have time to check which ports should be blocking and see whether one of them is not, that would really help. You have two or even three instances to monitor:

- the CIST.

- the instance to which your vlan is mapped in one region.

- the instance to which your vlan is mapped in the other region.

(practically, region is almost equivalent to bridge in your setup).
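The list above can be sketched in a few lines of Python. This is only an illustration: the switch names, VLAN 200 and instance 2 are hypothetical, and each switch is treated as its own region, as noted above.

```python
CIST = 0  # instance 0; VLANs that are not explicitly mapped stay here

def instances_to_monitor(vlan, mappings):
    """Return the sorted instance IDs worth checking for one VLAN.

    mappings: switch name -> {vlan: instance} as configured on that switch.
    """
    instances = {CIST}
    for vlan_map in mappings.values():
        instances.add(vlan_map.get(vlan, CIST))
    return sorted(instances)

# Hypothetical state in the middle of the change: only one switch is updated.
mappings = {
    "cat6k-1": {200: 2},
    "cat6k-2": {},
    "cat3750": {},
}
print(instances_to_monitor(200, mappings))  # [0, 2]
```

For each instance returned, the port roles and states on every switch are what to compare against the ports you expect to be blocking.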

If you get a permanent bridging loop, this is in fact good news from the troubleshooting perspective, as it will be much simpler to reproduce and fix.

Thanks and regards,

Francois

fausta.t Wed, 02/20/2008 - 04:15

Hi François,

last night we investigated the issue further. First of all, the problem observed last week was caused by a malfunctioning 10G card. After replacing (or rather, restarting) the card and verifying that it was functioning properly, we created a new VLAN to see what would happen. I can confirm that adding a VLAN to MST has unpredictable consequences until the L2 topology is consistent across all switches. When we create a new VLAN, it is propagated by VTP and assigned to instance 0 on all switches, and everything still works fine. After we assign the new VLAN to one of the MST instances on one of the switches, and until that mapping is manually configured on all of the other switches, the behaviour of spanning tree is unpredictable: the root is elected regardless of the configured priority, and this can leave a switch isolated from the rest of the network (meaning you may need console access to configure it).

Is this the correct behaviour? Is there a way to avoid it? I can understand a momentary traffic disruption due to MST recalculation, but having to be physically close to the switches in order to reach them via console when they become isolated can be a problem.

If this is the correct behaviour I think we will have to plan a maintenance window to assign all possible vlans to the three configured instances in order to prevent future problems.
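As a rough sketch of that plan, the Python snippet below renders one MST configuration block that maps every VLAN up front, so that the same text could be applied to every switch. The region name, revision and the way the VLAN range is split across the three instances are assumptions made up for the example, not our actual values.

```python
def render_mst_config(name, revision, instance_ranges):
    """Render an IOS-style MST configuration block as text.

    instance_ranges: instance ID -> (first_vlan, last_vlan), inclusive.
    """
    lines = [
        "spanning-tree mst configuration",
        f" name {name}",
        f" revision {revision}",
    ]
    for inst, (lo, hi) in sorted(instance_ranges.items()):
        lines.append(f" instance {inst} vlan {lo}-{hi}")
    return "\n".join(lines)

# Hypothetical split of VLANs 1-4094 across three instances.
ranges = {1: (1, 999), 2: (1000, 1999), 3: (2000, 4094)}
print(render_mst_config("CAMPUS", 1, ranges))
```

With every VLAN already mapped, creating a new VLAN later would not require touching the MST configuration at all.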

Cheers.

Fausta.

Francois Tallet Wed, 02/20/2008 - 11:00

Hi Fausta,

Thanks for getting back with more information on this. It seems that you are not experiencing the loop any more, that's already a great step forward;-)

In theory, MST should maintain connectivity. Even when you are playing around with the MST configuration and STP is basically restarting from scratch, that should only introduce a temporary outage before the network settles into a final configuration where you have connectivity. So permanently losing connectivity should not happen (again, I'm interested in the exact scenario; if the loss of connectivity is permanent, we should be able to pinpoint easily who is doing something wrong).

I can, however, see one reason why the loss of connectivity could be permanent: some VLANs not being allowed on all the trunks. For instance, you said that you lost connectivity to some switches, so your management VLAN must have been blocked somehow. While you were reconfiguring your network, it went through many different (stable) topologies. It is possible that you ended up in one where STP was trying to use a trunk on which the management VLAN was not allowed. That is something MST cannot really protect against. Say instance 1 is forwarding on a trunk and VLAN 10 is mapped to instance 1: MST considers that there is connectivity through the trunk for VLAN 10, but if VLAN 10 is not in the allowed list for this trunk, its traffic cannot get through. The final topology (computed once all the bridges have had their MST configuration updated) might not have this problem, which would explain why you would only experience it during the reconfiguration.
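The check described above can be sketched in Python. This is an illustration under stated assumptions: the trunk names, allowed lists and forwarding instances are invented, and in practice they would have to be collected from each switch. It flags candidates only, since a VLAN may be deliberately pruned from a trunk.

```python
def blackholed_vlans(vlan_to_instance, trunks):
    """Find VLANs that MST believes are reachable over a trunk but are not.

    trunks: trunk name -> {"allowed": set of VLAN IDs permitted on the trunk,
                           "forwarding": set of instance IDs forwarding on it}
    Returns a list of (trunk, vlan, instance) tuples.
    """
    problems = []
    for name, trunk in sorted(trunks.items()):
        for vlan, inst in sorted(vlan_to_instance.items()):
            if inst in trunk["forwarding"] and vlan not in trunk["allowed"]:
                problems.append((name, vlan, inst))
    return problems

# Hypothetical example: VLAN 10 is mapped to instance 1, which forwards on
# Te1/1, but VLAN 10 is missing from that trunk's allowed list.
vlan_to_instance = {10: 1, 20: 1, 30: 2}
trunks = {
    "Te1/1": {"allowed": {20, 30}, "forwarding": {1}},
    "Te1/2": {"allowed": {10, 20, 30}, "forwarding": {2}},
}
print(blackholed_vlans(vlan_to_instance, trunks))  # [('Te1/1', 10, 1)]
```

Running a check like this against each intermediate mapping, not just the final one, is what would catch a management VLAN being cut off in the middle of a change.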

Regards,

Francois
