Network Instability despite loop prevention

Unanswered Question
Aug 18th, 2009


I was wondering if anyone has come across situations where, for some reason or other, you have network instability that appears to be caused by a loop, despite measures in place to detect and mitigate loops.

We have had a couple of these, which is concerning.

A summary of one of our affected areas:

2 x 6509 Sup720-3B acting in the distribution layer.

On the access layer there is a mixture of 3750, 4500, 4948 and older devices such as 2950, 3550 and 5000.

The number of VLANs configured is over 68. (I state this because the spanning-tree instance limit on a 2950, for example, is only 68 VLANs; once you cross that limit, the 2950 disables spanning tree on the VLANs beyond it.)

We also use pruning configured by VTP.

For this reason, and while we wait for budget to replace the legacy devices, the redundant links have been disabled, leaving only the redundant links for the 4500.

We are using HSRP for gateway redundancy and RSTP for loop prevention.

When I say instability, I mean the devices start reporting duplicate IP addresses (the 6500 VLAN gateways) and the HSRP states constantly switch between active and standby.

In the past three network instability incidents the diagnosis has been as follows.

1) An unauthorised hub was connected to the network; once it was disconnected, the instability ceased.

2) Dual uplinks to a 2950; once the redundant link was removed, the instability ceased.

3) After all segments of the network were isolated and the instability remained, the only two switches left were a 6503 (on one leg) and a 4948. Removing one link from the 4948 did not stop the instability; unplugging both interfaces and restoring them did. I have had this once before with a 4948: whenever it was plugged into the network, it seemed to cause instability. That unit was swapped out by support as a hardware fault.

Now the thing that perplexes me is the following:

A loop should only be able to exist when you have multiple paths.

Question: Theoretically, can a fault in a single device with one uplink (trunk) cause a loop by sending traffic back out the same interface (an internal loop)?

Instability can also be caused by high CPU on the distribution switches. I have tested this on a 6500 by sending up to 100 Mb of broadcast traffic to the gateway, which drives up the processor and has the same effect of HSRP states changing (as if it misses the heartbeats and BPDUs). For this reason we configure storm control on the 6500 trunk interfaces, although the 6500 (IOS) has limited storm control functions and no alerting.

Question: Even if a port is blocked by spanning tree, is it still possible for the switch to receive traffic on it that the CPU will process?



glen.grant Tue, 08/18/2009 - 01:46

Rogue hubs or switches are a major cause of problems, not because of the device itself but because inevitably some user will loop two ports on it together. If the switch does not run spanning tree, which most cheap switches do not, that causes a runaway condition on the VLAN and a major outage.

Yes, the 2950 has a limited number of spanning-tree instances available, but that should not be a problem: all that needs to be done is to manually prune which VLANs are allowed across that trunk. You certainly do not need 68 VLANs on a 2950, because it does not even have 68 ports. People tend to allow everything across trunk links, which is not good practice. Manually prune on both sides of that link and that should cure it, because the 2950 will only allocate spanning-tree instances for the VLANs allowed across that trunk. VTP pruning does not do the same thing as manually pruning the VLANs off the trunks.

Multicast can cause problems if the box is not specifically configured to handle that type of traffic; I have seen a Sup 720 buried by a single multicast conversation from someone doing a Ghost operation using multicast.

If a port is in a spanning-tree blocked state, you should not be getting data traffic across the link for that VLAN.

Other things to check are speed/duplex mismatches on all your links, where one end might be auto and the other end hardcoded; in high-traffic situations this could cause instability.
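To put rough numbers on the manual pruning point, here is a minimal Python sketch. It assumes a simplified model in which the switch needs one spanning-tree instance per VLAN that is either allowed on its trunk or used by a local access port. The function name, the VLAN numbers and the 80-VLAN total are hypothetical; the limit of 68 is simply the figure quoted in this thread, so check the documentation for your platform.

```python
def stp_instances_over_limit(trunk_allowed_vlans, access_vlans, instance_limit):
    """Return how many VLANs would be left without a spanning-tree instance.

    Simplified model (an assumption): one instance per VLAN that is either
    allowed on the trunk or assigned to a local access port.
    """
    active = set(trunk_allowed_vlans) | set(access_vlans)
    return max(0, len(active) - instance_limit)


INSTANCE_LIMIT = 68          # figure quoted in this thread for the 2950
local_vlans = [10, 20, 30]   # hypothetical VLANs with access ports on this switch

# Trunk that allows everything: 80 VLANs defined in the domain (hypothetical).
unpruned_trunk = range(1, 81)
# Manually pruned trunk: only the local VLANs plus a management VLAN (hypothetical).
pruned_trunk = [10, 20, 30, 99]

print(stp_instances_over_limit(unpruned_trunk, local_vlans, INSTANCE_LIMIT))  # 12
print(stp_instances_over_limit(pruned_trunk, local_vlans, INSTANCE_LIMIT))    # 0
```

In this model the unpruned trunk leaves 12 VLANs without an instance, while the pruned trunk stays well under the limit.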

mironduplessis Tue, 08/18/2009 - 02:14

Hey Mate,

Yes, the hubs are an issue and we remove them when possible. The switch security features such as BPDU filter and BPDU guard do not seem to be 100% reliable in detecting and err-disabling these interfaces.

99% of the uplinks from the 6500 to the access layer are via gig fibre, so duplex/speed is not really an issue.

The reason we use VTP pruning is that we run dynamic VLAN allocation per site, so a user is able to roam the network and keep their IP address and VLAN. We use legacy Cisco VMPS for this. In the future this will be removed, as there is less requirement for applications to restrict on IP address.

In general, would you have bpdufilter disabled and bpduguard enabled? If someone connects a hub to a portfast port and loops it to another port, you would want the BPDU that the switch sends to loop back to itself and shut the port; otherwise, if you filter the BPDU from going out, you won't ever know you have a hub.

So, if anyone has checked all the obvious causes stated above, have they still experienced instability? BTW, this has only occurred about 3 times since the start of the year; before that it was very seldom and could always be attributed to a hub.



george_daly Tue, 08/18/2009 - 02:43


I have seen a similar issue using RSTP when the STP domain exceeds 10k. It sounds like a similar environment: RSTP on the 6500s, a high number of VLANs plus trunks.

Check the spanning-tree port count on the 6500s with 'show spanning-tree summary' and look at the number of STP Active entries. It may well not be the problem, but it is certainly worth checking.
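As a rough illustration of how that count grows, here is a minimal Python sketch. It uses the common rule of thumb that each VLAN carried on a trunk counts as one logical spanning-tree port and each access port counts as one. The trunk and VLAN figures are hypothetical, and the 10,000 threshold is only the "10k" mentioned above; the real number for a given switch comes from 'show spanning-tree summary' and the platform documentation.

```python
def stp_logical_ports(trunk_vlan_counts, access_ports):
    """Estimate logical STP ports: one per VLAN per trunk, plus one per access port."""
    return sum(trunk_vlan_counts) + access_ports


THRESHOLD = 10_000  # the "10k" figure from this reply; platform limits vary

# Hypothetical distribution switch: 130 trunks and 40 access ports.
unpruned = stp_logical_ports([80] * 130, 40)  # every trunk carries all 80 VLANs
pruned = stp_logical_ports([10] * 130, 40)    # each trunk manually pruned to 10 VLANs

print(unpruned, unpruned > THRESHOLD)  # 10440 True
print(pruned, pruned > THRESHOLD)      # 1340 False
```

With these made-up figures, allowing every VLAN on every trunk pushes the estimate past the threshold, while pruning each trunk to the VLANs it actually needs keeps it far below.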



