Has anyone come across situations where network instability appears to be caused by a loop, despite measures being in place to detect and mitigate loops?
We have had a couple of these incidents, which is concerning.
A summary of one of our affected areas:
2 x 6509 Sup720-3B acting in the distribution layer.
On the access layer there is a mixture of 3750, 4500, 4948 and older devices such as 2950, 3550 and 5000.
The number of VLANs configured is over 68. (I state this because the spanning-tree instance limit on a 2950, for example, is only 68 VLANs; once you cross that limit, the 2950 disables spanning tree on the VLANs beyond it.)
We also use pruning configured by VTP.
For this reason, and while we wait for budget to replace the legacy devices, the redundant links have been disabled, leaving only the redundant links for the 4500.
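To illustrate why exceeding the instance limit matters, here is a rough Python sketch of which VLANs end up without spanning tree. It uses the 68 figure quoted above and assumes instances are handed out in list order until the limit is reached; the function name and the VLAN numbering are made up for the example, and a real switch may pick the unprotected VLANs differently.

```python
def vlans_without_stp(vlan_ids, instance_limit=68):
    """Return the VLANs left without a spanning-tree instance.

    Assumption: instances are assigned in list order until the limit
    is reached. Real switches may choose differently.
    """
    ordered = list(vlan_ids)
    # Everything past the first `instance_limit` entries gets no instance.
    return ordered[instance_limit:]


# Hypothetical example: 75 VLANs numbered 1..75 on a switch limited to 68.
unprotected = vlans_without_stp(range(1, 76))
print(unprotected)
```

Any VLAN in that list that also has a redundant path is a loop waiting to happen, which is why the redundant links to the legacy switches are disabled.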
We are using HSRP for gateway redundancy and RSTP for loop prevention.
By instability I mean that devices start reporting duplicate IP addresses (for the 6500 VLAN gateways) and the HSRP states constantly switch between active and standby.
In the past three network instability incidents, the diagnosis has been as follows:
1) An unauthorised hub was connected to the network; once it was disconnected, the instability ceased.
2) Dual uplinks to a 2950; once the redundant link was removed, the instability ceased.
3) After all segments of the network had been isolated, the instability remained. The only two switches still connected were a 6503 (on one leg) and a 4948. Removing one link from the 4948 left the instability in place; unplugging both interfaces and then restoring them made it cease. I have had this once before with a 4948: whenever it was plugged into the network, it seemed to cause instability. That unit was swapped out by support as a hardware fault.
Now the thing that perplexes me is the following:
A loop should only be able to exist when you have multiple paths.
Question: Theoretically, can a fault in a single device with one uplink (trunk) cause a loop by sending frames back out of the same interface they arrived on (an internal loop)?
Instability can also be caused by high CPU on the distribution switches. I have tested this on a 6500 by sending up to 100 Mb/s of broadcast traffic to the gateway, which drives the processor up and has the same effect of HSRP states changing (as if it misses the heartbeats and BPDUs). For this reason we configure storm control on the 6500 trunk interfaces, although the 6500 (IOS) has limited storm-control functions and no alerting.
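To put the broadcast test in perspective, a back-of-the-envelope Python sketch of what 100 Mb/s means in frames per second, and what share of a trunk it represents, may help. It assumes "100Mb" means 100 Mb/s on the wire, minimum-size 64-byte Ethernet frames, and a 1 Gb/s trunk; the frame size and trunk speed are assumptions for illustration, not figures from the test.

```python
PREAMBLE_AND_IFG_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def frames_per_second(bit_rate_bps, frame_bytes=64):
    """Frames per second at a given wire rate (frame size is an assumption)."""
    wire_bits = (frame_bytes + PREAMBLE_AND_IFG_BYTES) * 8
    return bit_rate_bps / wire_bits


def share_of_link_percent(bit_rate_bps, link_bps):
    """Traffic rate expressed as a percentage of link bandwidth."""
    return 100.0 * bit_rate_bps / link_bps


broadcast_bps = 100_000_000      # the 100 Mb/s used in the test
trunk_bps = 1_000_000_000        # hypothetical 1 Gb/s trunk

print(round(frames_per_second(broadcast_bps)))          # about 148,810 frames/s
print(share_of_link_percent(broadcast_bps, trunk_bps))  # 10.0
```

Under those assumptions the gateway could be asked to look at roughly 149,000 broadcast frames every second, and a storm-control threshold would have to sit below 10% of a gigabit trunk to catch it.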
Question: Even if a port is blocked by spanning tree, is it still possible for the switch to receive traffic on that port which it will process (in CPU)?