I ran into some problems when connecting two access switches (2950) to our core switches (6509). Although we set the core switches as spanning-tree root and secondary root, the access switches keep cycling through the spanning-tree states: listening, learning, blocking. The core switches did have output discard errors on some ports.
What also seemed odd is that the telnet connection became very slow, but the CPU load was the same as before, when we didn't have any problems.
Does anybody have an explanation for this behaviour?
The core switches are connected to each other with a trunk; the access switches are each connected to both core switches with trunks and are not directly connected to each other.
If each access-layer switch is connected to both 6500s and there is a layer 2 trunk between the core switches, then the access-layer switches still have to go through their spanning-tree calculations: each one needs to block one of its uplinks, otherwise you have created a layer 2 loop in your network. (Note that with PVST+ the blocking on the uplinks is done per VLAN.)
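To illustrate why the access switches still have to run spanning tree, here is a simplified Python sketch of how one access switch could pick its root port and block the other uplink for a single VLAN. It assumes 802.1D-style BPDU comparison, hypothetical bridge IDs (priorities 24576 and 28672 for the two cores, roughly what root primary/secondary typically configure, and the default 32768 for the access switch), made-up port names and MAC values, and a path cost of 4 per link (the short-method default for 1 Gbit/s links). Timers, port priority and the listening/learning states are left out.

```python
from dataclasses import dataclass

# Hypothetical bridge ID = (priority, MAC as an integer); lower compares as better.
BridgeId = tuple[int, int]


@dataclass(frozen=True)
class Bpdu:
    root_id: BridgeId      # who the sender believes is root
    root_path_cost: int    # sender's cost to that root
    sender_id: BridgeId    # sender's own bridge ID
    sender_port: int       # sender's port ID (simplified to an int)

    def vector(self) -> tuple:
        # STP compares these fields in order; the lowest tuple is superior.
        return (self.root_id, self.root_path_cost, self.sender_id, self.sender_port)


def port_roles(my_id: BridgeId, received: dict[str, Bpdu],
               port_cost: dict[str, int], my_port_ids: dict[str, int]) -> dict[str, str]:
    """Assign port roles for one VLAN from the best BPDU heard on each port."""

    def via(port: str) -> tuple:
        # What this switch would learn about the root through `port`.
        b = received[port]
        return (b.root_id, b.root_path_cost + port_cost[port], b.sender_id, b.sender_port)

    root_port = min(received, key=via)
    root_id, my_cost = via(root_port)[:2]
    if my_id < root_id:
        # This switch has the lowest bridge ID, so it is root: every port is designated.
        return {port: "designated" for port in received}

    roles = {}
    for port, heard in received.items():
        if port == root_port:
            roles[port] = "root"
            continue
        # The BPDU this switch would send on the port; if it beats what the
        # neighbour sends, this side is designated, otherwise the port blocks.
        mine = Bpdu(root_id, my_cost, my_id, my_port_ids[port])
        roles[port] = "designated" if mine.vector() < heard.vector() else "blocking"
    return roles


core1 = (24576, 0x000C00000001)   # hypothetical root primary
core2 = (28672, 0x000C00000002)   # hypothetical root secondary
access = (32768, 0x000C00000003)  # default priority
received = {
    "Gi0/1": Bpdu(core1, 0, core1, 1),  # straight from the root
    "Gi0/2": Bpdu(core1, 4, core2, 1),  # relayed by core2 across the core trunk
}
cost = {"Gi0/1": 4, "Gi0/2": 4}
print(port_roles(access, received, cost, {"Gi0/1": 1, "Gi0/2": 2}))
# {'Gi0/1': 'root', 'Gi0/2': 'blocking'}
```

With PVST+ this calculation runs separately for every VLAN, so if core2 were made root for some VLANs (with core1 as secondary for them), Gi0/2 would become the root port for those VLANs and Gi0/1 would block instead.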
Setting the spanning-tree root and secondary root does not mean the access-layer switches no longer need to run spanning tree; root and secondary are about getting optimal layer 2 paths within your network topology.
Does this also explain the output discard errors?
The weird part was that an interface that was already in the blocking state went back to listening, learning and then blocking again. For an hour there wasn't a single stable moment, and there are only 20 VLANs on the access switches.
How long should it take before the situation is stable?
What command are you using to view the port discards?
It should not take an hour to reach a stable state. Under normal conditions, spanning tree should become stable in well under a minute (roughly 30 to 50 seconds with the default 802.1D timers). If a port is continually flipping between blocking and forwarding, you potentially have a loop in your network.
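As a rough check on these figures, assuming the default 802.1D timers (max age 20 s, forward delay 15 s) have not been changed: a blocked port spends one forward-delay interval in listening and another in learning before it forwards, and for an indirect failure it first has to wait for the stale root information to age out. A small Python sketch of that arithmetic:

```python
# Default IEEE 802.1D timers in seconds (assumed unchanged on the switches).
MAX_AGE = 20
FORWARD_DELAY = 15


def legacy_stp_convergence(indirect_failure: bool,
                           max_age: int = MAX_AGE,
                           forward_delay: int = FORWARD_DELAY) -> int:
    """Seconds before a blocked port starts forwarding under classic 802.1D."""
    # Listening and learning each last one forward-delay interval.
    seconds = 2 * forward_delay
    if indirect_failure:
        # The stale root information on the port must age out first.
        seconds += max_age
    return seconds


print(legacy_stp_convergence(indirect_failure=False))  # 30
print(legacy_stp_convergence(indirect_failure=True))   # 50
```

An hour of continuous cycling is far outside either figure. Rapid PVST+ normally converges considerably faster than this on point-to-point links, because it uses a proposal/agreement handshake instead of waiting out the forward-delay timer.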
On the access-layer switches, have you enabled spanning-tree portfast on all ports other than the uplinks?
Have you checked the logs on the 6500s and 2950s for port-flapping messages?
In addition to the switch configs, please also send the output of "show spanning-tree detail" and "show spanning-tree summary" from both the access and the 6509 switches. We'll have a look at the details and get back to you.
Kindly check the following.
1) The spanning-tree mode in your tech-support output indicates that you are running rapid PVST+ on the core switches, so check whether the same mode is enabled on the access-layer switches.
2) Check the uplink interfaces on both the access and the core switches for layer 1 errors. If a link is dropping packets because of interface errors, the BPDU exchange between the switches is disrupted, which can cause STP to reconverge continuously and lead to connectivity problems.
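To make the effect of dropped BPDUs concrete, here is a simplified Python sketch that takes the receive times of BPDUs on one port and reports when the stored information would age out; each such event would trigger a recalculation. The 6-second default assumes rapid STP behaviour (information discarded after three missed 2-second hellos), while 20 seconds corresponds to the legacy max-age timer. The trace is made up, and a real switch tracks more state than this single timer.

```python
def aging_events(arrivals: list[float], timeout: float = 6.0) -> list[float]:
    """Times at which a port would discard its stored BPDU information.

    arrivals: sorted receive times (seconds) of BPDUs on one port.
    timeout: 6.0 approximates rapid STP (3 missed 2-second hellos);
             use 20.0 for the legacy 802.1D max-age timer.
    """
    events = []
    for prev, nxt in zip(arrivals, arrivals[1:]):
        if nxt - prev > timeout:
            # No BPDU arrived within the window: the information ages out
            # and the switch recomputes its port roles.
            events.append(prev + timeout)
    return events


# Hypothetical trace from an uplink that is dropping frames.
trace = [0, 2, 4, 12, 14, 16, 26, 28]
print(aging_events(trace))                # [10.0, 22.0]
print(aging_events(trace, timeout=20.0))  # []
```

In this made-up trace the gaps are long enough to age out information under the 6-second rule but not under the 20-second max age, so the rapid-STP timeout reacts to the same packet loss sooner.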
The symptoms you described in your original post are consistent with this.
Intermittent or slow connectivity is a typical sign of an unstable STP network.
Which Supervisor Engine is in your 6509? It is quite possible that the Supervisor is not being overloaded by the STP reconvergence, which would explain why the CPU load looked normal.
On lower-end switches you will usually see a sharp increase in CPU load during STP reconvergence.
A CPU utilisation spike is not a consistent symptom of every STP-related issue.
Trying to figure out your problem: just because you set the root on the 6500s doesn't mean the access switches won't go through spanning tree (listening, learning, etc.) when plugged into the network. Management access can become temporarily slow while spanning tree runs, but that should clear within 50 seconds. Check where each VLAN is being blocked in the spanning tree. From the access switches, verify where the root actually is: if it's one of the 6500s, the root port should be one of your two uplinks, and the access switch itself should not be the root.
We found an unmanaged switch, connected to both access switches for spanning-tree reasons, that had a low spanning-tree priority; this is what caused the STP problem.
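The root election behind this can be sketched in Python: every switch compares bridge IDs (priority first, MAC address as a tiebreaker) and the lowest one becomes root, so any switch with a lower priority than the 6509s takes over regardless of where it sits. The sketch assumes PVST+ with the extended system ID (VLAN number added to the configured priority); the switch names, MAC addresses and the 4096 priority for the extra switch are hypothetical, since the actual priority isn't stated.

```python
def bridge_id(priority: int, mac: str, vlan: int) -> tuple[int, int]:
    """Build a PVST+ bridge ID; lower values win the root election."""
    if priority % 4096:
        raise ValueError("priority must be a multiple of 4096")
    # With the extended system ID, the VLAN number is added to the priority.
    mac_value = int(mac.replace(".", "").replace(":", ""), 16)
    return (priority + vlan, mac_value)


def elect_root(bridges: dict[str, tuple[int, int]]) -> str:
    # Lowest priority wins; the MAC address only breaks ties.
    return min(bridges, key=lambda name: bridges[name])


vlan = 10
bridges = {
    "core1": bridge_id(24576, "0000.0c00.0001", vlan),
    "core2": bridge_id(28672, "0000.0c00.0002", vlan),
    "access1": bridge_id(32768, "0000.0c00.0003", vlan),
}
print(elect_root(bridges))  # core1

# A hypothetical extra switch with a lower priority takes over as root.
bridges["extra"] = bridge_id(4096, "0000.0c00.0009", vlan)
print(elect_root(bridges))  # extra
```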
We also built a test setup where the STP problem did not occur, so it had to be something local. Curiously enough, the slow telnet connection also occurred at the test setup and still occurs there, even though STP is stable.