I recently managed to migrate everything (200 racks) to a VSS cluster. At the moment, all the racks except one are running on only one link, connected to switch 1, the active switch.
I provisioned a new rack, and after the migration was completed, this was the first rack that I wanted to connect to the VSS cluster using both links.
When I connected the cable to switch 1, it generated a traffic spike of 100 to 400 Mbps to all the other edge switches (non-Cisco) connected to switch 1. The spike lasted for 1 to 2 minutes, and then everything went back to normal. I included the RRD graph below.
I am really surprised that this happened, as the edge switch is configured with a port-channel, and the MEC is configured correctly as well.
Has anyone else experienced the same issue?
What version of IOS are you running in your VSS environment?
There are some nasty bugs in 12.2(33)SXI2.
SXI2a is pretty good. I would keep an eye on it to see if this happens again. Are all the edge switches connecting to your VSS pair non-Cisco?
I am going to add another switch soon, and I will check whether the same thing happens. Yes, all the switches connected to the VSS are non-Cisco.
I am using mode on for the EtherChannels. I know the best practices for VSS configuration do not recommend this, but this is how it has been done until now.
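For reference, a minimal sketch of what a static MEC member looks like with mode on; the interface and channel-group numbers below are placeholders, not taken from my actual setup:

interface range TenGigabitEthernet1/1/1, TenGigabitEthernet2/1/1
 channel-group 10 mode on
!
interface Port-channel10
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk

With mode on, no negotiation protocol (PAgP or LACP) runs on the bundle, so a miscabled or misconfigured member is not detected automatically.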
Just curious: how would dual-active detection work in your environment, given that your EtherChannels are in mode on rather than desirable, and you are connecting your VSS to non-Cisco devices? Per the 12.2SXI command reference guide, only PAgP is supported:
dual-active detection pagp trust channel-group xx
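For context, that command sits under the virtual switch domain configuration, and PAgP-based dual-active detection only works if the trusted MEC actually runs PAgP (mode desirable); the domain and channel-group numbers here are placeholders:

switch virtual domain 100
 dual-active detection pagp
 dual-active detection pagp trust channel-group 10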
There is no dual-active detection link configured, as the two 6500 chassis are connected via 2 x 10G links: one link on the supervisor and one link on a 6708 line card.
Very little chance of both links going down at the same time.
But that is not the issue at the moment. I have no idea what causes that traffic spike.
Just as an update: I added an LACP MEC to the VSS, and the same issue happened when cable number 1 was connected.
I will setup a SPAN session to see what is happening.
I solved the issue by filtering the VLANs that are allowed on the trunk. Previously I allowed all VLANs through and then connected the port. After a few tests, I captured the traffic that was sent, and it was unicast, all of it.
After that, I filtered the VLANs on the port channel, and this did not generate any traffic spikes. So the symptom is fixed, but I did not find the root cause.
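For reference, a sketch of the kind of filtering I mean; the port-channel number and VLAN list are placeholders, not my real ones:

interface Port-channel20
 switchport trunk allowed vlan 10,20,30

Using switchport trunk allowed vlan add (or remove) afterwards adjusts the list incrementally instead of replacing it.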
Anyone has any ideas?
Thanks for the update. This is good info and another reason to allow only the specific VLANs needed on a trunk and not all VLANs.
Yes indeed. Especially when the cluster has to handle about 800 VLANs.
If all VLANs are allowed on 250 ports (actually 500 ports across the two 6509s), it creates one virtual instance per VLAN per physical interface. This is where knowing the STP limits comes in handy. Check the actual usage with the show vlan virtual-interface command.
If the STP version can handle 400,000 instances, then it is fine. But for PVST+ the limit is 18,000 per chassis when running in standalone mode, and per cluster when running in VSS mode. Details for the other STP versions are listed on the Cisco website.
Now one can imagine the switching CPU usage when the virtual instance count is more than 20 times over the limit!!! Total chaos...
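The arithmetic above can be sketched quickly; the figures are the ones from this thread (500 trunk ports, 800 VLANs, an 18,000-instance PVST+ limit):

```python
# Rough sketch of the logical-interface math from this thread.
# PVST+ runs one spanning tree instance per VLAN, so each trunk port
# carries one logical interface per VLAN allowed on it.

def stp_logical_interfaces(trunk_ports, vlans_per_trunk):
    return trunk_ports * vlans_per_trunk

PVST_PLUS_LIMIT = 18_000  # per chassis standalone, per cluster in VSS mode

total = stp_logical_interfaces(trunk_ports=500, vlans_per_trunk=800)
print(total)                     # 400000
print(total // PVST_PLUS_LIMIT)  # 22 -> more than 20 times over the limit
```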
So never allow all VLANs on the trunks, because it will bite back one day.
However, I cannot understand what exactly happens when all VLANs are allowed through on one trunk while all the other trunks are filtered. Why does it generate the unicast flood?