Today we ran into massive problems, probably because of a mistake on our side related to VTP Pruning and VTP Transparent mode.
Our network basically looks like this:
Edit: As ASCII graphics do not look nice in here, I put the network layout into the attachment. Sorry for the inconvenience.
We already have 64 Vlans, meaning that any additional Vlan results in too many SPT-Instances on the Cat2950 (and therefore in no spanning-tree vlan xxx statements, which we do not really like...), although these Switches provide access to only a few Vlans.
To make it easier to create new Vlans, we decided to put the Cat2950 (Access3-6) into transparent mode. Then we could - so we thought - create new Vlans on the Core and use them on Access1&2, and Access3-6 would never hear of them, and would therefore neither try to run another SPT-Instance nor produce a loop (and yes, MST would have been the better choice, but we did not have enough time to learn and implement it).
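To make our reasoning concrete, here is a minimal Python sketch of what we expected. It is a toy model only: the limit of 64 instances is taken from the Vlan count mentioned above, the Vlan numbers are made up, and the function names are hypothetical, not anything from IOS.

```python
# Toy model of the per-Vlan spanning-tree budget on an access switch.
# Assumption: the switch can run at most 64 instances (taken from the post).
INSTANCE_LIMIT = 64


def known_vlans(mode, domain_vlans, local_vlans):
    """Vlans a switch knows about, depending on its VTP mode."""
    if mode == "client":
        # A client learns every Vlan that exists in the VTP domain.
        return set(domain_vlans)
    if mode == "transparent":
        # A transparent switch only has its locally configured Vlans.
        return set(local_vlans)
    raise ValueError(f"unknown VTP mode: {mode}")


def vlans_without_stp(vlans, limit=INSTANCE_LIMIT):
    """Return the Vlans that would be left without their own instance.

    Simplification for this sketch: instances are assumed to be handed
    out in ascending Vlan-ID order.
    """
    ordered = sorted(set(vlans))
    return ordered[limit:]


domain = range(1, 66)  # 65 Vlans in the domain: one more than the limit
local = [1, 17]        # hypothetical: the few Vlans an access switch serves

print(vlans_without_stp(known_vlans("client", domain, local)))       # [65]
print(vlans_without_stp(known_vlans("transparent", domain, local)))  # []
```

In this model the transparent switch stays well below the limit, which is exactly the effect we were hoping for.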
Five minutes later, some crucial Services were down: Servers in Vlan17 on Access3 could no longer talk to Servers on Access5. No ping, not even ARP-Replies were seen. But: The outside world could talk to the same servers without any problem - be it the Core-Switches themselves, Clients on User-Access-Switches or other Servers in Vlan17 on Access1&2. We did not test the connection between Access3&4, or 5&6, but as those are the SPT-Blocked-Links, we guess that no connection would have been possible there either.
Fortunately, it was late Friday afternoon ;)
When we switched the 2950s back to VTP Client, everything was fine again, except that we could not really provide any explanation to helpdesk & management. That of course is the reason for this post - why did this happen?
-In our understanding, VTP comes into play when adding/deleting/changing a Vlan. We didn't do any of this.
-When leaving the trunks, frames carry their Vlan-Tag, regardless of VTP. As long as there is a virtual Port on the Switches, they get forwarded through the dot1q-Trunks, whether the Switch is Client or Transparent.
-If we take VTP Pruning into account: If Access3-6 do not report that they use Vlan 17 (as they do not report anything because they are VTP Transparent), does this mean that the Core prunes those not-reported Vlans? Doesn't it work the other way around, so that the Access-Switches report which Vlans they do not need, and only those negatively reported Vlans get pruned?
-Even if Pruning was the reason for the outage - why could the Servers on the VTP-Transparent-Switches in the probably pruned Vlans still be reached from outside, be it from the Core or outside Vlan17 on Switches which were still VTP Client?
Does anybody have an idea what could have been wrong, so we could make solid statements in our Report? Or should everything have worked out as intended and the timing of the Server Problems was just a very unlucky coincidence (please, please, tell me so ;)?
RTFM-Answers are welcome if the right Manual is pointed out - I couldn't find anything about the Interaction of VTP Transparent and VTP Pruning, especially about the Question 'Prune whatever is reported not to be used' versus 'Prune whatever is not reported to be used' - even if the latter means that everything gets pruned when nothing is reported from a VTP Transparent Switch.
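To show what I mean by the two readings, here is a small Python sketch. It is a toy model of the question only, not Cisco's implementation: the Vlan numbers other than 17 are made up, the function names are hypothetical, and the assumption that a transparent switch reports nothing at all is ours.

```python
# Toy model of the two pruning readings; not Cisco's implementation.
ALL_VLANS = {17, 20, 30}  # hypothetical Vlans carried on the trunk


def prune_reported_unused(reported_unused):
    """Reading A: prune only Vlans the neighbour reports as NOT used."""
    return set(reported_unused) & ALL_VLANS


def prune_not_reported_used(reported_used):
    """Reading B: prune every Vlan the neighbour does not report as used."""
    return ALL_VLANS - set(reported_used)


# A VTP Client that needs Vlan 17 only (hypothetical report).
client_used = {17}
client_unused = ALL_VLANS - client_used

# Assumption: a VTP Transparent switch reports nothing at all.
silent = set()

# For a reporting Client both readings give the same result.
print(sorted(prune_reported_unused(client_unused)))  # [20, 30]
print(sorted(prune_not_reported_used(client_used)))  # [20, 30]

# For a silent switch the two readings are opposites.
print(sorted(prune_reported_unused(silent)))         # []
print(sorted(prune_not_reported_used(silent)))       # [17, 20, 30]
```

Under reading B a silent switch would lose every Vlan on its trunk, Vlan17 included, and that is the case I could not find documented.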
And does anybody have a good suggestion for how we could add more Vlans to our LAN without disturbing the Cat2950 and without a big migration of the whole LAN?
Thanks for all comments
Greetings from Switzerland
(and as usual, sorry for the incomprehensible English - I just hope it is understandable nevertheless)