Tuesday last week I was on my way to work when I got a call on my mobile from somebody asking where I was, as the whole network was down. I was already making my way to our data centre building, so I decided that was probably the best place to look at it from. Sure enough, when I got there it wasn’t happy. I plugged into the console on one of the core switches and noticed MAC addresses flapping around. I managed to track the problem down to a common work area where somebody had plugged an unmanaged 3Com switch into the network. That in itself hadn’t caused the problem, but they had then plugged a cable from port 3 on that switch to port 4 on the same switch, creating a nice loop. So I pulled it out and hey presto, it was all good again. In all, the outage lasted around 30 to 60 minutes.
Our business offers a lot of services to clients, including some website hosting, and this outage seemed to affect the whole network including these services, which are located in different VLANs.
Our data centre building also has office space as some of our employees work from there.
When I started here around a year ago I didn’t like how the data centre building was linked to the main office, which also hosts some systems that are redundant, clustered and load balanced with the DC. Between the two sites we have two separate BT LES 100 circuits, but only one of them is ever used. The other is only plugged in at one end, and switching over to it is a manual job if the active one fails. I’ve been planning to set up an EtherChannel but to date haven’t had the time. At each end the LES circuits terminate on switches and plug into trunk ports, so ALL the VLANs are trunked between both sites.
Now, during a review of what happened, management have come up with the idea of having a physically separate network infrastructure for our users at each site, so they would need to pass through a firewall to connect to anything.
I don’t really like this idea, as we would have to manage multiple infrastructures, and there must be a way to keep one main infrastructure while still protecting the data centre from this kind of outage.
The reason they plugged the LES 100s into trunk ports was that they required Layer 2 connectivity between the sites, as they run server clusters, both Microsoft and Oracle.
I was hoping somebody may have had experience with this type of setup and could help me out or make some suggestions, so we can secure the data centre subnets/VLANs from our user VLANs/subnets. We obviously have firewalls to separate the subnets/VLANs, but the switching infrastructure is shared.
We have been told we can spend money to resolve this, as it is really important that we can show our customers we have a sound infrastructure with built-in resilience.
Also we are a Cisco house when it comes to switching and routing but our firewalls are not Cisco.
There is a diagram included if it helps.
Thank you for taking the time to read this and I hope I have included enough information.
Apart from port security here are some other things to consider -
1) Only allow VLANs onto the trunks to switches that actually need them.
2) Stop using VLAN 1. VLAN 1 is the default VLAN and is carried on every trunk by default, so it effectively spans your entire network and any STP issue in it will affect all switches.
3) You may want to consider doing this -
You only need L2 adjacency for the cluster. So leave the links as L2 trunks between the 2 sites but only allow 3 VLANs on them -
1) The first VLAN would be the server VLAN for clustering. If the servers are currently in VLAN 1 you really do need to move them into their own VLAN.
2) Route the rest of the VLANs between sites. To do this you can use the other 2 VLANs for L3 peering, ie. OSPF/EIGRP. All the other user VLANs are now routed across the LES links and not switched, which would limit the scope of an STP failure. And obviously if you have a server VLAN between the 2 sites you can control which ports are allocated into this VLAN, ie. no user should be able to connect devices into the server VLAN. If they can, then you do have real problems.
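As a rough sketch of what that could look like on the core switch at one site, assuming an L3-capable Catalyst running IOS. The interface name, VLAN IDs (100 for the cluster VLAN, 901/902 for the peering VLANs), addresses and OSPF process number are all made up for illustration and are not taken from the setup described in this thread.

```
! Site A core switch. All names, VLAN IDs and addresses are hypothetical.
vlan 100
 name CLUSTER-SERVERS
vlan 901
 name INTERSITE-L3-A
vlan 902
 name INTERSITE-L3-B
!
! Needed so the switch routes between VLANs instead of only switching.
ip routing
!
interface GigabitEthernet1/0/1
 description LES 100 circuit to Site B
 ! The encapsulation line is only required on platforms that also support ISL.
 switchport trunk encapsulation dot1q
 switchport mode trunk
 ! Only the cluster VLAN and the two peering VLANs cross the link.
 ! User VLANs are not in the list, so they stay local to each site.
 switchport trunk allowed vlan 100,901,902
!
! Point-to-point style SVIs used only for routing adjacencies to Site B.
interface Vlan901
 ip address 10.255.1.1 255.255.255.252
!
interface Vlan902
 ip address 10.255.2.1 255.255.255.252
!
router ospf 1
 network 10.255.1.0 0.0.0.3 area 0
 network 10.255.2.0 0.0.0.3 area 0
 ! Assumed summary of the Site A user subnets.
 network 10.10.0.0 0.0.255.255 area 0
```

Site B would mirror this with the other address in each /30. With the user VLANs routed rather than trunked, a loop in a user VLAN at one site should no longer be forwarded across the LES circuit at L2, although the three VLANs that do cross it still share a spanning tree between the sites.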
You can only do the above if you set up separate VLANs for users and servers. If you don't, then port security is your main fallback.
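For the port security fallback, a minimal sketch of a user-facing access port might look like the following. The interface, VLAN number and MAC limit are assumptions for illustration. PortFast and BPDU guard are not mentioned above; they are added here only because they are commonly paired with port security on edge ports.

```
! Hypothetical user-facing access port. Adjust the interface, VLAN and limits to suit.
interface GigabitEthernet1/0/10
 description User desk port
 switchport mode access
 switchport access vlan 20
 ! Limit how many MAC addresses can appear behind this port.
 switchport port-security
 switchport port-security maximum 2
 ! Err-disable the port on a violation (this is also the default action).
 switchport port-security violation shutdown
 ! Treat the port as an edge port and shut it down if a BPDU arrives on it.
 spanning-tree portfast
 spanning-tree bpduguard enable
```

With a low MAC limit, a small switch plugged into the wall port with several devices behind it, or looped back on itself, would be expected to trip the violation and take only that one port down rather than the whole network.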
Other than that, you could look at L2TPv3 or EoMPLS to extend an L2 VLAN across routed links, which would let you make the links between the 2 sites purely routed.
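A very rough L2TPv3 sketch, assuming IOS routers (or another platform that supports `xconnect`; many Catalyst access and distribution switches do not) at each site with IP reachability between their loopbacks. The pseudowire class name, loopback addresses, VLAN 100 and VC ID 100 are hypothetical.

```
! Site A router. Names, addresses, VLAN and VC ID are hypothetical.
interface Loopback0
 ip address 192.0.2.1 255.255.255.255
!
pseudowire-class DC-CLUSTER-PW
 encapsulation l2tpv3
 ip local interface Loopback0
!
! Attachment circuit: the server VLAN arrives tagged on this subinterface
! and is tunnelled to the Site B router (192.0.2.2) over the routed link.
interface GigabitEthernet0/1.100
 encapsulation dot1Q 100
 xconnect 192.0.2.2 100 pw-class DC-CLUSTER-PW
```

Site B would carry the mirror image, pointing back at 192.0.2.1 with the same VC ID. For EoMPLS the `xconnect` line would instead use `encapsulation mpls`, and MPLS with LDP would need to be running between the two routers. Check platform and licence support before planning around either option.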