Hoping that some experts in network design will help out here. The scenario is this: we have approximately 150 servers that we want to move to a data center. Each server will be dual-homed. There is a possibility that iLO (lights-out management) will also be used, although other options are being considered. I am really looking for advice on how to move forward in building a modular network design. The 150 servers will be part of the server farm module, but we will have several other modules that need to be integrated. The network will have WAN links out to two main sites, one of these being our BCP site. We will also have extranet connections coming inbound to the server farm module.
I am thinking about the following design:
2 x 3750G in each rack. Each server in a rack will be cabled to both 3750s (dual homing). Each switch will then have a fiber link back to an aggregation switch (3750 with 16 SFP ports). This will be the distribution module, which will be built from stacked 3750 switches. The fiber links will run at Layer 3.
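To make the idea concrete, each rack switch would carry something like the following. The interface numbers, addressing, and the choice of OSPF are all placeholders, nothing is decided yet:

```
! Sketch of one rack access switch (3750G) with a routed fiber uplink.
! Interface numbers, addresses, and routing protocol are examples only.
interface GigabitEthernet1/0/49
 description Fiber uplink to distribution/aggregation stack
 no switchport
 ip address 10.0.0.1 255.255.255.252
!
interface Vlan10
 description Gateway for the server VLAN local to this rack
 ip address 10.10.10.1 255.255.255.0
!
router ospf 1
 network 10.0.0.0 0.0.0.3 area 0
 network 10.10.10.0 0.0.0.255 area 0
```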
Am I on the right track here, or is this design wide of the mark? What do I do to create a core layer? What happens at the core? Do I hang the WAN connections off the core? Do I hang other modules, such as the Internet module, off the distribution layer?
Any advice would be rated!
The access/distribution design is fine. However, with respect to the choice of switches, I would really like to know whether you actually need StackWise technology. If yes, then go ahead with the 3750 switches; otherwise the 3560 is an excellent series for this purpose.
Coming to the core layer, the 3560 would be the better choice if you are unlikely to use StackWise, and in terms of port density I guess the 4500 would be the more cost-effective solution.
So, you should cross-check (if not done already) 3560 vs. 3750 at the distribution layer and 3750 vs. 4500 at the core layer.
Also, you could think of a collapsed core/distribution layer built with the 4500 only.
Please let us know about your final decision.
It really depends on how much you anticipate the network will grow. If there isn't going to be much growth, then you can use a collapsed core/distribution.
Also, do you want to keep the WAN connectivity separate from the other modules? How about the extranet connections? Do they come in via the WAN module? Or the Internet? If the answer to these questions is something along the lines of "all modules will have to be separated from the others", then you might want to separate the core from the distribution.
You can configure your core with a couple of Layer-3 switches running only Layer-3 interfaces, i.e. no VLANs, or at worst a "point-to-point" VLAN between the two core switches if you have to. For fault resilience and more bandwidth, you can configure a Gigabit EtherChannel between the core switches if you like.
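A minimal sketch of one core switch, assuming a pure L3 build with a Gig EtherChannel to its peer (interface numbers and addresses are made up):

```
! Hypothetical core switch: pure Layer-3 interfaces, no user VLANs.
! The L3 EtherChannel bundles two Gig links to the peer core switch.
interface Port-channel1
 description L3 EtherChannel to peer core switch
 no switchport
 ip address 10.0.254.1 255.255.255.252
!
interface GigabitEthernet0/1
 no switchport
 channel-group 1 mode desirable
!
interface GigabitEthernet0/2
 no switchport
 channel-group 1 mode desirable
```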
You can configure your distribution layer as a hybrid Layer 2/Layer 3 design: Layer 2 towards the access layer, and Layer 3 to the core.
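A hybrid distribution switch would then look roughly like this. The VLAN number, HSRP settings, and addresses are purely illustrative:

```
! Hypothetical distribution switch: Layer 2 toward access, Layer 3 to core.
vlan 10
 name SERVERS
!
interface GigabitEthernet1/0/1
 description Trunk down to an access switch
 switchport trunk encapsulation dot1q
 switchport mode trunk
!
interface Vlan10
 description Server VLAN gateway, shared via HSRP with the peer dist switch
 ip address 10.10.10.2 255.255.255.0
 standby 10 ip 10.10.10.1
 standby 10 priority 110
 standby 10 preempt
!
interface GigabitEthernet1/0/25
 description Routed uplink to core
 no switchport
 ip address 10.0.1.1 255.255.255.252
```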
As to what switches to use? Well... it depends on the switch locations, cabling, performance, and port density (just to name a few!). If you need more than a dozen fibre ports on a Layer-3 switch, then your choice is quite limited. Although Cat 3750s can be stacked, providing more scalability, we encountered a few issues with Layer-3 convergence in stacked configurations. I can't remember exactly how long it took the routing protocol to reconverge within the stack; I think it was a few minutes.
Also, we have done a bit of geeky calculation on the TCO comparison between stacked 3750s and a 4506. We found that once you need to stack more than three 3750s, there's not much difference in TCO between them and a 4506. Hence, for server access switches, I reckon a 4506 will be a better investment; and if iLO becomes a definite requirement, you can allocate a certain blade/linecard specifically for that purpose.
I guess I'll weigh in here. For your server connectivity, is there any reason you are using fiber between the distribution and access layers? Fiber ports are much more expensive than copper Gigabit ports.

For scalability I like to limit the port density at the access layer to 24 or fewer, so I would typically recommend a pair of 3560 or 3750 Gigabit switches running the standard image. I typically would not stack these, as stacking would worsen my uplink oversubscription ratio from 24:1 to 48:1. Stacking also increases the size of the failure domain: if there is a stack failure, you could potentially take down both of the switches you have in the rack.

In the distribution layer it may be overkill, but I would recommend a pair of 6500s. The 6500 would allow you to utilize the services modules such as the NAM, FWSM, or CSM (the NAM is a very effective network dashboard for your data center). With a data center of 150 servers it is very likely you would require such services.

The decision to collapse the core into this data center distribution block will probably depend upon what the other modules are. My typical decision criterion is this: if there were an L2 failure and the server farm distribution block were down, would there be any value in the other modules still being able to talk to each other? If there is any value, then you should not collapse the core and distribution layers.
It is very difficult in a data center to do L3 between the access layer and the distribution layer. If your requirements allow it, then I would agree with a pair of stacked 3750 series switches, as this will be your only option for NIC teaming at the access layer if you do L3 between access and distribution. If you do L3 to the access layer, you may lose some flexibility in the way you implement services modules or load balancers. Server placement in the data center also becomes important: clustered servers typically must be on common subnets, and this kind of design requires that all servers needing a common subnet reside physically in the same rack.
Guys, thanks for the posts; all of them have made some excellent points. I think scalability is a major factor in this network design. I have been considering the 3750G switch as an option at the access layer because of the stacking option. One of my major reservations about the design is that L3 between the access layer and the distribution layer means chassis-based server systems will have problems: one server in a blade enclosure may need to be in a different VLAN to the others, which would be impossible with this design. Do you know of any good examples of data center network design? I really need to see what a server farm design looks like. The point about the collapsed core is very interesting: if we lose the access layer, then we should be able to reroute via the core to our London BCP site!
Huge amount of things to think about, it would be great if someone had some good examples to work from!
There are some great Data Center Design guides on CCO
here is a link to some of them. Although it is often touted, I have yet to see a large-scale data center running L3 to the access layer; it just takes away too much flexibility. Often the chassis-based or blade server systems will have an integrated switch, and the best practice is usually to home this integrated switch all the way back to the aggregation/distribution layer.
That's a good point about the integrated switch; due to our current network design we have used the patch panel option instead of the integrated switch! Thanks for the link, I'll be reading some of these today... maybe questions later :)
Can I just ask you about your statement
"I have yet to see a large scale Data Center running L3 to the access layer, it just takes away too much flexibility"
In what way does running L3 to the access layer remove flexibility? Could anyone give some advantages of running L3 to the access layer? And if we stay with Layer 2 to the access layer, could the advantages of that be pointed out?
As far as running L3 to the _SERVER_ access layer goes, you will most likely run into problems with server clustering and server farm scalability in the medium to long term. Even L3 to the floor access layer (i.e. desktops, printers, etc.) will constrain you when you're deploying IP telephony, for instance, where in some cases the VLANs must span multiple access switches (which you can't do with L3 access switches, obviously).
The only advantage L3 access provides that I can think of is faster convergence upon uplink failure.
With the recent advances in the Spanning Tree protocol, Layer-2 convergence is not a big issue anymore. In fact, it is even possible for Layer-2 convergence to take place within the sub-second window required by IP telephony. One important prerequisite for this is non-cascading Layer-2 switches, i.e. do not daisy-chain one Layer-2 switch to another. Other advantages Layer-2 access will provide include:
1. Allows VLANs to span multiple switches, which some applications require
2. Allows more scalability at the access layer. This can be quite important in a server farm where some clustered servers may be connected to a number of different Layer-2 switches (with each L2 switch dual-connected to redundant multilayer distribution switches)
3. In a server farm environment, allows "teamed NIC" connections between servers and access switches, providing fault resilience against a single NIC/switch failure.
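To make the fast Layer-2 convergence point concrete, the access-layer tuning would look something like this on a 3560/3750. The VLAN number and port range are just examples:

```
! Hypothetical access-layer tuning for fast Layer-2 convergence:
! Rapid PVST+ plus PortFast/BPDU Guard on server-facing edge ports.
spanning-tree mode rapid-pvst
!
interface range GigabitEthernet1/0/1 - 24
 description Server-facing access ports
 switchport mode access
 switchport access vlan 10
 spanning-tree portfast
 spanning-tree bpduguard enable
```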
As for rerouting to the London BCP site when your server farm is unavailable at the primary site, I reckon the best way of doing it is by employing some kind of hardware load balancer (e.g. Cisco CSS and/or GSS series, Nortel Alteon Application Switch, or F5 GTM/LTM).
No matter which product you choose, they all operate in a similar fashion (50,000-foot view):
1) Send regular keepalives to your servers (either discrete or clustered)
2) Forward traffic only to active servers
3) If the servers at the primary site are unavailable, forward traffic to alternate servers
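Just to illustrate the model, here is roughly what those three steps look like in Cisco IOS SLB syntax; I'm only using IOS SLB as an example, and every name and address below is made up. The dedicated appliances (CSS, Alteon, F5) have their own syntax but the same concepts:

```
! Hypothetical setup: probe the real servers, send traffic only to the
! ones that answer, and fail over to a backup farm if none do.
ip slb serverfarm PRIMARY-FARM
 real 10.10.10.11
  inservice
 real 10.10.10.12
  inservice
!
ip slb serverfarm BCP-FARM
 real 10.20.20.11
  inservice
!
ip slb vserver APP-VIP
 virtual 10.10.10.100 tcp 80
 serverfarm PRIMARY-FARM backup BCP-FARM
 inservice
```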
You can configure pretty fancy stuff with these load balancers.