According to Cisco's data center design documentation, the oversubscription ratio at the server access layer should be as close to 1:1 as possible -- maybe 2:1 or 3:1.
To achieve that kind of low oversubscription ratio for, say, a 6509 that hosts 288 servers (six 48-port blades), one would need 80 Gbps of uplink bandwidth to the core (eight separate L3 uplinks from the routed access layer, for example) just to get a ratio of 3.6:1.
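As a quick sanity check on that 3.6:1 figure, the arithmetic can be sketched like this (the server count and link speeds are the numbers from the post, assuming 1-Gbps server NICs):

```python
# Oversubscription ratio = total downstream bandwidth / uplink bandwidth.
# Assumed: 288 servers, each with a 1-Gbps NIC, and 80 Gbps of uplinks.
servers = 288
server_nic_gbps = 1
uplink_gbps = 80  # e.g. eight separate 10-Gbps L3 uplinks

downstream_gbps = servers * server_nic_gbps
ratio = downstream_gbps / uplink_gbps
print(f"oversubscription ratio: {ratio}:1")  # prints "oversubscription ratio: 3.6:1"
```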
Now, that seems outlandish on its face. The client I am working with has about 300 servers uplinked at 2 Gbps and does not experience any application latency at all. How and why would I recommend that he upgrade to 80 Gbps??
I do understand that this oversubscription methodology is not an exact science. It depends on the types of servers and the volume of application traffic associated with them. But, still, how do I present a model to the client that suggests an added cost of over $44,000 just to implement the uplinks??
Does anyone here have real-world experience in designing a data center from scratch? What considerations were made and what conclusions were drawn regarding oversubscription?
Thank you all ahead of time for your input.
I think Cisco's SRNDs assume the worst-case scenario (or the highest bandwidth usage imaginable). If they said 2 Gbps is fine, then everyone would do that, high-bandwidth server farms would end up oversubscribed, Cisco would say "add more uplinks," and the customer would say "but the SRND says 2 Gbps is fine"...you get the idea. Every network is different, and it's impossible for Cisco to give a solid answer on this and other technologies. Your best bet is to monitor your current uplinks and design from that. For example, we run 10/100/1000 at top-of-rack, 3 Gbps to the distro, then dual 3 Gbps to the core. Not even close to 3:1, and I rarely see 2% utilization. It's all about your network and its applications.
Selecting the correct subscription ratios is very difficult. The easiest calculation is to look at the total traffic and provide bandwidth to cover it. For instance, suppose you determine the average bandwidth generated by your 288 servers is only 500 Mbps, i.e. total bits sent over a busy hour divided by the number of seconds in an hour. A 1-Gbps uplink then looks fine (usage stats will confirm so, too).
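That busy-hour average is just total bits over seconds. A minimal sketch, where the 1.8 Tb transferred is an assumed figure chosen to reproduce the post's hypothetical 500 Mbps:

```python
# Busy-hour average: total bits sent in the busiest hour, divided by
# the number of seconds in an hour.
busy_hour_bits = 1.8e12   # assumed: 1.8 terabits moved in the busy hour
seconds_per_hour = 3600

avg_bps = busy_hour_bits / seconds_per_hour
print(f"average utilization: {avg_bps / 1e6:.0f} Mbps")  # prints "average utilization: 500 Mbps"
```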
What such a simple average calculation doesn't show is how the total traffic was transmitted over time. If transmission rates are highly variable, you can have bandwidth spikes that cause variable latencies. Bulk data transfers often don't care, but interactive and especially real-time traffic often does.
To minimize variable latencies, you need additional bandwidth. (An alternative is QoS prioritization, if not all traffic needs to be treated alike.) So, where the 1-Gbps uplink above would handle the total traffic, you might need 10 Gbps to avoid highly variable latencies.
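To illustrate why the same average can demand very different uplinks, here's a rough sketch. The peak-to-average ratios are purely assumed values standing in for how bursty the traffic is; in practice you'd measure this from interface stats:

```python
# Same 500 Mbps average, different burstiness. A link must absorb the
# peaks (or queue/drop them), not just the average.
avg_mbps = 500
link_mbps = 1000  # a 1-Gbps uplink

for peak_to_avg in (1.5, 5, 15):  # assumed burstiness ratios
    peak_mbps = avg_mbps * peak_to_avg
    fits = peak_mbps <= link_mbps
    print(f"peak/avg {peak_to_avg:>4}: peaks ~{peak_mbps:.0f} Mbps, "
          f"fits a 1-Gbps uplink without queuing: {fits}")
```

Only the mildest burst profile fits the 1-Gbps link; the spikier profiles either queue (adding latency) or need the bigger pipe.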
Recommended subscription ratios are simple but may, or may not, be appropriate for your usage. Deeper analysis of what criteria you're trying to satisfy should allow presenting a model that a client will accept.
> What I gather from your experiences is that the Cisco methodology is just 'pie in the sky.'
A bit harsh, I would call it a 'Guideline'.
I agree with Edison: not really so much "pie in the sky," but more, I think, Cisco playing (very) safe. They often seem to err on the (far) side of caution; well, that and (cough) they do also sell hardware.
An interesting aspect of determining how much bandwidth you need is the incorporation of Corvil Bandwidth analysis technology in later router IOS images.
All kidding aside, call it whatever you want, the fact of the matter is that Cisco's recommendations for oversubscription are pretty arbitrary. Where do they get this 2:1 or 3:1 target?
It really depends on the type of applications being deployed, how chatty they are, how many users access them, etc. And that is exactly what Cisco does NOT discuss: the fact that the server farm environment should be 'sniffed' over a period of time to determine the traffic utilization and requirements for the uplinks and the backbone -- and then go from there.
So, you can cough a little louder when you mention that Cisco is selling hardware... :-)
Far be it from me to defend Cisco :) but you have to try and put yourself in their shoes. As with any company, if you asked what sort of oversubscription you should be looking at, they could answer in one of two ways:
1) We don't really have a clue. It's up to you to find out.
2) Well, although we would always recommend you analyse your own traffic, our recommendation from years of dealing with ISPs/enterprises/SMBs is that an average oversubscription ratio that works is ...
I know I would be more inclined to use the vendor who gave me answer 2 rather than answer 1, and I think Cisco's recommendations do come from experience rather than just being picked out of the air.
But yes, they will err on the side of caution and they do sell hardware :).
It is also very difficult to specify how much bandwidth is needed even for the same applications -- Oracle is a good example. A badly written query can generate far more network traffic than a well-written one, and with the best will in the world I'm not sure any networking company can be expected to account for that.
Oh yeah???? Well I say "Down with Cisco! Down with the establishment! Down with those tyrannical Cisco engineers who want us to spend 44,000 bucks for nothing!!!"
Yes, I'm kidding! :-D