Redundancy Plan

poulid
Level 1

Hello. I'm sure this topic has been beaten to death, but one more kick won't hurt. We have a data center with approximately 140 servers. Most servers connect to 2950T switches, which uplink to the core switch (a 4507R) via Gigabit Ethernet. The core 4507R has a 48-port Gigabit Ethernet line card; routing is done with inter-VLAN routing, and OSPF to the WAN. Also, approximately 35 servers are directly connected to the core switch. This was done because of the gig capability of the line card, so the servers with the most data could be backed up faster. We have approximately 10 subnets (not including the WAN), 3 of which are user subnets and 6 of which are for servers.

Our manager has asked us to investigate redundancy for the 4507R, so I have recommended implementing a 4506 with the same line card config. I have also proposed that we remove the 2950's that uplink the servers and replace them with a stack of 3750G-24 switches. We could then VLAN the stack into the 6 existing subnets and spread the VLANs out across the different switches in the stack, so we don't have to worry that losing one switch takes the entire subnet down. Finally, I have recommended removing the servers that are plugged directly into the core and plugging them into the stack of 3750's.
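A rough sketch of that idea on the proposed stack (the VLAN number and port numbers are made up for illustration): the same server VLAN gets access ports on different stack members, so one failed member doesn't take the whole subnet with it.

```
! 3750 stack -- ports are Gi<member>/0/<port>
vlan 110
 name SERVERS-A
!
interface GigabitEthernet1/0/10
 ! server port on stack member 1
 switchport mode access
 switchport access vlan 110
!
interface GigabitEthernet2/0/10
 ! same VLAN, server port on stack member 2
 switchport mode access
 switchport access vlan 110
```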

Also, our manager would like to see the critical servers dual-homed, and I don't like the idea of plugging one NIC from serverX into the 4507R and the other into the 4506. I've got some pics; which looks like the better design? As a rule, should servers be plugged directly into the core switch, or should they be plugged into the dist/access layer?

18 Replies

kamal-learn
Level 4

Hi

If we take a look at the AVVID architecture, specifically the infrastructure module, and zoom in further, we see the Enterprise Composite Model. Taking just the Enterprise Campus module from that model, we find the server-farm module, which can be organized using three layers: access/distribution/core. The core here will be the core of your network, but the access and distribution layers are owned by the server-farm module. You organize that with hardware redundancy and dual NICs on the servers. Once this is done, connect the distribution switches of the server farm directly to the core of your network.

thanks

OK, thanks. So in the design with the stack of 3750's, they become the distribution layer, the wiring closets are the access layer, and the 4507 is still the core.

Am I right in assuming that servers should be plugged into the distribution layer 3750's, and not plugged directly into the core switch?

Hi,

Yes, you are correct about that. The network design best practices documentation says you should configure a Core Layer. This layer contains all the high-end routers, and it requires a highly redundant configuration since it is the heart of the network. Core down means no business.

The Distribution Layer is where you connect the servers and low-end routers. As the name suggests, this layer holds the application servers in your organization. Then you have the Access Layer, which is the end-user environment where the workstations are connected.

I hope it helps.

Hi

Yes, you are right. For a good design the core must be free of any business traffic and any policy; its function is fast switching, so plugging servers into the core is not the way to go.

i would like to add something here:

First, the stack by itself does not provide redundancy; you still have a single point of failure (SPF). The stack is mainly used to provide more port density.

The model you provided in both schemes is not a pure hierarchical model (access/distribution/core); you are using what we call a collapsed backbone, in which the core and distribution layers are merged into one device, here your 4507/4506. This model is good for small to medium enterprise size.

For your servers, if you use just one NIC you still have a single point of failure, so if you use two NICs, each connected to a different device, you have full redundancy.

If your budget allows, go with clustering of your servers; it is the best way, and it is required if you want to achieve five-nines (99.999%) availability.

thanks

Thanks for all of the great info.

If we dual-home the servers into different members of the 3750 stack, and then create EtherChannels using ports from different members of the stack, would this not eliminate the SPF?

Also, my manager wants to dual-home servers and plug one NIC into the 4507 and the other NIC into the 4506. I'm not sure if this would even work, so I'm definitely trying to talk him out of this idea. This is why I'm proposing the 3750's.

Another thing the 3750's should provide is jumbo frames, which might help out with the backup times we are struggling with.
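For what it's worth, on the 3750 the jumbo-frame setting is system-wide rather than per interface; a sketch (the 9000-byte value is illustrative, the change only takes effect after a reload, and the server NICs and backup targets must be set to match):

```
Switch(config)# system mtu jumbo 9000
Switch(config)# end
Switch# reload
```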

hi thanks for the comments,

Let's take one server, and you treat the others the same way. If your server is connected to one 3750, the SPF is still there: if that 3750 fails, the server cannot communicate with the rest of the network. If you connect your server to two different 3750s, you eliminate the SPF, but this needs two NICs in your server. Some operating systems (especially Unix) support two NICs and present both cards as a single interface to the OS, and you can even use them to load-balance your server's traffic.
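As an illustration of the host side (a hypothetical modern Linux example, not something specific to this thread), two NICs can be presented as one logical interface with the bonding driver; the interface names and address below are made up:

```
# active-backup bonding: one NIC carries traffic, the other takes over on failure
ip link add bond0 type bond mode active-backup
ip link set eth0 down
ip link set eth0 master bond0
ip link set eth1 down
ip link set eth1 master bond0
ip link set bond0 up
ip addr add 10.1.10.5/24 dev bond0
```

With eth0 cabled to one switch and eth1 to the other, neither switch is a single point of failure for the server.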

For EtherChannel you can bundle up to 8 ports. If you have more than one uplink toward your 4506/4507, spanning tree will block most of them, leaving just one in the forwarding state; but if you use EtherChannel, STP treats the bundle as one link. In that case you have fault tolerance on the link: if one member fails, the other links in the bundle handle the traffic transparently.

You also get load balancing. A hash determines which port in the bundle gets used, and you may end up using just one while the others sit there not helping at all, so you have to tune the load-balancing configuration of the bundle (use src-dst-mac, src-dst-ip, etc.) to make all or most of them work.
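On Catalyst switches that hash is chosen globally; a minimal sketch (the keyword and port numbers are illustrative, so check what your IOS version offers with `port-channel load-balance ?`):

```
! choose the hash input for all bundles on this switch
Switch(config)# port-channel load-balance src-dst-ip
!
! bundle two ports toward the core
Switch(config)# interface range GigabitEthernet1/0/1 - 2
Switch(config-if-range)# channel-group 1 mode desirable
```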

Dual-homing your server into both the 4506 and 4507 is good to go; you have full redundancy for the device and the link (between the server and the device), so there is no SPF.

The struggle with backup times is a reality; to overcome it you need budget (a SAN with Fibre Channel; Cisco has solutions in that field).

HTH

do rate if it does clarify

So you are saying it's OK to plug the servers directly into the core switch? When teaming the NICs (HP ProLiants), don't the corresponding network ports have to be set up as an EtherChannel to accommodate the two NICs? If so, I'm unclear how this could be done on two separate devices (4507 and 4506).

Anybody else have an opinion on the two designs? Also, does anyone else know if it is possible to dual-home a server into two different L3 devices?

Server dual-homing capabilities depend upon the operating system / TCP/IP stack. You mentioned Compaq / HP teaming, so I assume you are talking Windows. If you connect a server to two separate devices using HP teaming you can get resilience but not increased throughput: only one of the interfaces will be able to receive traffic, whilst both may transmit.

The exception to this rule is the 3750 stack: whilst the members are separate physical devices, they act as a single logical unit.

Thanks Mark. Which design would you go with: the one where the server-farm module consists of a stack of 3750's, or the one where the servers that need to be dual-homed are plugged into the 4507 and 4506, and the rest of the servers are plugged into the 2950's which uplink into the cores?

I like the idea of the stack for redundancy in the server farm module, and if a server needs to be dual-homed or not, it still plugs into the stack. It seems much cleaner.

We're still kicking this idea around, anybody else have an opinion on the two designs?

You might want to think of redundancy from the perspective of the machine. If you take a user desktop and trace its physical path all the way to the endpoint, you should find the possible points of failure along the way.

So if you take what your boss wants..

Step 1.

If you are running HSRP for, let's say, VLAN 2 (user workstations), and the primary path is the Cat4507, then the user desktop will use that link to forward its request. If that link is down, it will fail over to the backup L3 switch (the Cat4506). That redundancy piece is covered.
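A minimal HSRP sketch for that VLAN (addresses, priority, and group number are illustrative):

```
! Cat4507 -- active router for VLAN 2
interface Vlan2
 ip address 10.2.0.2 255.255.255.0
 standby 2 ip 10.2.0.1
 standby 2 priority 110
 standby 2 preempt
!
! Cat4506 -- standby router for VLAN 2
interface Vlan2
 ip address 10.2.0.3 255.255.255.0
 standby 2 ip 10.2.0.1
 standby 2 preempt
```

The workstations use 10.2.0.1 (the virtual address) as their default gateway, so a failover is transparent to them.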

Step 2. Now the packet needs to get to the DB server. With your diagram it seems as if it should make it there with no problem, but what happens if the link to that server is down? The packet has no way of getting to the server, because your Cat4507 told it that the path to the server is here. You should have a trunk or some sort of EtherChannel between the Catalysts to achieve full redundancy.

Your question is hard to answer because there are many reasons why you should or should not do something. What is your boss's budget? How large is the environment? Do you need Cisco's 3-layer model, or does the collapsed backbone fit your needs now and a year from now? We network engineers tend to think that building the 3-layer model in the beginning will make our lives much easier later on, but at the same time it might not be feasible.

You have to remember though that you should be looking at the logical aspect of the design. Think about certain disaster type scenarios that might prevent the servers and host from talking to each other and then figure out what workarounds there are to prevent them. Keeping your design simple will help you achieve more.

Your design is good, but it is missing the connection between switches. Your boss's design can work too, but it also needs the links between the two switches.

Thanks for the insight, it is very much appreciated.

When you say my design is good but it is missing the connection between switches, what connection are you referring to?

If each riser closet in the building had two uplinks coming back to the server room, from two different switches, and each uplinked into each 4500, redundancy from the wiring closet is achieved.

The stack of 3750's allows us to dual-home servers (potentially using EtherChannel), and making sure each member of the stack uplinks into each 4500 gives us redundancy at the server-farm level as well.
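A sketch of such a cross-stack channel to a dual-homed server (port, channel-group, and VLAN numbers are illustrative; depending on IOS version, cross-stack EtherChannel may only support mode `on`, i.e. static bundling with no PAgP/LACP negotiation, so the server's NIC teaming would have to be configured for static link aggregation to match):

```
! member ports live on two different stack members
interface GigabitEthernet1/0/5
 switchport mode access
 switchport access vlan 110
 channel-group 5 mode on
!
interface GigabitEthernet2/0/5
 switchport mode access
 switchport access vlan 110
 channel-group 5 mode on
```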

Thoughts?

This is a very interesting discussion and I'm glad someone else out there is struggling with making "the right choice."

In my organization we have very high availability requirements, and we have tried to engineer redundancy into every aspect of the operation. We have redundant applications on redundant hardware, connected to the network via redundant links, those links connected to redundant switches, all backed up entirely with a redundant datacenter.

In our switching design we have each server dual-homed into our "collapsed backbone," which consists of two 4507R's with a 2-member EtherChannel between the switches. This has worked fine for us, although I must admit we have yet to experience a failure of a major component (switch, line card, fiber transceiver, etc). The only issue I've ever been able to find is the incessant log messages on the switches reporting hosts flapping between the EtherChannel and the local port (although I have wondered if that might be causing inefficient switching and therefore a performance hit -- any ideas?).

So, for what it's worth, we plug directly into our core switches, dual-homed, and haven't ever had any infrastructure related outages. (Knock on wood!!).
