STP & HSRP for VLAN Load Balancing across 6509E

chestes · 04-09-2014

Hi LAN Switching and Routing Experts,

Some would say that HSRP load balancing is a well-known, mature and widely deployed best practice within the industry for load balancing traffic per VLAN. A friend and former colleague explained that they do not believe in this practice and instead funnel all VLAN traffic to a single core switch, rather than distributing it across both.

The argument was that load balancing is more difficult to troubleshoot.

In my opinion, this tells me one of two things.

The engineer doesn't fully understand how STP functions and/or how to control and optimize per-VLAN traffic using STP/HSRP

OR

The engineer wants the simplest configuration approach to make his or her operational life easier.

In my opinion, that denotes an operational, short-term perspective rather than a long-term way to optimize traffic across the environment.

What do you think?
Is it always better to load balance across core switches?

What are the advantages and disadvantages of either method? Of course, this is a moot point if you have a VSS, but I would like to understand what Cisco recommends and what other senior engineers deploy, and why.

Best Regards,

Christian

Christian J. Estes, cwne #85, cciew #42615

Jon Marshall · 04-09-2014

Christian

There are a number of different answers to this depending on the exact topology and what interconnect, i.e. L2 or L3, is used between the core/distribution switches.

But from your description I am assuming -

1) the interconnect is a L2 (etherchannel) trunk allowing all access layer vlans

2) the same vlan can span multiple access layer switches

This means that, per vlan, one uplink from an access switch has to block (assuming a variant of PVST), so to utilise both uplinks you would need to do the load balancing manually, i.e.

you make one core/distro switch the STP root and HSRP active for all odd vlans and the other switch the STP root and HSRP active for all even vlans.

This means you are at least using both links although you are still only getting one uplink's bandwidth for each vlan.
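A minimal sketch of that odd/even split, written in Python purely for illustration. The switch names `core1`/`core2`, the `10.0.<vlan>.1` virtual IPs and the priority values 110/90 are assumptions made up for the example, not taken from any real config. The lines it prints use the usual IOS `spanning-tree vlan ... root` and `standby` commands, but they are only a fragment (SVI addressing, trunks etc. are left out).

```python
# Hypothetical switch names; odd vlans go to core1, even vlans to core2.
SWITCHES = ("core1", "core2")


def owner(vlan_id):
    """Return the switch that should be STP root and HSRP active for a vlan."""
    return "core1" if vlan_id % 2 else "core2"


def render(switch, vlans):
    """Build the per-vlan config lines for one of the two core/distro switches.

    Assumes vlan IDs below 256 so they fit both the HSRP group number
    and the illustrative 10.0.<vlan>.1 virtual IP.
    """
    lines = []
    for v in vlans:
        primary = owner(v) == switch
        # STP root and HSRP active are deliberately kept on the same switch.
        lines.append(f"spanning-tree vlan {v} root {'primary' if primary else 'secondary'}")
        lines.append(f"interface Vlan{v}")
        lines.append(f" standby {v} ip 10.0.{v}.1")
        lines.append(f" standby {v} priority {110 if primary else 90}")
        lines.append(f" standby {v} preempt")
    return lines


if __name__ == "__main__":
    for sw in SWITCHES:
        print(f"! {sw}")
        print("\n".join(render(sw, [10, 11])))
```

The point of generating both sides from one `owner()` function is that the STP root and the HSRP active gateway for a vlan can never end up on different switches.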

I would say that is pretty common and a fairly standard design. The main benefits I can see are -

1) you utilise both uplinks from the access layer switch although per vlan you are only using one uplink

2) you spread the load for traffic from the access layer between your core/distro pair.

I do remember reading a long time ago, but can't remember where, a document or article saying that even if you do load balance per vlan you should still provision the uplinks to handle the full load. So you could argue that, if you did that, in terms of link usage it wouldn't make much difference whether you sent all traffic to one core/distro or both.

That said, I don't necessarily think you always have to provision your uplinks so one can handle all the load. It really depends on the design requirements, budget etc., and you still need to factor in that using both uplinks spreads the load for client traffic between the core/distro switches.

The above assumes the paths from the core/distro switches to other network devices, e.g. server switches, routers, firewalls etc., are equal on both switches.

In terms of troubleshooting, I have never found that it makes it any harder. If you follow a consistent setup, e.g. odd and even vlans, you can immediately identify which core/distro switch you need to look at first when there are issues with a particular vlan, and if it is all vlans then it's pretty easy to identify.

So as long as you are prepared to accept that if an uplink fails it could affect all users on that access switch in terms of slower response times, I think the main advantage is you don't have one uplink completely unused for all traffic.
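To put rough numbers on the uplink-failure case, here is a small Python sketch. The per-vlan traffic figures are invented for illustration, and the odd/even mapping to `core1`/`core2` is the same hypothetical convention as above.

```python
def uplink_load(vlan_mbps, failed=None):
    """Return the Mb/s carried on each uplink of one access switch.

    vlan_mbps maps vlan ID -> offered load in Mb/s (made-up figures).
    Odd vlans normally use the core1 uplink, even vlans the core2 uplink;
    if one uplink has failed, its vlans move to the surviving uplink.
    """
    load = {"core1": 0.0, "core2": 0.0}
    for vlan, mbps in vlan_mbps.items():
        target = "core1" if vlan % 2 else "core2"
        if target == failed:
            target = "core2" if target == "core1" else "core1"
        load[target] += mbps
    return load


if __name__ == "__main__":
    offered = {10: 300, 11: 400, 20: 200, 21: 100}  # hypothetical loads
    print(uplink_load(offered))                  # both uplinks in service
    print(uplink_load(offered, failed="core1"))  # core1 uplink down
```

With these example figures each uplink carries 500 Mb/s normally and the survivor carries the full 1000 Mb/s after a failure, which is why the provisioning question comes down to whether one uplink can absorb the whole access switch.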

HSRP is the right FHRP to use for this setup. GLBP is not, in my opinion, because per vlan only one of the uplinks is forwarding, but the core/distro switch acting as gateway for a particular client might sit behind the blocked uplink, so that client's traffic has to traverse the interconnect.

If you don't need to span vlans though, i.e. each vlan is confined to one access layer switch, then you can use a L3 interconnect between the core/distro switches so both uplinks are forwarding, and this is where GLBP really is better.

And then finally of course if you can limit each vlan to a specific switch you have the option of a L3 access layer although with the advent of VSS/MEC etc., for me at least, the advantage of using L3 in the access layer is not as convincing as it was.

Jon

Bilal Nawaz · 04-09-2014

Hello, I disagree that it's more difficult to troubleshoot in general, but I do agree if it's in a complex networking environment. Jon raised some very good points, and apologies if I reiterate.

I have seen some of the largest organisations implement a simple one-way flow whereby all STP paths and routed traffic go via one side. Yes, you're not making use of the secondary path, but so what? Their point is to keep it simple. Why? Since these organisations already have such a large network, it's probably best to keep it simple, as there is less chance of getting things wrong. The same goes for small to medium-sized networks, where you just take the simple approach to make life easier. In some cases they cannot afford a single outage, and if this rule is stuck to, not much can go wrong. Now you could argue that nothing could go wrong with load balancing either. But at least we ensure that all traffic flows down one side, rather than being in doubt whether it's on the other side (not good in pressure situations).

This applies to networks that can cater for the bandwidth.

However, the other argument is: we haven't got enough bandwidth, so let's split the load. To some, bandwidth not being used is money down the drain. There is also your point about optimising traffic by making use of the otherwise idle link. As for good practice, Jon seems to have outlined well what could mitigate the risks of load balancing traffic this way.
