Solved: oversubscribed WAN needs shaping and queueing best practices

TYLER WEST · 04-07-2009

I have inherited an oversubscribed WAN that needs help. An HQ site has a DS3 with about 90 remote sites each with T1s. Each site taps an SP MPLS PE over HDLC but we are not running MPLS ourselves or running VPNs through the SP. It only provides the peer-to-peer behavior for the voice.

Initially implemented CBTS at 1536k per site on DS3 outbound. Yes, 90 classes defining each site and shape average for the class. The shaping policy calls another policy for CBWFQ/LLQ. No traffic shaping outbound on the remote routers (yet). Shape average prevents me from congesting the remote links in the SP network but will still permit the network to send more traffic to the DS3 outbound than it can handle. Even with the CBWFQ/LLQ parameters, voice and critical apps have suffered during a couple of events where updates were pushed to multiple sites simultaneously and taxed the DS3. (I don't control the sysadmins and their policies so I have to try to protect my network from them.)
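For reference, the arrangement described above would look roughly like the following minimal sketch. Only the 1536k shape rate and the parent/child structure come from the post; the class names, ACL name and subnet, DSCP matches, percentages, and interface number are assumptions for illustration.

```
! Hypothetical names and values throughout, except the 1536000 shape rate
ip access-list extended SITE001-NETS
 permit ip any 10.1.1.0 0.0.0.255
!
class-map match-all VOICE
 match ip dscp ef
class-map match-all CRITICAL
 match ip dscp af31
class-map match-all SITE001
 match access-group name SITE001-NETS
!
! Child policy: CBWFQ/LLQ inside each site's shaped rate
policy-map WAN-CHILD
 class VOICE
  priority percent 30
 class CRITICAL
  bandwidth percent 40
 class class-default
  fair-queue
!
! Parent policy: one shaped class per remote site (about 90 in total)
policy-map DS3-OUT
 class SITE001
  shape average 1536000
  service-policy WAN-CHILD
!
interface Serial1/0
 service-policy output DS3-OUT
```

Each site class protects its own T1, but nothing in this policy arbitrates between sites when their combined demand exceeds the DS3, which is the gap discussed in the rest of the thread.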

Could use shape peak but fear the effects on voice. It also seriously trims down the BW available to use for CBWFQ/LLQ.

The docs and SRNDs do a great job of dealing with shaping on oversubscribed networks and CBWFQ/LLQ on converged networks but fail miserably at bringing the two situations together. Looking for recommendations and best practices for such a scenario to bring some immediate stability.

Tyler West, CCNP

Joseph W. Doherty · 04-08-2009

Unsure any Cisco platform will logically accomplish what you desire on a single device using just CBWFQ QoS. (There is some support, I think, for interfaces that support PVCs, where there's QoS prioritization both at the physical interface and the PVC, and congestion slows the PVCs.)

One solution, if there isn't a way to configure your single device, would be to use two devices in series. The first would shape traffic at 45 Mbps, and prioritize using CBWFQ/LLQ. The second would shape, also using CBWFQ/LLQ, for each downstream spoke (as being done now).
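A minimal sketch of what the first device in such a two-device chain might carry, assuming it is an IOS router with an Ethernet egress toward the second device. The class names, percentages, interface, and the 44 Mbps figure (chosen slightly under the DS3 rate) are assumptions, and the VOICE and CRITICAL class maps are assumed to be defined elsewhere.

```
! Child policy: prioritization inside the aggregate rate
policy-map AGG-QUEUE
 class VOICE
  priority percent 30
 class CRITICAL
  bandwidth percent 40
 class class-default
  fair-queue
!
! Parent policy: shape everything to roughly the DS3 rate
policy-map AGG-SHAPE
 class class-default
  shape average 44000000
  service-policy AGG-QUEUE
!
interface GigabitEthernet0/1
 service-policy output AGG-SHAPE
```

The second device would then keep the existing per-spoke shaping policy, so congestion at the DS3 level is resolved by priority before the per-spoke shapers see the traffic.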

If the existing device has only one Ethernet (LAN) and one DS3 (WAN) interface, you could even place a small 8 port 3560 before it on the LAN. The 3560 can idle an Ethernet port to provide "shaping", and it supports 4 queues, one of which can act like LLQ.

PS:

Another option on a single device: if it supports VRF and you have spare Ethernet ports, it might be possible to transit the device twice and accomplish what's described above.

TYLER WEST · 04-08-2009

That's one of the snags I'm running into. If I could queue-shape-queue I think I could solve it. But I can only make a queuing policy a child of a shaping policy and not the other way around. I think I might be able to queue and shape in the parent policy and then possibly queue also in the child policy (HQF), but I know that can't be done with LLQ in the calling class of the parent policy. I have already tried that.

For more info, the platform in question is a 3845 with redundant Ethernet uplinks to two 6506s. I am running a recent 12.4T train so HQF is an option if it will give me any benefit.

I haven't gotten back on it this morning but I think I'm going to go back and try a parent policy that DOES NOT separate the voice out by site. One class contains all voice and does LLQ for all sites. The remaining classes are the 90 already defined that do "shape peak" so I can control the congestion on the DS3. Then those classes call the child CBWFQ that does NOT use LLQ since it is already taken care of. I'm crossing my fingers on that one.
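A rough sketch of that idea, assuming class maps named VOICE, CRITICAL, and SITE001 already exist. All names, percentages, and the shape rate are assumptions; the roughly 360 kbps figure is borrowed from later in the thread.

```
! Child policy without LLQ: voice is already handled in the parent
policy-map SITE-CHILD-NOLLQ
 class CRITICAL
  bandwidth percent 60
 class class-default
  fair-queue
!
policy-map DS3-OUT-V2
 ! One LLQ class for voice to all sites
 class VOICE
  priority percent 25
 ! One shaped class per site for the remaining traffic
 class SITE001
  shape peak 360000
  service-policy SITE-CHILD-NOLLQ
```

Because classes in a policy map are evaluated in order, voice matches the VOICE class first and never reaches the per-site shapers, so the per-site rates have to leave room on each T1 for that site's voice.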

I know I should also shape out the 6500s outbound to the 3845 so at least I limit the aggregate ingress traffic to the 3845 to 90Mbps instead of a potential 2Gbps. I definitely need to get more creative with my 6500s and I think that will probably help some.

I will entertain any other suggestions/experiences.

Thanks,

Tyler

Joseph W. Doherty · 04-08-2009

I had considered HQF, but even with its enhancements, it too might not directly support an ideal configuration. (In fact, its change of fair-queue from WFQ to plain FQ might work against you if you pursued some alternative methods.)

Your idea of breaking out voice into LLQ at the parent level does work, although with some issues. You need to shape each spoke slower than its link capacity to ensure there's sufficient bandwidth for the LLQ traffic that's been "set aside" on the spoke link. Also, although this should guarantee voice performance, it doesn't guarantee performance for other critical applications (from overall congestion on the DS3).

When shaping to downstream physical bandwidth, peak shaping works against you. Average shaping, with a small(er) Tc would more closely mimic the slower downstream physical interface.
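As an illustration of average shaping with a small Tc, the sketch below sets Bc explicitly so that Tc = Bc / CIR = 15360 / 1536000 = 10 ms, with no excess burst. The class and policy names are hypothetical, the full-T1 rate is used only to show the arithmetic, and the minimum Tc a given platform accepts may differ.

```
policy-map DS3-OUT
 class SITE001
  ! shape average <cir-bps> <bc-bits> <be-bits>
  ! Bc of 15360 bits at 1536000 bps gives a 10 ms interval; Be of 0 disables excess bursting
  shape average 1536000 15360 0
  service-policy WAN-CHILD
```

A smaller Bc releases traffic in smaller, more frequent bursts, which is closer to how the T1 itself would serialize it.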

As to shaping on the 6500s: since there are two, a new issue arises. How do you shape for the DS3 across both 6500s (unless you're running VSS)? With two logical devices, you might shape each to half the DS3 but, as with breaking voice into a separate LLQ parent class, you likely won't obtain full DS3 utilization.

To get aggregate shaping, you'll want all traffic to transit one logical device where you can shape on one logical port. Again, an "inexpensive" switch might do. (In my prior post, I recommended a small 3560, but believe a small 2960 might work too.)

TYLER WEST · 04-08-2009

The breakout of LLQ at the parent level was a little brainstorm I had late last night. The configuration accepts it and I can apply it to the interface but it is still untested at this point. I hope to start on that soon. I had considered that the shaping in the site-defined classes would have to be considerably less ((DS3 - LLQ) / #sites). If I shape average, though, I might as well not even have T1s out there. I would be shaping at ~360Kbps. With shape peak I could at least have opportunities to take advantage of the remote 1536Kbps when bandwidth is available (shape peak 360000 3600 11760 ??). My voice should definitely be protected from the ill effects of shape peak. The major concern I have with shape peak is I have no ECN method for adaptation since I'm not on Frame Relay. Can you elaborate a little more on how and why peak shaping downstream is an issue? I'm still fuzzy on if or how it adapts when there is no ECN.

From the standpoint of shaping on the 6500s I believe I would just take the approach of shaping each at 45Mbps. Yes that would allow an aggregate of 90Mbps into the router but it is better than a potential aggregate of 2Gbps and the router having to buffer that. Where I really need to be more creative on the 6500 is combining the shaping with better queueing than what is in place currently.

A single switch inserted would be simpler but would defeat the purpose of the redundancy. No VSS yet and not likely to happen in this environment ($$$).

Once again, thanks. You are most definitely helping me think outside of the box.

Joseph W. Doherty · 04-08-2009

A simple solution for managing bandwidth to the 3845, assuming you're routing: weight the paths so that one is active and the other is standby (outbound). Then you can both ensure only 45 Mbps is sent to the 3845 and still have fail-over redundancy.

Whether to shape/queue directly on the 6500 depends on what it supports. I know 6500s police, and also know they often have multiple queues per port, but I'm unsure whether typical LAN ports support shaping. Reading something such as:
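How the paths get weighted depends on the routing protocol, which the thread does not state. As one hedged possibility, if OSPF were running between the 6506s and the 3845, raising the cost on the standby switch's link toward the 3845 would keep it idle until the preferred link fails. The interface name and cost value below are assumptions.

```
! On the 6506 chosen as standby, on its link toward the 3845
interface GigabitEthernet1/1
 ip ospf cost 1000
```

With other protocols the same effect would come from that protocol's own metric or preference controls.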

Q. Is traffic shaping supported on the Catalyst 6500 (Cat6K) Switch?

A. Traffic shaping is only supported on certain WAN modules for the Catalyst 6500/7600 Series, such as the Optical Services Modules (OSMs) and FlexWAN modules. Refer to Cisco 7600 Series Router Module Configuration Notes for further information.

would lead me to believe they might not.

If they don't, a pair of 2960-8TC-Ls, one per Ethernet link, along with active/standby link usage, might provide the aggregate DS3 queue management.

Without an upstream shaper, you would assign LLQ bandwidth to support your expected aggregate. You would shape each spoke at its bandwidth, less what you want to set aside for voice on that spoke. Going too low on the parent-level LLQ bandwidth is bad; too high is not a problem. Going too low at the spoke level is also bad; too high will reduce available bandwidth for other apps, even when it is not being used by voice.

Unclear why you're shaping (full?) T1s at 360 Kbps. Why are you not shaping for 1.5 Mbps?

Shape peaking is an issue, since it, along with larger Tcs, allows for possible large packet bursts. If there's FIFO queuing, such bursts can cause highly variable performance for time sensitive traffic, such as VoIP. (In fact, when dealing with voice, it's often a good idea to reduce an interface's TX [FIFO] buffer size, for the same reason.)
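On IOS routers the hardware transmit queue is usually adjusted with the tx-ring-limit interface command. A minimal sketch, assuming a serial interface on a remote T1 router; the interface name and the value 3 are illustrative assumptions, and the supported range varies by platform and interface type.

```
interface Serial0/0/0
 ! Keep the hardware FIFO short so queued voice is not stuck behind many data packets
 tx-ring-limit 3
```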

To summarize what I'm suggesting: on a 2960, use "srr-queue bandwidth limit" (40 to 45) on a 100 Mbps port. Configure "priority-queue out" and place real-time traffic, such as voice, into Q1. Configure something like "srr-queue bandwidth share 100 1 75 224". Place critical apps in Q4, BE in Q3, background in Q2. This ensures that, at the DS3 interface level, traffic is prioritized as we want.
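Put together, the 2960 side might look like the sketch below. The srr-queue and priority-queue commands are the ones named in the post; the DSCP-to-queue mappings, the DSCP values themselves, and the interface numbers are assumptions. Background is mapped to queue 2 here so that it receives the share weight of 1, and queue 1's weight is not used in the share ratio once priority-queue out is enabled.

```
mls qos
!
! Map markings to egress queues (DSCP choices are assumptions)
mls qos srr-queue output dscp-map queue 1 threshold 3 46
mls qos srr-queue output dscp-map queue 4 threshold 3 26
mls qos srr-queue output dscp-map queue 3 threshold 3 0
mls qos srr-queue output dscp-map queue 2 threshold 3 8
!
! Ingress from the 6500: trust the existing DSCP markings
interface FastEthernet0/1
 mls qos trust dscp
!
! Egress toward the 3845: limit to 45 percent of 100 Mbps
interface FastEthernet0/2
 speed 100
 srr-queue bandwidth limit 45
 srr-queue bandwidth share 100 1 75 224
 priority-queue out
```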

On the 3845, continue to do what you're doing. Something like (with HQF):

policy-map spoke
 class real-time
  priority percent 30
 class critical
  bandwidth percent 89
  fair-queue
 class background
  bandwidth percent 1
  fair-queue
 class class-default
  bandwidth percent 10
  fair-queue

policy-map ds3
 class spoke1
  shape average 1500000
  service-policy spoke
 .

Joseph W. Doherty · 04-08-2009

Note:

In sample policy, for non-LLQ classes, use "bandwidth remaining percent".
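Applied to the sample spoke policy, that correction would read roughly as follows. The percentages are the ones from the sample: as plain "bandwidth percent" values they add up to more than 100 together with the 30 percent priority class, whereas "bandwidth remaining percent" divides only what the priority class leaves over.

```
policy-map spoke
 class real-time
  priority percent 30
 class critical
  bandwidth remaining percent 89
  fair-queue
 class background
  bandwidth remaining percent 1
  fair-queue
 class class-default
  bandwidth remaining percent 10
  fair-queue
```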

TYLER WEST · 04-08-2009

Now I think I'm beginning to piece this together. So correct on the 6500. I meant policing and not shaping. Basically need policing and proper use of the port queues on egress of the 6500. I didn't think about using weights to prefer one link of the 3845 over the other. That would solve the problem of aggregate bandwidth exceeding 45Mbps while making sure that I can still push 45Mbps when a link has failed. The reason for the lower shaping rate on the DS3 for the per-site classes was that the sum of all of the sites would exceed the total for the DS3 by more than 3x. But since I would be depending on my 6500 to police that, I no longer have to worry about it and can safely shape the individual sites to a full T1, knowing that the DS3 can no longer be overwhelmed.

That is a huge help.

Joseph W. Doherty · 04-08-2009

You don't want to use policing on the 6500s, unless you again pursue the logic of bandwidth set aside. I.e. police all non-voice traffic to be less than 45 Mbps. No major advantage doing it on the 6500 vs. the 3845; and on the 3845, we can shape or police.

Yes, understood the combination of all your spokes can exceed the bandwidth of the DS3, but that's why we want some kind of high level interface policy to both prioritize and put "back pressure" on the spoke policies as necessary.

I.e. we want to assure that voice goes first, critical goes next, BE goes next and then background goes last, both at the DS3 interface and for each downstream spoke. Further, ideally, we don't want to impede any traffic we don't need to.

PS:

BTW, if you ensure the combined sum of the spoke shaping values doesn't exceed the DS3, then our QoS will work fine, although at much loss to individual spoke bandwidth. When you try to "burst" into that "lost" spoke bandwidth, you lose the DS3 bandwidth guarantee we need for properly working QoS at the DS3 level.

The suggested 8 port FastEthernet 2960 is likely the most inexpensive way to have decent QoS for 4 classes. If you need more than 4 classes, you would need "smarter" devices. For 45 Mbps, Ethernet to Ethernet, two 3825s would do the job, as might(?) two 2851s.