DiffServ Compliant WRED: Threshold Settings

mkato
Level 1

Reference: http://www.cisco.com/en/US/partner/docs/ios/qos/command/reference/qos_q1.html#wp1013427

All platforms except the VIP-enabled Cisco 7500 series router and the Catalyst 6000 have the default min threshold values shown in the table referenced above and a default max threshold of 40.

40 is a long way from some of the values I've seen on Catalyst 6500s, like 512, and the maximum configurable value is 4096.

Questions:

1) Does anyone have any guidance, references, tips, real-world experience with tuning the max threshold value?

2) Does max threshold _need_ to be tuned in the real world?

3) Is the maximum value for max threshold platform dependent?

Any feedback or insight would be appreciated...


5 Replies

Joseph W. Doherty
Hall of Fame

After reading what Cisco has to say about WRED, along with the research literature on WRED and similar techniques, I've used it in production environments over many years, mainly on T1/E1 WAN links and occasionally on T3 or OC3 WAN links.

My experience has been that the normal Cisco defaults often don't work very well, either to improve "goodput" or to avoid the global synchronization that tail drops cause, especially on T1/E1 WAN links. Also, one size doesn't fit all; parameter settings depend heavily on what you're trying to accomplish.

Some very general tips:

Works best with many concurrent flows with similar traffic characteristics.

Early packet discards are often ineffective with most non-TCP traffic.

The common Cisco default of min being half of max also usually leaves too small a spread; a max of 3 or 4 times min seems to work better.

Min can be set based on what you want the average queue delay to be (average packet size times number of packets, divided by bandwidth); a worked example follows these tips.

You should see few WRED tail (max-exceeded) drops, and early (between min and max) drops usually shouldn't exceed 1%.
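
For example, working that min rule of thumb with purely illustrative assumptions--a T1 at about 1.536 Mbps, a 250-byte average packet size, and a 50 ms target average queue delay: 0.050 s × 1,536,000 bps / (250 bytes × 8 bits) is roughly 38 packets, so a min in the neighborhood of 38 and, per the 3-to-4-times guideline above, a max somewhere around 115 to 150.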

If you had a specific usage in mind, I might be able to make a more specific suggestion.

PS:

BTW: the default max of 40 might not be as small as you think; remember, this represents an average queue size of 40 packets, not an instantaneous queue size.

Thanks for taking the time to respond. Point taken on queue length vs. transit delay--for non-delay-sensitive traffic, I seem to recall Cisco recommending starting with 250 ms for 250-byte packets at the allocated bandwidth for the class. Does that sound like a decent starting point?

I don't really have a specific use in mind; I was just mulling over how one might actually "condition" multiple types of traffic for admittance to a given class. For example, say for a given queue size in a given class, I wanted to start dropping high-drop traffic (1/10) at 50% queue utilization; at 75% utilization, drop all (1/1) high-drop traffic and begin dropping low-drop traffic (1/10); and by 95%, drop all traffic (1/1) except mission-critical traffic (1/10 until tail drop at 100%). Something like the sketch below.
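
In MQC terms, and purely as a hypothetical illustration (a 64-packet queue limit and AF1x drop precedences are assumed; the policy and class names are made up), that might look roughly like:

policy-map EXAMPLE-OUT
 class DATA
  bandwidth percent 30
  queue-limit 64
  random-detect dscp-based
  ! high-drop (AF13): 1/10 drops from about 50% of the queue, everything dropped past about 75%
  random-detect dscp af13 32 48 10
  ! low-drop (AF12): 1/10 drops from about 75%, everything dropped past about 95%
  random-detect dscp af12 48 61 10
  ! mission-critical (AF11): 1/10 drops from about 95% until tail drop at the 64-packet limit
  random-detect dscp af11 61 64 10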

What's kind of confusing is that the docs say at max you start dropping according to the mark probability denominator, while above max, all packets for that DSCP value are dropped. Does it just fall off a cliff like that, or does it ramp?

For purely non-delay-sensitive traffic, like FTP, 250 ms is okay, but for such traffic, if it's sourced from Ethernet and doesn't have to deal with a reduced MTU, the average packet size tends to be close to 1500 bytes. It's only when you have a common mix of traffic that the average size tends toward 250 bytes (about the same as IMIX), but in that situation, where you might also have transactional traffic, you might not want to go beyond 50 ms.

You can configure different min/max values within the same class, as you propose, and I know of some MPLS vendors that do so, but if at some point of queue utilization you totally drop lower-priority traffic before higher-priority traffic, you can starve that traffic of bandwidth, much like what can happen when using PQ.

If you do mix traffic types, it's probably better that they all share the same max value (which often also seems to be the Cisco default), but either use a lower min for the higher drop priority (also apparently the Cisco default) and/or set the drop rate percentage higher.
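
As a rough sketch of that structure (the names and numbers below are hypothetical, not a recommendation for any particular platform):

policy-map WAN-OUT
 class BULK
  bandwidth percent 25
  random-detect dscp-based
  ! shared max of 120; lower min and a higher drop rate for the higher drop precedence
  random-detect dscp af11 40 120 10
  random-detect dscp af12 35 120 10
  random-detect dscp af13 30 120 5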

Better still, for different traffic types, if possible, is to place such traffic in separate classes/queues, where you control their bandwidth allocations when there's congestion, and perhaps use WRED within each class.

You can still use the technique you describe when dealing with the same traffic that's either within or outside its contracted bandwidth (usually marked as such by a policer). Or just skew the average drop, as described above, for the different traffic types.

The way max works: at exactly max you get the specified drop percentage, the Cisco default being 10%. At max + 1 you get 100% drop, just as a FIFO queue drops everything once it's full. At min - 1 you get no drops. From min through max - 1, you can calculate the drop percentage for each queue-size value by graphing a line starting at 0% at min - 1 and ending at the selected drop percentage at max. (Again, keep in mind the current queue depth used by WRED is a moving average, so you might see 100 packets in the queue while the average is only 10 and nothing is dropped, or conversely an average at max, with max's drop percentage applied to every arriving packet, yet no packets currently in the queue.)
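
A quick worked example of that line, with illustrative numbers: take a min of 30, a max of 120, and the default 1/10 mark probability. At an average depth of 29 or below there are no early drops; at an average of 75 (roughly halfway between min and max) about 5% of arriving packets are dropped; at exactly 120 it's 10%; and at 121 or above every arriving packet is dropped, tail-drop style.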

[edit]

PS:

When working out queue sizes from transit delay, also keep in mind that because WRED's queue processing is based on moving averages, a queue depth calculated for some number of milliseconds is itself only an average. Actual maximum latency can be highly variable and much higher, i.e. you might need to target a much smaller average delay to keep worst-case delay within acceptable bounds--for example, perhaps 50 to 100 ms for bulk data transfers, and 10 or 20 ms for transactional traffic.

"If you do mix traffic types, probably better they all share the same max value (which often also seems to be the Cisco default), but either have lower min for higher drop priority (also seems the Cisco default) and/or adjust the drop rate percentage higher."

Got it. There is a class of traffic I wouldn't mind starving out--scavenger class. Hmmm...

Thanks again for all the insight.

"Got it. There is a class of traffic I wouldn't mind starving out--scavenger class. Hmmm... "

Perhaps, but if there is traffic that can be totally starved, why not just block it with an ACL?

Personally, I prefer to configure scavenger with the minimal possible bandwidth guarantee; since I'm not blocking it, it can still have the whole circuit's bandwidth, if available. (You could still rate-limit it or shape it, but if the bandwidth is available, and if there's no additional usage charge, why not use it?)
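
A minimal sketch of that idea (the policy and class names are placeholders, and 1 percent is just an example of "minimal"):

policy-map WAN-OUT
 class SCAVENGER
  ! tiny guarantee during congestion; CBWFQ still lets the class use any
  ! bandwidth that's otherwise idle
  bandwidth percent 1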

Also, gee I'm really, really sorry your backup traffic is only getting 8 Kbps during peak traffic loads during the day. Have you considered scheduling that for overnight?

[edit]

PS:

Also, if scavenger shares a queue with other traffic, it will delay that other traffic even while WRED is dropping the excess scavenger packets.
