Class-based WRED on an ATM VC in a 7600? I'm stuck

Answered Question
Sep 23rd, 2010

Good afternoon everyone,


I've been working on creating some new QoS policies that we apply to ATM PVCs built on ATM SPAs in a SIP-200 on the 7600 series routing platform.  I'm running into a bit of a brick wall when comparing what some of the engineers before me created with what I need to do to create the new ones.


The quick and dirty version looks something like this.  For ATM PVCs built on a SPA in the 7600, they built different policy maps for different-speed PVCs.  We need to build a new policy for a PVC that is going to exceed what was built already.  The only thing in the policy-map that could be tied to the speed specified in the policy-map name is the min/max thresholds used for random-detect.  I cannot figure out how these values were determined, and I have been unsuccessful in locating any documentation that might help.  I'm not ruling out that this is the wrong way to do it, but without knowing why it was done this way I'm reluctant to toss it aside and build it another way.  Can anyone shed any light on why someone would build it this way and maybe offer some alternatives?  I've included an example of what I'm talking about.


policy-map atmspa-qos-9216k
  class Voip
    police cir percent 50 conform-action transmit exceed-action drop violate-action drop
    priority
  class Voip-Signaling
    bandwidth percent 5
  class Network-Control
    bandwidth percent 5
  class Critical-Data
    bandwidth percent 20
    random-detect dscp-based aggregate
    random-detect dscp values 18 minimum-thresh 231 maximum-thresh 356 mark-prob 10
    random-detect dscp values 20 minimum-thresh 89 maximum-thresh 356 mark-prob 10
    random-detect dscp values 22 minimum-thresh 35 maximum-thresh 356 mark-prob 10
  class Bulk-Data
    bandwidth percent 4
    random-detect dscp-based aggregate
    random-detect dscp values 10 minimum-thresh 46 maximum-thresh 71 mark-prob 10
    random-detect dscp values 12 minimum-thresh 18 maximum-thresh 71 mark-prob 10
    random-detect dscp values 14 minimum-thresh 7 maximum-thresh 71 mark-prob 10
  class Scavenger
    bandwidth percent 1
  class class-default
    bandwidth percent 15


Overall Rating: 5 (2 ratings)
Calin Chiorean Wed, 09/29/2010 - 05:18

Hello David!


This is a tricky one

According to Cisco, "these parameters [should] not be changed from their default values unless you have determined that your application will benefit from the changed values", but there is no guidance on how to determine the new values.

I've checked my books on QoS implementation and found nothing about calculating the min and max thresholds for WRED.


I believe you monitor the traffic behavior and application needs for a while and then adjust the min and max values accordingly. I will try to gather more information from other sources, and if I have an update I will post it here.


Calin

Correct Answer
marikakis Wed, 09/29/2010 - 05:38

Hi Dave,


Are you still stuck with this one?  I know you tend to look for definitive answers and I can't provide one on this issue. Anyway, I decided to post at least a couple of thoughts. At first glance the WRED numbers seem completely random, but they are not. Look at this config for example:


class Critical-Data
    bandwidth percent 20
    random-detect dscp-based aggregate
    random-detect dscp values 18 minimum-thresh 231 maximum-thresh 356 mark-prob 10
    random-detect dscp values 20 minimum-thresh 89 maximum-thresh 356 mark-prob 10
    random-detect dscp values 22 minimum-thresh 35 maximum-thresh 356 mark-prob 10


With some reverse engineering I think I can see a pattern. Because 231/89 ≈ 2.6 and 89/35 ≈ 2.54, I am thinking of the reverse procedure. That is, somehow 231 (or 35) is chosen, and then you move down (or up) dividing (or multiplying) by a factor of about 2.6 and rounding to the nearest integer.


Same thing happens below:


  class Bulk-Data
    bandwidth percent 4
    random-detect dscp-based aggregate
    random-detect dscp values 10 minimum-thresh 46 maximum-thresh 71 mark-prob 10
    random-detect dscp values 12 minimum-thresh 18 maximum-thresh 71 mark-prob 10
    random-detect dscp values 14 minimum-thresh 7 maximum-thresh 71 mark-prob 10


That is: 46/2.6 ≈ 18 and 18/2.6 ≈ 7.
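This divide-by-2.6 chain can be sketched in a few lines of Python. This is a hypothetical reconstruction of the pattern, not anything from actual config tooling:

```python
# Hypothetical reconstruction of the divide-by-~2.6 pattern described above:
# start from the largest minimum-thresh and repeatedly divide, rounding to
# the nearest integer.
def thresh_chain(start, steps, factor=2.6):
    """Return a descending list of candidate WRED minimum thresholds."""
    values = [start]
    for _ in range(steps):
        values.append(round(values[-1] / factor))
    return values

print(thresh_chain(46, 2))   # [46, 18, 7]   -- matches Bulk-Data exactly
print(thresh_chain(231, 2))  # [231, 89, 34] -- config uses 35, so the factor
                             #                  there is closer to 2.54
```

Note that the Critical-Data chain is only approximate with a fixed 2.6, which fits the observation that 89/35 is nearer 2.54.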


Now, if you take a look at both of these configs you can see that Critical-Data uses percent 20 and Bulk-Data percent 4. That is, Critical-Data is supposed to use 5 times more bandwidth than Bulk-Data. Is it random that 231/46 and 356/71 are both close to 5? I don't think so.
[Actually the slogan "Is it random? I don't think so!" comes from a very successful ad campaign for a telephone directory service in Greece, with a guy who uses weird calculations to arrive at the number 11888 and then delivers that slogan.]
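The 5x relationship is easy to check with plain arithmetic on the numbers quoted above:

```python
# Critical-Data gets bandwidth percent 20, Bulk-Data percent 4: a 5x ratio.
# The min/max threshold ratios between the two classes track that ratio.
bulk_min, bulk_max = 46, 71
critical_min, critical_max = 231, 356
bandwidth_ratio = 20 / 4

print(critical_min / bulk_min)  # ~5.02
print(critical_max / bulk_max)  # ~5.01
print(bandwidth_ratio)          # 5.0
```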


I am not sure if all this can be "scientifically" explained, or whether there is some rule of thumb, or some tuning was done, or a combination. Certainly the 5-times situation makes sense. For the rest I suspect some tuning and some approximation, perhaps with assistance from Cisco (or from the device itself, e.g. some defaults), or manual tuning while traffic was being monitored. The documentation says that when you do not specify min-thresh and max-thresh in the "random-detect dscp-based aggregate" command, those parameters are set based on the interface (VC) bandwidth:
http://www.cisco.com/en/US/docs/interfaces_modules/shared_port_adapters/configuration/7600series/76cfgatm.html#wp1431753
Can you check if that makes sense and what those values are?


To sum up: so far I can't figure out how the initial values (e.g. 231 and 356, or 46 and 71) are chosen. The rest can be calculated. Are those rules of thumb "correct"? I guess traffic can answer that question better than me. And it depends on what you are trying to do. Generally, the configuration doesn't seem "wrong" to me because:
1) You have better treatment for Critical-Data than Bulk-Data (higher maximum-thresh for Critical-Data)
2) A low minimum-thresh means drops start earlier, which means worse treatment for the least important DSCP values
3) A high maximum-thresh means full drop is delayed for the important data
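For intuition, the standard WRED behavior behind points 2) and 3) can be sketched like this. This is a generic model of the WRED drop curve, not platform-specific code:

```python
# Generic WRED drop-probability model: no drops while the average queue
# depth is below min-thresh, a linear ramp up to 1/mark-prob at max-thresh,
# and tail drop (probability 1) beyond max-thresh.
def wred_drop_prob(avg_qlen, min_th, max_th, mark_prob_denom=10):
    if avg_qlen < min_th:
        return 0.0
    if avg_qlen >= max_th:
        return 1.0
    return (avg_qlen - min_th) / (max_th - min_th) / mark_prob_denom

# At a queue depth of 100, DSCP 18 (min-thresh 231) is untouched while
# DSCP 22 (min-thresh 35) is already seeing drops:
print(wred_drop_prob(100, 231, 356))  # 0.0
print(wred_drop_prob(100, 35, 356))   # small but non-zero
```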


Another thing I was thinking of is some guidelines I had come across in the context of the GSR:
http://www.cisco.com/en/US/docs/ios/11_2/feature/guide/wred_gs.html#wp6484
Calculating B and the min/max thresholds doesn't coincide exactly with anything in your config if I use a value of 9216k for the speed (taken from the name of your policy-map), but the results are not far off either.
B = (9216 * 1000)/8/1500 = 768
min-thresh = 0.03B = 23
max-thresh = 0.1B = 77
The closest match is : random-detect dscp values 12 minimum-thresh 18 maximum-thresh 71 mark-prob 10
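The GSR rule of thumb is simple enough to compute directly. This is a sketch of the linked guide's formula; the 9216 kbps figure is only an assumption taken from the policy-map name:

```python
# B = link bandwidth (bps) / 8 / MTU, i.e. roughly how many MTU-sized
# packets the link carries per second; the GSR guide then suggests
# min-thresh ~ 0.03*B and max-thresh ~ 0.1*B as starting points.
def gsr_wred_estimate(bandwidth_kbps, mtu=1500):
    b = bandwidth_kbps * 1000 / 8 / mtu
    return round(b), round(0.03 * b), round(0.1 * b)

print(gsr_wred_estimate(9216))  # (768, 23, 77)
```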


Again, those guidelines for the GSR are starting points and estimations. If your traffic behaves OK, then there is no need to change anything. Actually, the more I look at this issue, the more it seems like some rules of thumb. Configuration would become even more cumbersome (than it already is) if perfect values were the goal. If you have another policy-map for a VC of a different bandwidth, we could compare the two to find the relationship between starting values for VCs of different bandwidths (i.e. use one VC bandwidth as a reference point and compute the rest using a factor x).


And of course you could ask cisco, if you haven't already.


Kind Regards,
Maria


Message was edited: missed a B (0.1 -> 0.1B)

David Williams Sun, 10/10/2010 - 16:29

Thank you everyone for your input on this.  I'm sorry it took so long to get back on this one.  I got pulled into a priority project and this got shelved for a bit.  I'm sure you all have been there.


As always Maria, you have provided excellent detail.  I'm going to review these links and see if I can find a best practice deployment for this.  I figure I could put in a bunch of time trying to figure out what the previous engineers were doing so that I can emulate it or I could develop, test, and deploy new values based on Cisco's recommended approach.


Just for fun though, this does present an interesting puzzle I don't think I will be able to completely drop.  For the sake of time I will deploy new configs, but for the sake of curiosity, I will continue trying to figure out where these values came from.  Your work on this Maria will provide an excellent starting point.


Thank you Thank you! 

marikakis Mon, 10/11/2010 - 09:34

Hi Dave,


It's good to hear from you. Don't worry about the delay. I did an additional set of weird calculations and figured you were busy (probably with Nexus stuff). Glad to be of some help. Besides, sometimes the questions are more interesting than the answers themselves.


Back to the case. I only had a quick look at the link posted by Calin, and it seemed to me that the formula used to calculate B in the GSR link I posted is the same as the formula used there. The links do not provide entirely the same information, of course.

One thing to note is the value used for the MTU in the formula. In the GSR link the recommendation is to set the MTU to 1500 even if your interfaces/subinterfaces actually use higher MTUs. This is reasonable, I think, due to techniques that avoid fragmentation end-to-end. In such cases the use of an MTU higher than 1500 might prove too conservative in terms of actual memory used by the transit packets, and you might get unnecessary drops.

Still, the problem you are facing here is not as simple as using a single formula. You need multiple formulas because you have different WRED treatments. A single formula cannot explain these differences. One approach might be to keep min/max thresholds the same and vary the drop probability (as in the link posted by Calin). Another approach might be to keep the drop probability constant and vary the min/max thresholds (as in your case). Varying the drop probability might provide a cleaner configuration though (since you are changing only one number), and it looks easier to tune if needed.
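The MTU point is easy to see numerically with the same B formula; the 4470 value below is just a hypothetical larger ATM MTU for illustration:

```python
# Plugging a larger MTU into B = bandwidth / 8 / MTU shrinks B, and with it
# any thresholds derived as fractions of B -- so WRED would start dropping
# at shallower queue depths than the traffic may actually require.
def buffer_estimate(bandwidth_kbps, mtu):
    return bandwidth_kbps * 1000 / 8 / mtu

print(buffer_estimate(9216, 1500))  # 768.0
print(buffer_estimate(9216, 4470))  # ~257.7 -> lower thresholds, earlier drops
```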


QoS configuration is typically platform/module specific and that should be expected to some extent since QoS deals with partitioning of resources and resources are platform/module specific. For that reason, it would probably be a good move to also ask cisco about this. This way, if anything goes wrong and you need to answer questions, you can always say: "Hey, it's not my fault! Cisco told me to do it!". That's the popular put-the-blame-on-cisco engineering technique! Nevertheless, I don't believe things can go too bad in this case and you can always monitor your NMS to see if any changes are needed before complaints start.


Kind Regards,

Maria
