QoS - Voice Traffic Queueing

Unanswered Question
Aug 18th, 2009

Hi

I've an issue that's been bugging me for some time. I have a hierarchical policy map applied to an outbound interface. The parent policy shapes traffic to the maximum circuit rate, and within it is a child policy map that details how traffic matching the Voice, Video and AF classes is to be treated.

Now my understanding was that, even when the shaper was active, traffic matching EF or CS5 would bypass the shaper and scheduler and go directly to the LLQ (hardware queue).

However, when reviewing the policy-map I am seeing packets queueing within the voice queue:

 Serial0: DLCI 100 -

  Service-policy output: Test1-Parent-CE

    Class-map: class-default (match-any)
      233085 packets, 235568795 bytes
      30 second offered rate 17000 bps, drop rate 0 bps
      Match: any
      Traffic Shaping
           Target/Average   Byte   Sustain   Excess    Interval  Increment
             Rate           Limit  bits/int  bits/int  (ms)      (bytes)
          2048000/2048000   4096   16384     16384     8         2048

        Adapt  Queue     Packets   Bytes     Packets   Bytes     Shaping
        Active Depth                         Delayed   Delayed   Active
        -      0         233124    235617887 170210    215529114 no

      Service-policy : Test1-Child-CE

        Class-map: RealTime1MQC (match-any)
          25713 packets, 1473074 bytes
          30 second offered rate 13000 bps, drop rate 0 bps
          Match: dscp ef
            25713 packets, 1473074 bytes
            30 second rate 13000 bps
          Queueing
            Strict Priority
            Output Queue: Conversation 136
            Bandwidth 200 (kbps) Burst 5000 (Bytes)
            (pkts matched/bytes matched) 2479/137055   <----- this
            (total drops/bytes drops) 0/0

Cisco themselves say the following:

pkts matched/bytes matched

Number of packets (also shown in bytes) matching this class that were placed in the queue. This number reflects the total number of matching packets queued at any time. Packets matching this class are queued only when congestion exists. If packets match the class but are never queued because the network was not congested, those packets are not included in this total. However, if process switching is in use, the number of packets is always incremented even if the network is not congested.

So my Question is, why am I queueing EF traffic?

thanks in advance

James

Peter Paluch Tue, 08/18/2009 - 04:21

Hello,

The LLQ is not a hardware queue - on the contrary, it is a software queue. The entire software queueing defined by your policy-map kicks in only when the hardware queue of your interface starts to fill. If there still is a space in your hardware queue, the packets are stored there, bypassing all your class-based QoS configuration.

The LLQ is also called PQ/CBWFQ (Priority Queueing/Class Based Weighted Fair Queueing). The idea of LLQ is to have one software priority queue also called a LLQ queue, and a number of software CBWFQ queues. The priority queue is served until it is empty. If and only if it is empty, the remaining queues are served in the CBWFQ fashion. So, if the hardware queue starts to fill, next packets will be stored in software queues according to their classification. The voice packets are stored into the priority queue and they will be dequeued as soon as possible, before any other software queues - but only if there is space for at least one packet in the hardware queue. If there is none, the packets will stay queued in software queues.

So it is normal if you see even your voice packets queued. You should be concerned only if the perceived voice quality starts to decline.
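The behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: one FIFO hardware queue with a fixed depth, one software priority queue, and a single queue standing in for all CBWFQ classes. The names used (Interface, tx_ring_limit, sw_queued) are made up for illustration and do not correspond to anything in IOS.

```python
from collections import deque


class Interface:
    """Toy model: a FIFO hardware queue fed by a priority queue and one other queue."""

    def __init__(self, tx_ring_limit):
        self.tx_ring_limit = tx_ring_limit
        self.hardware = deque()   # FIFO, drained by the wire
        self.priority = deque()   # software priority (LLQ) queue
        self.other = deque()      # stands in for all CBWFQ class queues
        # Counts packets that had to wait in software, loosely like "pkts matched".
        self.sw_queued = {"voice": 0, "data": 0}

    def enqueue(self, kind):
        # A packet goes straight to the hardware queue while it has room
        # and nothing is already waiting in software.
        if len(self.hardware) < self.tx_ring_limit and not (self.priority or self.other):
            self.hardware.append(kind)
            return
        (self.priority if kind == "voice" else self.other).append(kind)
        self.sw_queued[kind] += 1

    def transmit_one(self):
        sent = self.hardware.popleft() if self.hardware else None
        # Refill freed hardware slots: priority queue first, then the rest.
        while len(self.hardware) < self.tx_ring_limit and (self.priority or self.other):
            source = self.priority if self.priority else self.other
            self.hardware.append(source.popleft())
        return sent


iface = Interface(tx_ring_limit=2)
for kind in ["data", "data", "data", "voice"]:
    iface.enqueue(kind)
print([iface.transmit_one() for _ in range(4)])  # ['data', 'data', 'voice', 'data']
print(iface.sw_queued)                           # {'voice': 1, 'data': 1}
```

In this run the voice packet is counted as software-queued, yet it is still sent ahead of the third data packet. It only waits behind the two packets that were already sitting in the FIFO hardware queue.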

Best regards,

Peter

Joseph W. Doherty Tue, 08/18/2009 - 04:48

"The entire software queueing defined by your policy-map kicks in only when the hardware queue of your interface starts to fill. If there still is a space in your hardware queue, the packets are stored there, bypassing all your class-based QoS configuration. "

Two points of clarification.

CBWFQ policies using shapers or policers don't depend on the hardware queue for activation. I.e. the shaper described in the OP would not depend on the hardware queue.

It's not when the hardware queue "starts to fill" but when the hardware queue overflows, i.e. when it's full. Put another way, software queueing engages when it is no longer true that "there still is a space in your hardware queue".

PS:

BTW, on many interfaces the hardware queue capacity can be adjusted, and for real-time traffic you often need to make it smaller to ensure real-time traffic doesn't queue behind (too much) non-real-time traffic.

Peter Paluch Tue, 08/18/2009 - 04:53

Joseph,

Thank you very much for clarifying my vague description. I appreciate it!

Best regards,

Peter

Joseph W. Doherty Tue, 08/18/2009 - 05:05

Peter,

I didn't think it was vague; I thought it quite good. I appreciate that you appreciate the clarifications. Thanks for letting me know.

Joseph W. Doherty Tue, 08/18/2009 - 04:26

"Now my understanding was that, even when the shaper was active, traffic matching EF or CS5 would bypass the shaper and scheduler and go directly to the LLQ (hardware queue)."

What led you to believe EF/CS5 traffic would bypass an active shaper?

(BTW, LLQ isn't a hardware queue, it's CBWFQ's priority (software) queue.)

If your policies are like these:

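(NB: syntax may be incorrect. In particular, shape average and police expect rates in bits per second, so the kbps-style values below are shorthand.)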

policy-map Test1-Child-CE
 class RealTime1MQC
  priority 200

policy-map Test1-Parent-CE
 class class-default
  shape average 2000
  service-policy Test1-Child-CE

The parent's shaper, when active, pushes traffic to the child policy, including your RealTime1MQC matches.

However, if your policies were:

policy-map Test1-Child-CE

policy-map Test1-Parent-CE
 class RealTime1MQC
  priority 200
  police 200
 class class-default
  shape average 1800
  service-policy Test1-Child-CE

Then your RealTime1MQC class traffic would bypass the shaper. (Normally, I wouldn't recommend the second set of policies, since unused RealTime1MQC bandwidth isn't available to other traffic.)

analysts Tue, 08/18/2009 - 05:38

OK, let me rephrase then. When I wrote "bypass an active shaper/scheduler and go straight to the LLQ (hardware queue)", I meant the packet would pass into the LLQ (software) and straight through to the hardware queue. This is a serial circuit, so the TX ring has a value of 16.

Why did I believe it bypassed a shaper? Simply because voice can't tolerate jitter, and a shaper by its very nature adds variable delay and jitter when active. Surely this defeats the very idea of using a shaper with voice?

With regard to Joseph's second policy:

class RealTime1MQC
 priority 200
 police 200 <-- this line isn't needed, as priority 200 indicates it's to be policed at 200. This is a maximum value, not a minimum value like the one specified by the syntax "bandwidth 200" within a CBWFQ class.

So are we saying that if I use the following form of nested policy

policy-map test1-Child-CE
 class System MQC
  bandwidth 24
  random-detect dscp-based
 class RealTime1MQC
  priority 200
 class RealTime2MQC
  priority 512
 class Application4MQC
  bandwidth 512
  random-detect dscp-based

policy-map test-Parent-CE
 class class-default
  shape average 2048000 16384
  service-policy test1-Child-CE

Then Voice traffic will be queued when the shaper is active, basically in effect making the LLQ null and void?

The policy to ensure that the LLQ is used correctly and not queued would need to look something like this?

policy-map test1-Child-CE
 class System MQC
  bandwidth 24
  random-detect dscp-based
 class RealTime2MQC
  priority 512
 class Application4MQC
  bandwidth 512
  random-detect dscp-based

policy-map test-Parent-CE
 class RealTime1MQC
  priority 200
 class class-default
  shape average 1848000
  service-policy test1-Child-CE

Joseph W. Doherty Tue, 08/18/2009 - 07:36

"Why did I believe it bypassed a shaper? Simply because voice can't tolerate jitter, and a shaper by its very nature adds variable delay and jitter when active. Surely this defeats the very idea of using a shaper with voice?"

Yes a shaper does distort the traffic flow timing, but VoIP can tolerate some jitter. (Remember the size of the hardware tx queue can also have similar impact.) When working with a shaper, you can minimize such impact by reducing the Tc. (I've found 10 ms seems to work well.)

The reason for using the shaper is to match some further downstream bottleneck that we can't directly configure QoS for. With a nested policy, the child's LLQ ensures traffic in that queue is dequeued first.

"police 200 <-- this line isn't needed, as priority 200 indicates it's to be policed at 200. This is a maximum value, not a minimum value like the one specified by the syntax "bandwidth 200" within a CBWFQ class."

Actually, the LLQ's implicit policer often only engages when there's congestion, so it's possible to transmit more LLQ traffic than is intended. Normally, this isn't an issue, but if we use the parent policy with both LLQ and a shaper, we should ensure class allocations are as we expect. So, in this situation, having the explicit policer is a good idea (for another reason, it may detect bursts that you wouldn't otherwise know were happening).

"Then Voice traffic will be queued when the shaper is active, basically in effect making the LLQ null and void?

The policy to ensure that the LLQ is used correctly and not queued would need to look something like this? "

Again, the fact that VoIP traffic is shaped, and queued, doesn't imply a child's LLQ is null and void or make for a VoIP quality issue. As Peter noted, the issue is whether there's an actual VoIP quality issue. If there is, I would ensure the shaper's Tc is reduced and that the hardware tx queue is reduced (and shape for actual available bandwidth - see postscript info). If these don't resolve the quality issue, then you can try the second policy approach to ensure the VoIP traffic isn't subject to the shaper. (With the latter, you still want to control tx queue allocation.)

PS:

BTW, one issue with shapers, I suspect they manage (on some platform IOS combinations) their own queue, or queues, and only when these overflow are the child policy's queues used. If this is true, this could be another reason why you might need to bypass the shaper using the second policy approach.

Also BTW, in your policy postings I see you have two LLQ classes, RealTime1MQC and RealTime2MQC. Within a single policy there's only one actual LLQ. The different LLQ classes only provide different implicit policers. If you move RealTime1MQC to the parent policy, RealTime2MQC would only dequeue if there's no RealTime1MQC traffic. If this is fine, then you can use it as you've defined it, otherwise you'll need to move this class to the parent policy too and adjust the parent's class-default shaper allocation again.
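The shaper adjustment mentioned above is simple subtraction, sketched here. The function name is hypothetical, and the sketch assumes the parent's priority classes and the class-default shaper should together add up to the circuit rate, which is how the 1848000 figure in the earlier posting appears to have been derived.

```python
def class_default_shape_bps(circuit_bps, priority_kbps):
    """Rate left for the class-default shaper after the parent's priority classes.

    circuit_bps   -- circuit rate in bits per second
    priority_kbps -- iterable of 'priority' values (kbps) configured in the parent
    """
    return circuit_bps - sum(kbps * 1000 for kbps in priority_kbps)


# Only RealTime1MQC (priority 200) in the parent policy:
print(class_default_shape_bps(2_048_000, [200]))       # 1848000
# RealTime1MQC and RealTime2MQC (priority 200 and 512) both in the parent:
print(class_default_shape_bps(2_048_000, [200, 512]))  # 1336000
```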

Also BTW, shapers, I believe, don't account for L2 overhead, so to really ensure correct shaping you need to reduce the rate to allow for it. (NB: L2 overhead % varies per packet size, which means you may need to allow for worst case to fully guarantee VoIP performance. However, again given typical VoIP tolerance, allowing for your average L2 overhead usually seems to work okay. I've found 10% seems to serve well.)
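A rough sketch of that allowance follows. The function names are made up, the 10% figure is the rule of thumb quoted above, and the 6 bytes of per-frame L2 overhead is purely an illustrative assumption; substitute the real value for your encapsulation.

```python
def shape_rate_with_allowance(circuit_bps, overhead_percent=10):
    """Shape rate after holding back a whole-number percentage for L2 overhead."""
    return circuit_bps * (100 - overhead_percent) // 100


def l2_overhead_percent(l3_bytes, l2_overhead_bytes):
    """L2 overhead expressed as a percentage of the L3 packet size."""
    return 100.0 * l2_overhead_bytes / l3_bytes


print(shape_rate_with_allowance(2_048_000))  # 1843200

# Assumed 6 bytes of L2 overhead per frame (illustrative only):
print(l2_overhead_percent(60, 6))    # small voice-sized packet: 10.0
print(l2_overhead_percent(1500, 6))  # full-size data packet: roughly 0.4
```

The second function shows why the percentage depends on packet size: the same fixed per-frame overhead is a far larger share of a small voice packet than of a full-size data packet.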

analysts Wed, 08/19/2009 - 00:51

"Yes a shaper does distort the traffic flow timing, but VoIP can tolerate some jitter. (Remember the size of the hardware tx queue can also have similar impact.) When working with a shaper, you can minimize such impact by reducing the Tc. (I've found 10 ms seems to work well.)"

Maybe my maths here is wrong. I thought that Tc was calculated as Bc/CIR, so in this case it would be 16384/2048000 = 0.008 s, which is 8 ms. However, my understanding was that Cisco's IOS was locked down to 10 ms as a minimum, as documented here - hxxp://www.cisco.com/warp/public/788/voip/fr_traffic.html

I could be wrong, as I've never had anyone explain it, but that was my understanding from reading the literature. Have I got the wrong end of the stick?
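The Tc = Bc/CIR relationship can be checked with a couple of lines of arithmetic. This is just a sketch with hypothetical function names; Bc is taken in bits and CIR in bits per second, as in the "Sustain bits/int" and "Target/Average Rate" columns of the shaper output.

```python
def tc_ms(bc_bits, cir_bps):
    """Shaping interval in milliseconds for a given Bc (bits) and CIR (bps)."""
    return 1000.0 * bc_bits / cir_bps


def bc_bits_for_tc(cir_bps, target_tc_ms):
    """Bc (bits) needed to obtain a target whole-millisecond Tc at a given CIR."""
    return cir_bps * target_tc_ms // 1000


print(tc_ms(16384, 2_048_000))        # 8.0
print(bc_bits_for_tc(2_048_000, 10))  # 20480
```

The first result agrees with the "Interval (ms)" value of 8 shown in the shaper output from the original post.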

"The reason for using the shaper is to match some further downstream bottleneck that we can't directly configure QoS for"

We use Shapers as we often provide sub-rate speeds on access circuits, so we use a shaper to reduce the impact of TCP Slow/Start, rather than a policer

"With a nested policy, the child's LLQ ensures traffic in that queue is dequeued first. Actually, the LLQ's implicit policer often only engages when there's congestion, so it's possible to transmit more LLQ traffic than is intended. Normally, this isn't an issue, but if we use the parent policy with both LLQ and a shaper, we should ensure class allocations are as we expect. So, in this situation, having the explicit policer is a good idea (for another reason, it may detect bursts that you wouldn't otherwise know were happening)."

I wasn't aware the policer only activated during periods of congestion, although I've yet to see an LLQ exhaust all other queues, but that's more likely down to end users not having enough Voice traffic at any given time to max the circuit out, thanks for the heads up on that one 

"Again, the fact that VoIP traffic is shaped, and queued, doesn't imply a child's LLQ is null and void or make for a VoIP quality issue. As Peter noted, the issue is whether there's an actual VoIP quality issue. If there is, I would ensure the shaper's Tc is reduced and that the hardware tx queue is reduced (and shape for actual available bandwidth - see postscript info). If these don't resolve the quality issue, then you can try the second policy approach to ensure the VoIP traffic isn't subject to the shaper. (With the latter, you still want to control tx queue allocation.)"

I'm sure the Tc value is OK (depending on the maths, of course), so I can adjust the TX ring limit. Even so, it's currently at Cisco's suggested default of 16, and my concern is that if I adjust this too low I'm going to negatively impact other traffic types, as they will sit for longer in the software queues. As to quality of calls, usually it's fine; only under periods of congestion do they report any kind of quality issue.

"Also BTW, in your policy postings I see you have two LLQ classes, RealTime1MQC and RealTime2MQC. Within a single policy there's only one actual LLQ. The different LLQ classes only provide different implicit policers. If you move RealTime1MQC to the parent policy, RealTime2MQC would only dequeue if there's no RealTime1MQC traffic. If this is fine, then you can use it as you've defined it, otherwise you'll need to move this class to the parent policy too and adjust the parent's class-default shaper allocation again."

Yeah, I'm aware of that. I only moved the single queue as an example to see if that's the kind of config you were referring to. I should have done it properly, my bad.

Thanks for the heads up though, it's certainly helped clear some confusion in my head. Obviously my concern is that Cisco's literature states the policy should be under the shaper; I would have thought they would have mentioned a caveat that, in this type of build, voice traffic will be queued in the event the shaper becomes active.

Joseph W. Doherty Wed, 08/19/2009 - 02:51

Yes, you're correct on how Tc is calculated. (I didn't actually do the math on your values.) BTW, I believe that in some of the later 12.4T versions Tc can go even smaller (4 ms?), and it might default to a value other than the earlier default(?) of 25 ms.

"We use Shapers as we often provide sub-rate speeds on access circuits, so we use a shaper to reduce the impact of TCP Slow/Start, rather than a policer"

The defaults for policers vs. shapers tend to have the former emulate an interface with a very shallow FIFO queue and the latter an interface with WFQ (except for 12.4.20T, and later, shapers). Both can be adjusted (somewhat) on how they impact traffic flows.

When supporting VoIP and using shapers, I believe you also need to account for the L2 overhead, otherwise a shaper can pass data faster than really intended. For example, if you had an E3 on the transmitting side and an E1 on the receiving side, setting a shape rate to 2 Mbps might allow FIFO congestion on the E1 link.

"I wasn't aware the policer only activated during periods of congestion, although I've yet to see an LLQ exhaust all other queues, but that's more likely down to end users not having enough Voice traffic at any given time to max the circuit out, thanks for the heads up on that one"

The simple rule seems to be, the LLQ implicit policer only activates when there's traffic in the LLQ, not just hitting the class.

"I'm sure the Tc value is OK (depending on the maths, of course), so I can adjust the TX ring limit. Even so, it's currently at Cisco's suggested default of 16, and my concern is that if I adjust this too low I'm going to negatively impact other traffic types, as they will sit for longer in the software queues. As to quality of calls, usually it's fine; only under periods of congestion do they report any kind of quality issue."

The problem isn't sitting in the software queues; the issue is whether an interface can be kept 100% busy without sufficient packets in the hardware queue. Given a choice between ensuring something like VoIP works correctly and driving an interface at maximum possible efficiency, I choose the former. I.e. I size the hardware queue to minimize the impact of its FIFO queue. Consider: if your hardware queue allows for 16 packets, and those 16 packets could be maximum-size FTP packets, all might need to be transmitted before your VoIP packet. The impact of this depends on interface bandwidth, but for 2 Mbps, my math (assuming 1500-byte packets: 16 x 1500 x 8 / 2,000,000) indicates a delay of 96 ms.
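The worst-case wait behind a full hardware queue is easy to reproduce. The sketch below uses a hypothetical helper name and assumes 1500-byte packets and no L2 overhead.

```python
def tx_ring_delay_ms(packets, packet_bytes, link_bps):
    """Time in ms to serialize a full hardware queue ahead of a newly arrived packet."""
    return 1000.0 * packets * packet_bytes * 8 / link_bps


print(tx_ring_delay_ms(16, 1500, 2_000_000))  # 96.0
print(tx_ring_delay_ms(16, 1500, 2_048_000))  # 93.75
print(tx_ring_delay_ms(10, 1500, 2_000_000))  # 60.0
```

The first line reproduces the 96 ms figure for a nominal 2 Mbps link. The other two show the same calculation at 2048000 bps and with the hardware queue reduced to 10 packets.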

[edit]

If you're encountering VoIP quality issues when there are periods of congestion, then there are likely other packets getting ahead of the VoIP packets when they shouldn't. I've found the hardware TX queue can be an issue and/or, for cloud links, pushing more than the other side can actually accept. (Also for cloud links, you've got to watch whether more than one site can send to the receiver.) Both tend to FIFO queue packets, where VoIP packets get delayed.

analysts Wed, 08/19/2009 - 03:02

Thanks Joseph, most of that makes sense. One thing I have noticed, however, when I labbed this up this morning: when I run the standard config

policy-map test-child-2Mb-CE
 class Voice
  priority percent 30
 class System
  bandwidth percent 1
  random-detect dscp-based
 class Standard
  bandwidth percent 69
  random-detect dscp-based

policy-map test-shape
 class class-default
  shape average 2048000
  service-policy test-child-2Mb-CE

map-class frame-relay MapClass_0
 service-policy output test-shape

interface Serial0/0/0
 description 2Mb
 bandwidth 2048
 ip address xxx.xxx.xxx.xxx 255.255.255.252
 encapsulation frame-relay
 load-interval 30
 frame-relay class MapClass_0
 frame-relay interface-dlci 200
 max-reserved-bandwidth 100

I get the following results

 Serial0/0/0: DLCI 200 -

  Service-policy output: test-shape

    Class-map: class-default (match-any)
      92895 packets, 20616429 bytes
      30 second offered rate 201000 bps, drop rate 0 bps
      Match: any
      Traffic Shaping
           Target/Average   Byte   Sustain   Excess    Interval  Increment
             Rate           Limit  bits/int  bits/int  (ms)      (bytes)
          2048000/2048000   12800  51200     51200     25        6400

        Adapt  Queue     Packets   Bytes     Packets   Bytes     Shaping
        Active Depth                         Delayed   Delayed   Active
        -      0         92888     20611910  4343      2754011   no

      Service-policy : test-child-2Mb-CE

        Class-map: Voice (match-all)
          38183 packets, 2491037 bytes
          30 second offered rate 20000 bps, drop rate 0 bps
          Match: dscp cs5 (40)
          Queueing
            Strict Priority
            Output Queue: Conversation 136
            Bandwidth 30 (%)
            Bandwidth 614 (kbps) Burst 15350 (Bytes)
            (pkts matched/bytes matched) 685/44666
            (total drops/bytes drops) 0/0

        Class-map: System (match-all)
          169 packets, 25978 bytes
          30 second offered rate 1000 bps, drop rate 0 bps
          Match: dscp cs6 (48)
          Queueing
            Output Queue: Conversation 137
            Bandwidth 1 (%)
            Bandwidth 20 (kbps)
            (pkts matched/bytes matched) 3/151
            (depth/total drops/no-buffer drops) 0/0/0
            exponential weight: 9
            mean queue depth: 0

   dscp    Transmitted    Random drop    Tail drop     Minimum  Maximum  Mark
           pkts/bytes     pkts/bytes     pkts/bytes    thresh   thresh   prob
   cs6     179/29690      0/0            0/0           32       40       1/10

However, when I then remove this and try running the policy we mentioned yesterday:

policy-map test-child-2Mb-CE
 class System
  bandwidth percent 1
  random-detect dscp-based
 class Standard
  bandwidth percent 98
  random-detect dscp-based

policy-map test-shape
 class Voice
  priority 614
  police 614000
 class class-default
  shape average 1434000 1000
  service-policy test-child-2Mb-CE

map-class frame-relay MapClass_0
 service-policy output test-shape

interface Serial0/0/0
 description 2Mb
 bandwidth 2048
 ip address xxx.xxx.xxx.xxx 255.255.255.252
 encapsulation frame-relay
 load-interval 30
 frame-relay class MapClass_0
 frame-relay interface-dlci 200
 max-reserved-bandwidth 100

I get no response back from the show policy-map command, the same as if no policy is applied to the interface

What am I missing?

Joseph W. Doherty Wed, 08/19/2009 - 03:44

Can't easily say, especially as my experience with QoS and frame-relay has been with per-VC subinterfaces rather than frame-relay maps.

However, it might be that there are some limitations on using CBWFQ where the policy doesn't "know" how much bandwidth it has to manage (before 12.4.20T?). For test purposes, try the test policy under the serial0/0/0 interface. (BTW, also see if any error messages, after applying the policy, are in syslog.)

analysts Wed, 08/19/2009 - 04:24

Great call, Joseph. I'm used to working on Ethernet circuits, where the policy-map goes onto the interface directly. As this is a frame-relay circuit, I kinda got tunnel vision with regard to applying the policy under the map-class.

I've managed *I think* to get it sorted

The config currently looks like this

policy-map Test-child-2Mb-CE
 class System
  bandwidth percent 1
  random-detect dscp-based
 class Standard
  bandwidth percent 90
  random-detect dscp-based

policy-map test-shape
 class Voice
  priority 700
  police cir 700000 bc 5000
 class class-default
  shape average 1348000 10000
  service-policy Test-child-2Mb-CE

I've applied the policy map directly to the serial interface and it's now being used. However, I was still getting queued voice packets. I adjusted the TX ring limit from 16 to 10; the frequency of queued packets decreased, but they were still evident. I then added the Bc value to the police statement under the Voice class to bring the Tc down to just over 7 ms.

the results I'm now seeing are

 Serial0/0/0

  Service-policy output: test-shape

    Class-map: Voice (match-all)
      7034 packets, 506012 bytes
      30 second offered rate 0 bps, drop rate 0 bps
      Match: dscp cs5 (40)
      Queueing
        Strict Priority
        Output Queue: Conversation 264
        Bandwidth 700 (kbps) Burst 17500 (Bytes)
        (pkts matched/bytes matched) 1/64
        (total drops/bytes drops) 0/0
      police:
          cir 700000 bps, bc 5000 bytes
        conformed 7034 packets, 506012 bytes; actions:
          transmit
        exceeded 0 packets, 0 bytes; actions:
          drop
        conformed 0 bps, exceed 0 bps
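The conformed/exceeded counters above can be read against a generic single-rate token bucket. The sketch below is a textbook model, not Cisco's implementation; the class name is hypothetical, and the 72-byte packet size is an assumption based on the average size in the counters (506012 bytes / 7034 packets).

```python
class TokenBucketPolicer:
    """Generic single-rate token bucket: CIR in bits per second, Bc in bytes."""

    def __init__(self, cir_bps, bc_bytes):
        self.rate_bytes_per_s = cir_bps / 8
        self.bc_bytes = bc_bytes
        self.tokens = float(bc_bytes)  # bucket starts full
        self.last_s = 0.0

    def offer(self, now_s, size_bytes):
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now_s - self.last_s
        self.last_s = now_s
        self.tokens = min(self.bc_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return "conform"
        return "exceed"


# Steady stream: one assumed 72-byte voice packet every 20 ms always conforms.
steady = TokenBucketPolicer(cir_bps=700_000, bc_bytes=5000)
print({steady.offer(i * 0.02, 72) for i in range(1000)})  # {'conform'}

# Back-to-back burst of 100 such packets at the same instant.
burst = TokenBucketPolicer(cir_bps=700_000, bc_bytes=5000)
results = [burst.offer(0.0, 72) for _ in range(100)]
print(results.count("conform"), results.count("exceed"))  # 69 31
```

Under this model a well-paced voice stream never exceeds, which is consistent with the zero "exceeded" count shown, while a sufficiently large burst would start to show up there.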

That's 1 packet queued out of 7k packets transmitted. Oh, and I think you were right about being able to drop the Tc value to 4 ms. I initially set the value to the lowest (1000), which was accepted by the router fine; however, I was getting an error when I tried to apply the policy directly to the interface:

Lab1(config-if)#service-policy output test-shape

shaping interval is 0 milliseconds. intervals below 4 milliseconds rejected

So 4ms does look like the lowest Cisco will accept
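The numbers in this exchange can be cross-checked with the same Tc = Bc/CIR arithmetic. This is a sketch with hypothetical function names; that the router reports the interval truncated to whole milliseconds is an assumption that happens to fit the error message.

```python
def shaping_interval_ms(bc_bits, cir_bps):
    """Shaping interval (Tc) in milliseconds."""
    return 1000.0 * bc_bits / cir_bps


def min_bc_bits(cir_bps, min_tc_ms=4):
    """Smallest Bc (bits) that keeps Tc at or above min_tc_ms (rounded up)."""
    return -(-(cir_bps * min_tc_ms) // 1000)


# shape average 1434000 1000: well under 1 ms, so it would display as 0
print(int(shaping_interval_ms(1000, 1_434_000)))        # 0
# shape average 1348000 10000: about 7.4 ms
print(round(shaping_interval_ms(10000, 1_348_000), 2))  # 7.42
# Smallest Bc for a 4 ms interval at each of those rates
print(min_bc_bits(1_434_000), min_bc_bits(1_348_000))   # 5736 5392
```

Note that the "just over 7 ms" figure matches the Bc of 10000 bits on the shape average line of the working config.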

Will continue to test this and see what I come back with, thanks for your help and guidance

Joseph W. Doherty Wed, 08/19/2009 - 06:29

Again, don't become overly concerned if you see packets being counted in the LLQ. Any software queueing just means the hardware queue overflowed (which is what we want, since the hardware queue is FIFO). However, you shouldn't see many packets queued (depth, not overall queued percentage) in the LLQ.

If you don't place your LLQ within the child policy, the Tc doesn't need to be as small.

For VoIP and LLQ, the idea is not to allow other packets to get in front of them. This is why we want a minimum hardware queue and, if used, minimum shaper bursting, so that packets are placed into the software queues where LLQ will be dequeued first.
