C3750-E and QOS queueing/drops

gnijs
Level 4

Hi all,

QOS on the C3750-E is driving me nuts.

We have a C3750-E switch with servers on it.

Using the default QOS settings (buffers 25 25 25 25, DSCP 0 in Q2, etc.), I see a lot of output drops.

Now the switch is running with the AutoQOS-generated queues (DSCP 0 in Q4, etc.), and these settings are better:

Switch# sh mls qos queue-set
Queueset: 1
Queue      :    1    2    3    4
----------------------------------------------
buffers    :   10   10   26   54
threshold1 :  138  138   36   20
threshold2 :  138  138   77   50
reserved   :   92   92  100   67
maximum    :  138  400  318  400

Queueset: 2
Queue      :    1    2    3    4
----------------------------------------------
buffers    :   16    6   17   61
threshold1 :  149  118   41   42
threshold2 :  149  118   68   72
reserved   :  100  100  100  100
maximum    :  149  235  272  242

However, some ports are still experiencing drops (for example, one that runs constantly at 30 Mbps / 4000 pps outbound, max), and it is very strange: one server goes up to 50 Mbps (50%) outbound load without drops, while others already start dropping at 3-5 Mbps outbound. This started me thinking it might be related to pps instead of bytes per second.

I have done a test in the lab, and with the same QOS settings I can easily push 50 Mbps / 52000 pkts/sec out of a 100 Mbps interface without any drops.

Therefore the only thing left for me to think about is the shared buffer structure of the C3750-E. There might be an overload of the ASIC buffers, so that at a given moment a server on port 1/2 is taking up all the buffers and a server on port 1/3 already has to start dropping at a very low rate.

Is there any way I can see the length of the queue and whether the switch is tail-dropping? Can I see the queue length globally (at the ASIC level?) and see if there are drops there (a show buffers command maybe?)

P.S. The switch is running 12.2(50)SE2 and is only doing L2 switching.

Ports are all configured as:

srr-queue bandwidth share 10 10 60 20

srr-queue bandwidth shape 10 0 0 0

queue-set 1 (default)

* Also, do the inbound and outbound directions share buffer space? I have the impression that ports with 50 Mbps inbound load start to drop much faster outbound (already from 5-10 Mbps).
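
For completeness, the DSCP-to-output-queue mapping behind the "DSCP 0 in Q2 / Q4" remarks above, and the queue-set/buffer allocation a given port is actually using, can be double-checked with the following (the interface name is only an example):

show mls qos maps dscp-output-q
show mls qos interface gigabitethernet1/0/1 buffers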

6 Replies

cisco_lad2004
Level 5

I have seen a similar issue with the 3560. When QoS was enabled (mls qos), I started dropping at 4 Mbps load on a 1 GE port. This affected CoS 4 / DSCP 40 & 46 packets.

It seems that this was the default behavior, in order to police priority traffic.

I got rid of the drops by removing "mls qos", since my design did not require any additional QoS on that specific box.

Another alternative would have been to start playing with buffers and thresholds... I gave it a miss.

HTH

Sam

Joseph W. Doherty
Hall of Fame

Although not specific to the 3750-E, considering how similar it is to the 3560/3750 switches, this item from the release notes is interesting:

"CSCeg29704 (Catalyst 3750 and 3560 switches)

When QoS is enabled, bursty and TCP-based applications might have significant performance degradation due to unexpected packet drops on some of the egress queues.

The workaround is to tune the egress queue buffer allocation and bandwidth scheduling parameters to allocate more bandwidth and buffer space to the affected queues. "

Premature drops, from insufficiently sized buffers, would certainly impact TCP performance.
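
If you do go down that tuning path, the knobs are set globally per queue-set and then applied per interface. A minimal sketch, simply reusing the AutoQOS queue-set 1 values quoted elsewhere in this thread (the interface name is only an example, and the numbers are not a recommendation):

mls qos queue-set output 1 buffers 10 10 26 54
mls qos queue-set output 1 threshold 4 20 50 67 400
!
interface GigabitEthernet1/0/1
 srr-queue bandwidth share 10 10 60 20
 queue-set 1

The buffers keyword splits the port's buffer pool across the four egress queues, and the threshold keyword sets drop-threshold1, drop-threshold2, reserved, and maximum for one queue, all expressed as percentages.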

Also interesting is how different the default settings are when QoS is enabled, when auto-QoS is activated, and what is recommended. For the latter, see:

http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND/QoSDesign.html#wp999803

and

http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND_40/QoSCampus_40.html#wp1099462

"* Also, do the inbound and outbound direction share buffer space ? I have the impression that ports with 50 Mbps inbound load start to drop much faster outbound (already from 5-10 Mbps)"

From the documentation, my impression is that it's separate buffer space, but I'm not 100% certain. I believe I've read that Cisco usually doesn't see the inbound queue as an issue. (Which makes sense, since egress queuing is generally the congestion point: the egress port doesn't have sufficient bandwidth to support the ingress port or ports.)

Examination of the MLS QoS port stats, and where the drops are happening, would be a good place to start if you're going to attempt to tune the egress queues.

What's not clear from the Cisco documentation and the provided stats is how best to split buffers between reserved and common. Further, there might not be stats to indicate whether drops come from hitting a WTD threshold or from lack of buffer space.
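
For that examination, these show commands (interface name is only an example) report the queue-set a port uses, its buffer allocation, and its per-queue counters:

show mls qos interface gigabitethernet1/0/1 buffers
show mls qos interface gigabitethernet1/0/1 queueing
show mls qos interface gigabitethernet1/0/1 statistics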

Regarding the bug, this is fixed in my version:

"Specifically, egress queue 2 thresholds need to have the following settings:

Thresholds1 = 200

Thresholds2 = 200

Reserved = 50

Maximum = 400"

-> These are indeed the defaults. But they are still bad. The AutoQOS-generated templates are better.
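
For reference, expressing those queue 2 values with the queue-set threshold command would look roughly like this (targeting queue-set 1 is just the obvious choice here, not something the bug note spells out):

mls qos queue-set output 1 threshold 2 200 200 50 400

i.e. drop-threshold1, drop-threshold2, reserved, and maximum, in that order.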

Joseph,

The second link provides some interesting insight. AutoQOS maps DSCP 0 to Q4T3.

According to this document, the T3 threshold is always 100% (the tail of the queue). Since I am still having drops, it means my buffer is too small. The buffers for queue-set 1 are:

Queueset: 1
Queue      :    1    2    3    4
----------------------------------------------
buffers    :   10   10   26   54
threshold1 :  138  138   36   20
threshold2 :  138  138   77   50
reserved   :   92   92  100   67
maximum    :  138  400  318  400

So 54% of the buffer memory goes to Q4.

Queue-set 2 has more buffer reserved for Q4:

Queueset: 2
Queue      :    1    2    3    4
----------------------------------------------
buffers    :   16    6   17   61
threshold1 :  149  118   41   42
threshold2 :  149  118   68   72
reserved   :  100  100  100  100
maximum    :  149  235  272  242

Also the "reserved" part of the queue length is 100% in QueueSet2 (not in QueueSet1).

Therefore I am going to change the port to queue-set 2 and see if this helps (or at least improves things) on a server with regular drops.
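
The change itself is just one line per interface (the interface name below is only an example):

interface GigabitEthernet1/0/1
 queue-set 2

A nice side effect is that queue-set 2 can then be tuned globally with mls qos queue-set output 2 ... without touching the ports that stay on queue-set 1.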

I also found this a useful link:

http://www.cisco.com/en/US/products/hw/switches/ps5023/products_tech_note09186a0080883f9e.shtml#qds

show platform pm if-numbers

show platform port-asic stats drop port xx

It showed me the commands to check whether the queues were taking drops, as the show mls qos interface statistics command wasn't showing drop stats on my version.
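
To tie those two commands together: the first one maps each physical interface to its port/ASIC numbers, and the port number from that output is what replaces xx in the second one; as I read that tech note, the drop counters it returns are broken out per queue and threshold, which is what finally shows which egress queue is tail-dropping. So the sequence is roughly:

show platform pm if-numbers
show platform port-asic stats drop port xx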

I have exactly the same issue on a 3750: well-behaved TCP streams (test stream generated via iperf.exe) can get 80-90 Mbps, while an HTTP download gets maybe only 5-8 Mbps.

It's frustrating, as you follow the SRND, which generally advises you not to tune buffers, and then you find the SRND values don't work!

Jed

Interestingly enough, the buffer settings created by Auto-QoS differ between IOS versions. On the 3560/3750 Cisco changed the buffer values somewhere between 12.2(40)SE and 12.2(50)SE, as far as I remember.

So even though it is stated that you *should* not tune buffers, Cisco seems to have determined it to be necessary. Perhaps someone inside Cisco can / will comment on this?
