07-07-2009 08:32 AM - edited 03-06-2019 06:37 AM
Hi all,
QOS on the C3750-E is driving me n**ts.
We have a C3750-E switch with servers on it.
Using the default QoS settings (buffers 25 25 25 25, DSCP 0 mapped to Q2, etc.), I see a lot of output drops.
Now the switch is running with the AutoQoS-generated queues (DSCP 0 in Q4, etc.), and these settings are better:
Switch# sh mls qos queue-set
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      10      10      26      54
threshold1:     138     138      36      20
threshold2:     138     138      77      50
reserved  :      92      92     100      67
maximum   :     138     400     318     400
Queueset: 2
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      16       6      17      61
threshold1:     149     118      41      42
threshold2:     149     118      68      72
reserved  :     100     100     100     100
maximum   :     149     235     272     242
However, some ports are still experiencing drops (for example, one running constantly at 30 Mbps / 4000 pps outbound at peak). What's strange: one server can go up to 50 Mbps (50% of the link) outbound without drops, while others already start dropping at 3-5 Mbps outbound. This got me thinking it might be related to pps instead of bytes per second.
I have done a test in the lab, and - with the same QoS settings - I can easily get 50 Mbps / 52000 pkts/sec out of a 100 Mbps interface without any drops.
Therefore the only thing left for me to consider is the shared buffer structure of the C3750-E. There might be an overload on the ASIC buffers, so that at a given time t a server on port 1/2 takes up all the buffers, and a server on port 1/3 already has to start dropping at a very low rate.
Is there any way I can see the length of the queue and whether the switch is tail-dropping? Can I see the queue length globally (at the ASIC level?) and see if there are drops there (the show buffers command, maybe?)
PS. the switch is running 12.2(50)SE2 and only doing L2 switching.
Ports are all configured as:
srr-queue bandwidth share 10 10 60 20
srr-queue bandwidth shape 10 0 0 0
queue-set 1 (default)
* Also, do the inbound and outbound directions share buffer space? I have the impression that ports with a 50 Mbps inbound load start to drop much faster outbound (already at 5-10 Mbps).
07-07-2009 09:33 AM
I have seen a similar issue with the 3560. When QoS was enabled (mls qos), I started dropping at 4 Mbps load on a 1 GE port. This affected CoS 4 / DSCP 40 & 46 packets.
It seems that this was the default behavior, in order to police priority traffic.
I got rid of the drops by removing "mls qos", since my design did not require any additional QoS on that specific box.
Another alternative would have been to start playing with buffers and thresholds... I gave it a miss.
HTH
Sam
07-07-2009 12:09 PM
Although not specific to the 3750-E, considering how similar it is to the 3560/3750 switches, this item from the release notes is interesting:
"CSCeg29704 (Catalyst 3750 and 3560 switches)
When QoS is enabled, bursty and TCP-based applications might have significant performance degradation due to unexpected packet drops on some of the egress queues.
The workaround is to tune the egress queue buffer allocation and bandwidth scheduling parameters to allocate more bandwidth and buffer space to the affected queues. "
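The tuning that release note describes is done with the global mls qos queue-set output commands. A minimal sketch, assuming queue-set 1 and queue 2 are the affected ones - the numbers below are illustrative only, not a recommendation:

```
! Buffer allocation across the four egress queues of queue-set 1 (must sum to 100)
mls qos queue-set output 1 buffers 15 30 30 25
! Per-queue WTD settings: threshold <queue> <drop-thr1> <drop-thr2> <reserved> <maximum>
mls qos queue-set output 1 threshold 2 200 200 50 400
```

The changes take effect on every port assigned to that queue-set, so test on a lab box first.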
Premature drops, from insufficiently sized buffers, would certainly impact TCP performance.
Also interesting is how different the default settings are when QoS is enabled, when auto-QoS is activated, and what's recommended. For the latter, see:
http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND/QoSDesign.html#wp999803
and
"* Also, do the inbound and outbound direction share buffer space ? I have the impression that ports with 50 Mbps inbound load start to drop much faster outbound (already from 5-10 Mbps)"
From the documentation, my impression is that it's separate buffer space, but I'm not 100% certain. I believe I've read that Cisco usually doesn't see the inbound queue as an issue. (Which makes sense, since egress queuing is generally the congestion point: the egress port doesn't have sufficient bandwidth to support the ingress port or ports.)
Examining the MLS QoS port stats, and where the drops are happening, would be a good place to start if you're going to attempt to tune the egress queues.
What's not clear from the Cisco documentation and the provided stats is how best to allocate buffers between reserved and common. Further, there might not be stats to indicate whether drops come from hitting WTD or from lack of buffer space.
07-07-2009 12:23 PM
Regarding the bug, this is fixed in my version:
"Specifically, egress queue 2 thresholds need to have the following settings:
Thresholds1 = 200
Thresholds2 = 200
Reserved = 50
Maximum = 400"
-> These are indeed the defaults. But they are still bad. The AutoQoS-generated templates are better.
07-07-2009 02:19 PM
Joseph,
The second link provides some interesting insight. AutoQoS maps DSCP 0 to Q4T3.
According to this document, the T3 threshold is always 100% (the tail of the queue). Since I am still having drops, it means my buffer is too small. The buffers for queue-set 1 are:
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      10      10      26      54
threshold1:     138     138      36      20
threshold2:     138     138      77      50
reserved  :      92      92     100      67
maximum   :     138     400     318     400
Or 54% of the buffer memory is for Q4.
Queue-set 2 has more reserved for Q4:
Queueset: 2
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      16       6      17      61
threshold1:     149     118      41      42
threshold2:     149     118      68      72
reserved   :    100     100     100     100
maximum   :     149     235     272     242
Also, the "reserved" part of the queue length is 100% in queue-set 2 (it isn't in queue-set 1).
Therefore I am going to change the port to queue-set 2 and see if this helps (or at least improves things) on a server with regular drops.
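For anyone following along, the per-port change I mean is just this (the interface name is an example; use the affected server port):

```
interface GigabitEthernet0/2
 queue-set 2
```

You can verify which queue-set a port uses (and its buffer values) with show mls qos interface <id> buffers.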
07-27-2009 12:20 AM
I found this a useful link also:
http://www.cisco.com/en/US/products/hw/switches/ps5023/products_tech_note09186a0080883f9e.shtml#qds
show platform pm if-numbers
show platform port-asic stats drop port xx
It showed me the commands to check whether the queues were taking drops, as the show mls qos interface statistics command wasn't showing drop stats on my version.
I have exactly the same issue on a 3750: well-behaved TCP streams (a test stream generated via iperf.exe) can get 80-90 Mbps, while an HTTP download gets maybe only 5-8 Mbps.
It's frustrating: you follow the SRND, which generally advises you not to tune buffers, and then you find the SRND values don't work!
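For reference, the drop-checking workflow from that link as I used it. The port number in the second command is just whatever the if-numbers mapping shows for your interface, not a fixed value:

```
Switch# show platform pm if-numbers
! find the row for your interface and note its port and ASIC numbers
Switch# show platform port-asic stats drop port 3
! non-zero queue drop counters here confirm egress tail drops on that port
```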
Jed
07-27-2009 03:52 AM
Interestingly enough, the buffer settings created by Auto-QoS differ between IOS versions. On the 3560/3750, Cisco changed the buffer values somewhere between 12.2(40)SE and 12.2(50)SE, as far as I remember.
So even though it is stated that you *should not* tune buffers, Cisco seems to have determined it to be needed. Perhaps someone inside Cisco can / will comment on this?