Can't figure out output discards

paul amaral
Level 4

Hi all, I have an interface that constantly has output drops; looking at the drops I can confirm they are all output discards. The problem I'm having is that I can't figure out what is causing the output packet drops.

 

The interface is part of a routed VLAN and is set for 100 Mb full duplex, connected to an L2 switch that is also set at 100/full. I already replaced the L2 switch and that didn't make a difference. Also, looking at the L2 switch's interface, there are no input drops or errors at all.

 

I have read a lot of documents on discards and have done as much troubleshooting as I can, but I have not been able to stop this from happening or even determine which packets are being dropped.

I have looked for microbursts using Wireshark and there are none; even at 2 Mb you will sometimes see discards. I increased the output queue to match the input queue (the change is sketched below) and that didn't help. The interface looks clean with no CRCs, runts, etc., and I'm barely touching the 100 Mb of throughput. Can someone recommend what else I can do and look at to determine what the issue might be?
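For reference, the output-queue change was along these lines; the hold-queue value simply mirrors the 2000-packet input queue shown in the interface output further down:

conf t
 interface FastEthernet1/36
  ! raise the software output (hold) queue to match the input queue depth
  hold-queue 2000 out
 end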

Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1   48  SFM-capable 48-port 10/100 Mbps RJ45   WS-X6548-RJ-45     SAL0710A54G
  2   48  SFM-capable 48-port 10/100 Mbps RJ45   WS-X6548-RJ-45     SAL09444KVM

  7    2  Supervisor Engine 720 (Active)         WS-SUP720-3BXL     SAD084202LK
  8    2  Supervisor Engine 720 (Hot)            WS-SUP720-3BXL     SAL1015JPRZ

 

TIA, Paul

 

Vlan615 is up, line protocol is up

  Hardware is EtherSVI, address is 0015.c7c7.0880 (bia 0015.c7c7.0880)

  Internet address is xxxx

  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,

     reliability 255/255, txload 1/255, rxload 1/255

  Encapsulation ARPA, loopback not set

  Keepalive not supported

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input 00:00:22, output 00:00:22, output hang never

  Last clearing of "show interface" counters 01:49:53

  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

  Queueing strategy: fifo

  Output queue: 0/40 (size/max)

  5 minute input rate 1337000 bits/sec, 349 packets/sec

  5 minute output rate 1938000 bits/sec, 344 packets/sec

  L2 Switched: ucast: 214 pkt, 14696 bytes - mcast: 12 pkt, 768 bytes

  L3 in Switched: ucast: 1582724 pkt, 567827172 bytes - mcast: 0 pkt, 0 bytes mcast

  L3 out Switched: ucast: 1625654 pkt, 1115273430 bytes mcast: 0 pkt, 0 bytes

     1584871 packets input, 568106584 bytes, 0 no buffer

     Received 12 broadcasts (0 IP multicasts)

     0 runts, 0 giants, 0 throttles

     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

     1623842 packets output, 1114306193 bytes, 0 underruns

     0 output errors, 0 interface resets

     0 output buffer failures, 0 output buffers swapped out

 

FastEthernet1/36 is up, line protocol is up (connected)

  Hardware is C6k 100Mb 802.3, address is 0009.11f6.35b3 (bia 0009.11f6.35b3)

  Description: Spamcan new port - pa testing

  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,

     reliability 255/255, txload 2/255, rxload 2/255

  Encapsulation ARPA, loopback not set

  Keepalive set (10 sec)

  Full-duplex, 100Mb/s, media type is 10/100BaseTX

  input flow-control is off, output flow-control is unsupported

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input 00:00:25, output never, output hang never

  Last clearing of "show interface" counters 01:50:32

  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 2468

  Queueing strategy: fifo

  Output queue: 0/40 (size/max)

  30 second input rate 942000 bits/sec, 289 packets/sec

  30 second output rate 1097000 bits/sec, 271 packets/sec

     1592429 packets input, 569790617 bytes, 0 no buffer

     Received 3434 broadcasts (3422 multicasts)

     0 runts, 0 giants, 0 throttles

     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

     0 watchdog, 0 multicast, 0 pause input

     0 input packets with dribble condition detected

     1626638 packets output, 1105460469 bytes, 0 underruns

     0 output errors, 0 collisions, 0 interface resets

     0 babbles, 0 late collision, 0 deferred

     0 lost carrier, 0 no carrier, 0 PAUSE output

     0 output buffer failures, 0 output buffers swapped out

 

sh int fast1/36 counters error

 

Port      Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Fa1/36            0        0         0        0          0         2468

Port      Single-Col  Multi-Col  Late-Col  Excess-Col  Carri-Sen  Runts  Giants
Fa1/36             0          0         0           0          0      0       0

Port      SQETest-Err  Deferred-Tx  IntMacTx-Err  IntMacRx-Err  Symbol-Err
Fa1/36              0            0             0             0           0

 

Interface FastEthernet1/36 queueing strategy:  Weighted Round-Robin

  Port QoS is enabled

Trust boundary disabled

 

  Port is untrusted

  Extend trust state: not trusted [COS = 0]

  Default COS is 0

    Queueing Mode In Tx direction: mode-cos

    Transmit queues [type = 1p3q1t]:

    Queue Id    Scheduling  Num of thresholds

    -----------------------------------------

       1         WRR                 1

       2         WRR                 1

       3         WRR                 1

       4         Priority            1

 

    WRR bandwidth ratios:  100[queue 1] 150[queue 2] 200[queue 3]

 

    queue random-detect-min-thresholds

    ----------------------------------

      1    70[1]

      2    70[1]

      3    70[1]

 

    queue random-detect-max-thresholds

    ----------------------------------

      1    100[1]

      2    100[1]

      3    100[1]

 

    WRED disabled queues:

 

    queue thresh cos-map

    ---------------------------------------

    1     1      0 1

    2     1      2 3 4

    3     1      6 7

    4     1      5

25 Replies

Leo Laohoo
Hall of Fame

The output doesn't really help because it is taken from a Virtual Interface.  I'm keen to know which PHYSICAL interface the output drops are coming from. 

Leo, I have pasted the physical interface stats, FastEthernet1/36; look at the original message.

thanks, p

Output discards on a FastEthernet port???  What client is connected to this port?  

Yes, on a FastEthernet port, set for full/100. Right now there is a Cisco Catalyst connected, set for full/100, with no errors. It's an L2 switch with about 5 Linux SMTP servers behind it.

I just can't figure out what is causing the output drops on the 6500 side.

paul

The 6548 line cards are no match for the Linux servers.  The 6548 line cards are not meant to "live" in or "do" data centre work.

So what do you think is happening? We have other servers/switches connected to the 6548 line card that see no output drops.

Also, the interface is connected to another L2 switch, which sees no output drops. The servers are not connected directly to the 6548 line card.

What I think is happening is that each of the Linux servers is overwhelming the notoriously shallow memory buffers of the 6548 line cards.

The solutions to this are:

1.  Apply QoS; or

2.  6748 line cards.

Unfortunately, QoS is not my strong suit.  One of the Cisco VIPs, Joseph, is good at it; he normally lurks around and, if he's not busy, he'll chime in.

Is there a way to see the memory buffers on those cards? Also, shouldn't I see drops across all ports on that card? I'm only seeing them on some ports, the busiest ports.

The other thing is that the L2 switch connected to it is an old Cisco Catalyst 2900 and it shows no output drops. Could that line card really be that bad?

You can use the show buffers input-interface <interface x/x> packet/header command. However, you may be hitting this bug if you are only seeing the drops increase but there is no actual network degradation:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCdz02952/?reffering_site=dumpcr
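For example (the interface name here is only a placeholder):

show buffers input-interface FastEthernet1/36 packet
show buffers input-interface FastEthernet1/36 header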

Hope it helps, best regards!

JC

The buffer command will not show any packet buffer info; I have tried that. Also, the line card I have runs software 12.2, so it's not affected by that bug. It's just really strange to have the output counter increment for no reason, well, none that I can see.

Leo, I have done some research and I'm now convinced that the issue is what you described above and that it's a problem with the hardware queue, which is also why setting the software queue didn't make a difference. I have mls qos enabled and it's using the hardware queues on the line cards.

My question is this: the 6548 shows 1088 KB for a Tx queue buffer while the 6748 shows 1.2 MB. Do you think that an extra ~300 KB will make a difference?

Module           Ports         Rx queue  Tx queue  Total buffer  Rx buffer  Tx buffer
WS-X6748-GE-TX   48 x GE TX    1Q8T      1P3Q8T    1.3 MB        166 KB     1.2 MB
WS-X6548-RJ-45   48 x 10/100   1P1Q0T    1P3Q1T    1.1 MB        28 KB      1088 KB


Do you think that an extra ~300 KB will make a difference?

Possibly, although a bigger difference from using a 6748 might come from running at gig rather than 100.  (Does your 2900 support gig on an uplink port?)

Generally, unless you're bumping into some kind of bug, Cisco device queues drop packets when they overflow.  When overall utilization appears low, microbursts are a common cause, although you say you believe this isn't happening.  (How did you obtain the packet dump you used?  If you used Cisco's SPAN, I'm unsure its replication would be accurate enough to represent what the egress port hardware is actually "seeing" frame/packet timing wise.)

As you mention you're using MLS QoS, it's possible some tuning of the interface's QoS settings (e.g. wrr-queue queue-limit and/or wrr-queue bandwidth) might remediate your drops; a rough sketch follows.  (I don't recall how a 6548 shares buffers between its queues.)
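Purely as a starting point, something like the following; the ratios and limits are placeholders, and the 6548 may not accept all of these commands since I don't recall how it divides its per-port buffer:

conf t
 interface FastEthernet1/36
  ! weight the WRR ratios toward whichever queue is actually taking the drops
  wrr-queue bandwidth 200 150 100
  ! per-queue share of the Tx buffer, if this line card supports the command
  wrr-queue queue-limit 70 20 10
  ! optionally remap CoS 0 and 1 out of queue 1 and into queue 2
  wrr-queue cos-map 2 1 0 1
 end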

Also, shouldn't I see drops across all ports on that card? I'm only seeing them on some ports, the busiest ports.

Since on this particular card, buffers are allocated per port, what you state makes sense, i.e. you only see drops on the busiest ports.

Joseph, it was set to 1000 using an HP switch before I switched it to the 2900 at 100. The thing is, I never see input errors on the 2900 or the HP switch. Do you think going with the WS-X6748-GE-TX would make a difference? I'm not 100% sure.

I used SPAN and the Wireshark IO graph to look at the bandwidth used in real time (the session was roughly what's sketched below). I was seeing output drops at 3 Mb/5 Mb and never saw any indication of microbursts, although I'm assuming I should be seeing spikes near the 100 Mb limit, and I'm not seeing anything near that.
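From memory, the SPAN session looked something like this; the session number, direction, and destination port are only an illustration, with the Wireshark capture box hanging off the destination port:

conf t
 ! mirror the port that is showing the output drops
 monitor session 1 source interface FastEthernet1/36 tx
 ! spare port where the capture PC is connected
 monitor session 1 destination interface FastEthernet1/37
 end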

I tweaked the WRR queue bandwidth several times with no effect (I don't have the queue-limit option). Although I have mls qos enabled and dscp trust on that port, most packets destined for this port are SMTP traffic with low-level QoS markings, so the packets never take advantage of the priority queue and are dropped on the first queue:

  Packets dropped on Transmit:
    BPDU packets:  0

    que thr   dropped   30-s bytes   peak bytes   5-mins avg bps   peak bps   [cos-map]
    -----------------------------------------------------------------------------------
    1   1        1256            0            0                0          0   [0 1]

thanks, paul


Joseph, it was set to 1000 using an HP switch before I switched it to the 2900 at 100. The thing is, I never see input errors on the 2900 or the HP switch. Do you think going with the WS-X6748-GE-TX would make a difference? I'm not 100% sure.

I wouldn't expect egress drops to cause input errors unless the receiving device couldn't deal with 100 Mbps.

Again, a 6748 might help: because of the extra 300 KB, because you can run at gig rather than 100, and because it should support the queue-limit command too.

What you might try, if you don't have a 6748 port at hand, is a sup port.  The sup720's ports aren't the best for uplinks, but it might be interesting to see if your results vary.

I used SPAN and the Wireshark IO graph to look at the bandwidth used in real time. I was seeing output drops at 3 Mb/5 Mb and never saw any indication of microbursts, although I'm assuming I should be seeing spikes near the 100 Mb limit, and I'm not seeing anything near that.

Again, I'm unsure that SPAN doesn't distort timings.  What port were you SPANing, the egress port?

Could more than one port be sending traffic to the egress port?  If so, consider that just two concurrent 100 Mbps streams would be sending 200 Mbps of traffic to a 100 Mbps port.  If that happens, the question is whether the concurrent traffic will exceed the allocated buffer space.  If the whole MB of buffer space were available, one would expect it to often be ample, but if the interface reserves it for egress queues, and if you cannot adjust them, then you're more likely to overflow an individual queue's buffers.
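As a rough back-of-the-envelope (treating the ~1 MB Tx buffer from the table earlier in the thread as if a single queue could use all of it, which it likely cannot):

    offered load:        200 Mbps (two concurrent 100 Mbps senders)
    drain rate:          100 Mbps
    excess to buffer:    100 Mbps, about 12.5 MB/s
    time to fill ~1 MB:  1 MB / 12.5 MB/s, about 80 ms

So well under a tenth of a second of that kind of overlap is enough to overflow the buffer and register output discards, while the 30-second and 5-minute rates still look low.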
