Total output drops on a WS-C3750E-48TD

Unanswered Question
Mar 17th, 2010

Hello,

I have a potential problem with reported total output drops on a WS-C3750E-48TD switch interface. The interface itself is a 1000BaseSX SFP which is plugged into a TwinGig converter module. The switch is switch number one in a stack of two switches. Switch one is configured with a priority of 15, but switch two has been elected master (switch 2 has a priority of 14). I'm not sure how this happened, or even if it's relevant, but I thought I would mention it.

If I repeatedly enter 'sho int gi1/0/49', the reported number of total output drops flips between 5 and 142058, and those two values never change. One second there are very few output drops, and the next there are more than 140,000.

GigabitEthernet1/0/49 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet, address is 001f.9ea9.1931 (bia 001f.9ea9.1931)
  Description: uksltc03dz01sw01
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 10/255, rxload 8/255
  Encapsulation ARPA, loopback not set
  Keepalive not set
  Full-duplex, 1000Mb/s, link type is auto, media type is 1000BaseSX SFP
  Media-type configured as  connector
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:07, output 00:00:29, output hang never
  Last clearing of "show interface" counters 00:45:36
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 142058
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 33765000 bits/sec, 7129 packets/sec
  5 minute output rate 40237000 bits/sec, 7179 packets/sec
     23039908 packets input, 14014206433 bytes, 0 no buffer
     Received 5200773 broadcasts (5200280 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 5200280 multicast, 0 pause input
     0 input packets with dribble condition detected
     23384804 packets output, 16939912008 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
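
A cumulative counter such as "Total output drops" should only ever increase between clears, so a value that falls back from 142058 to 5 suggests a counter-reporting problem rather than real packet loss. Below is a minimal Python sketch of that check. It assumes you have saved several consecutive 'show interfaces' captures as text; the helper names output_drops and find_counter_regressions are hypothetical and not part of any Cisco tool.

```python
import re

# Matches the cumulative drop field in IOS "show interfaces" output,
# e.g. "...(size/max/drops/flushes); Total output drops: 142058".
DROPS_RE = re.compile(r"Total output drops:\s*(\d+)")


def output_drops(show_int_text: str) -> int:
    """Return the 'Total output drops' value from one 'show interfaces' capture."""
    match = DROPS_RE.search(show_int_text)
    if match is None:
        raise ValueError("no 'Total output drops' field found")
    return int(match.group(1))


def find_counter_regressions(samples: list[int]) -> list[tuple[int, int, int]]:
    """Return (index, previous, current) for every sample lower than the one before.

    Between two 'clear counters' commands a cumulative counter should never decrease.
    """
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1)
        if cur < prev
    ]


# Values as described in this post: the counter flips between 5 and 142058.
samples = [5, 142058, 5, 142058]
print(find_counter_regressions(samples))
```

Any non-empty result means the counter went backwards, which genuine cumulative drops cannot do.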

The interface is on a switch in a DMZ and has maybe 50 servers downstream of it. It's the main trunk to the upstream switch on the way to the firewall. I've checked the bandwidth usage on the interface and it's nowhere near fully utilised (perhaps somewhere between 150-200 Mb/s Tx/Rx aggregated).

First of all, we swapped out the GBIC, but that made no difference. Since then I've done a little more reading and learned that a problem with high output drops won't be fixed by swapping out the GBIC.

I submitted the output from the 'sho int' command to the Cisco Output Interpreter, which recommended a couple of things:

1) Turn off fast switching for heavily used protocols. For example, turn off IP fast switching by using the 'no ip route-cache' interface configuration command. (This command is not available on that interface, so I can't do that.)

2) Submit the output from 'show buffers' to Output Interpreter to determine if buffers need to be tuned. (This reported the following errors for the switch:)

ERROR: Since it's last reload, this router has created or maintained a relatively
large number of 'Big buffers' yet still has very few free buffers.

ERROR: Since it's last reload, this router has created or maintained a relatively
large number of 'VeryBig buffers' yet still has very few free buffers.

ERROR: Since it's last reload, this router has created or maintained a relatively
large number of 'FastEthernet0-Physical buffers' yet still has very few free buffers.

If this were related to a memory leak or something similar, I would expect to see similar problems on trunk interfaces on other 3750 switches running the same flavour of IOS (12.2(44)SE2), but I don't.

This problem was first noticed by users of an FTP server connected to the switch stack. They're reporting slow performance and intermittent connectivity problems when attempting to initiate a connection.

Can anyone make any suggestions as to what I can try?

Thanks,

Olly

Pronoy Dasgupta Sun, 03/21/2010 - 09:22

Hey Olly,

I don't think the output drops are responsible for the slow connections. The erratic output drop counter might be caused by the following known issue:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCso81660

You would need to upgrade your IOS to get past the above. Also check the output of "show platform port-asic stats drop".

If you see drops increasing here in any of the queues, then it should be a concern for you.

Thanks

Pronoy

Oliver Drew Tue, 03/23/2010 - 05:41

Thanks for your reply.

Before you replied, I shut down Gi1/0/49 to force traffic through an alternate interface (Gi 2/0/49), to see if that would improve things. Unfortunately, it made no difference to the performance problem. I also saw the same output drop behaviour (jumping from a small number to a much higher number and back again).

So, Gi 1/0/49 is enabled once again, and traffic is flowing through it. When I issue "show platform port-asic stats drop gigabitEthernet 1/0/49", the switch returns "%Command Rejected: interface 'GigabitEthernet1/0/49' is not local port". It also does this for Gi 1/0/50. The command works fine on Gi 2/0/49 and Gi 2/0/50. Could this mean that I have a problem with the TwinGig Converter Module in which the two SFP modules are inserted?

Thanks again - Olly

P.S. I will take your advice and plan to upgrade to a newer version of IOS as soon as possible.

Pronoy Dasgupta Tue, 03/23/2010 - 08:38

Hello Olly,

Well, the command is being rejected because the switch port you are running it against is not local, meaning it is on a member (slave) switch. To use this command, you need to session into switch 1 (per the output, switch 2 seems to be the master of this stack, which is why the command runs fine for its interfaces). You can do that with the "session 1" command and then run the same "show platform ..." command.

Thanks

Pronoy

Oliver Drew Tue, 03/23/2010 - 11:12

Hi Pronoy,

Thanks again. Using the session 1 command connects me directly to the member switch.

I can confirm that 'show platform port-asic stats drop gigabitEthernet 1/0/49' displays a constant drop value, whereas the output from 'show int' changes frequently.

I will schedule a maintenance window and upgrade. I'll post back when it's done to let you know how things look.

Thanks,

Olly

patoberli Mon, 01/16/2012 - 01:37

Hi

I just found this thread because we have a similar issue.

It's a stack consisting of two WS-C3750E-24TD with software release 12.2(58)SE1 and I have drops on one interface.
Here are the results of the show command:

show plat port-as stats drop g1/0/26

  Interface Gi1/0/26 TxQueue Drop Statistics
    Queue 0
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 3
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 3529272
    Queue 4
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 5
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 6
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 7
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0

This number increases under fairly high load (600-700 Mbit/s) on this interface. The "Total output drops" counter in 'show interfaces' is also increasing for that interface.

Any ideas what the issue could be?

Thanks 

nkarpysh Mon, 01/16/2012 - 02:54

Hello,

In your case the drops seem to be due to bursty traffic or oversubscription. You said you see them when utilization is close to 600-700 Mbit/s. That is an average value, so there may well be spikes up to 1 Gbit/s, which can quickly fill the buffers. The same can happen even at a low average rate if the traffic is bursty: when a sudden, momentary burst of hundreds of packets arrives, the buffers are quickly exhausted and you see the same drops.
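
The point that an average rate can hide bursts can be illustrated with a toy tail-drop FIFO. This Python sketch is a simplified model, not a description of the 3750's actual buffer architecture: time is divided into slots in which the port transmits one packet, all packets are the same size, and the 40-packet queue limit is an arbitrary illustrative value. Both traffic patterns carry exactly the same average load (65%); only the bursty one loses packets.

```python
def simulate_fifo(arrivals: list[int], capacity: int, service_per_slot: int = 1) -> int:
    """Return how many packets a tail-drop FIFO discards.

    arrivals[t] is the number of packets arriving in slot t; the port sends
    service_per_slot packets per slot and holds at most `capacity` packets.
    """
    queue = 0
    dropped = 0
    for arriving in arrivals:
        queue += arriving
        if queue > capacity:                      # tail drop: excess packets are lost
            dropped += queue - capacity
            queue = capacity
        queue = max(0, queue - service_per_slot)  # transmit at line rate
    return dropped


PERIOD = 200    # slots per repeating pattern
PACKETS = 130   # packets per period -> 130 / 200 = 65% average load

smooth = ([1] * PACKETS + [0] * (PERIOD - PACKETS)) * 10  # evenly paced arrivals
bursty = ([PACKETS] + [0] * (PERIOD - 1)) * 10            # whole period's packets at once

print("smooth drops:", simulate_fifo(smooth, capacity=40))
print("bursty drops:", simulate_fifo(bursty, capacity=40))
```

The smooth pattern never queues more than one packet, while each burst overflows the 40-packet limit, even though the average utilization is identical.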

The DDTS above applies only to the "show interface" command, which may sometimes show an incorrect value. The "show platform" output shows the correct results.

You may want to consider a link upgrade (building a port-channel), or configuring QoS to split traffic into several queues and giving important queues additional buffers by increasing their queue thresholds.

Hope this helps,

Nik

patoberli Mon, 01/16/2012 - 03:52

Hi Nik

Thanks for your reply. That doesn't entirely make me happy, as I expected the 3750G platform to be able to sustain a stable 900 Mbit/s on an interface. The affected interface is in an EtherChannel with a second interface, and the traffic was just "normal" backup-job traffic. Luckily we will soon upgrade most of those switches to Nexus, but it still bothers me. Would flow control help here? Or is this not really a problem anyway?

It's just that one of the server admins is complaining about slow traffic on his ESX servers and we don't really know why.

Joseph W. Doherty Mon, 01/16/2012 - 04:56

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

A 3750G, unlike a 3750E, can, depending on the model, have insufficient fabric and/or pps forwarding capacity. Further, its original StackWise doesn't use the stack ring as optimally as StackWise Plus. All that noted, like Nik, I would first suspect bursty traffic. Your stats show all the drops in just one queue and at just one weight. If the traffic is TCP-based, TCP slows itself down when it detects drops, so 60 to 70% utilization is not outside the bounds of useful utilization while encountering drops.
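
Why a link can show drops while averaging well under line rate can be illustrated with an idealized additive-increase/multiplicative-decrease (AIMD) model of one Reno-style TCP flow. This is a textbook simplification (no slow start, a single flow, a fixed RTT, the window counted in packets per RTT), not a model of any particular host or switch; the capacity and buffer values are illustrative assumptions.

```python
def aimd_utilization(capacity: int, buffer: int, rtts: int, start_cwnd: int = 1) -> float:
    """Average link utilization of one idealized AIMD flow.

    Each RTT the flow sends cwnd packets; the link delivers at most `capacity`
    per RTT and can queue `buffer` more. Overflowing that is treated as a loss.
    """
    cwnd = start_cwnd
    delivered = 0
    for _ in range(rtts):
        delivered += min(cwnd, capacity)
        if cwnd > capacity + buffer:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease after a loss
        else:
            cwnd += 1                 # additive increase: one packet per RTT
    return delivered / (capacity * rtts)


# With no spare buffering the window saws between ~50% and 100% of capacity.
print(f"no buffer:    {aimd_utilization(100, 0, 10_000):.0%}")
# A buffer as large as the pipe keeps the link busy through each halving.
print(f"large buffer: {aimd_utilization(100, 100, 10_000):.0%}")
```

In this idealized model the flow loses a packet every cycle yet still averages roughly three quarters of line rate without buffering, which is the kind of "drops plus moderate utilization" pattern described above.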

You mention you have an EtherChannel, but since each individual flow uses only a single member link, one heavy bandwidth-consuming flow, like the backup (?), could still easily overrun that link's bandwidth.
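
The per-flow nature of EtherChannel load balancing can be sketched in Python. The hash used here (XOR of the source and destination IPv4 addresses, modulo the number of member links) is a simplified stand-in chosen for illustration; the hash a 3750 or 6500 actually applies depends on the platform and the configured load-balancing method. The addresses and Mbit/s figures are hypothetical.

```python
import ipaddress


def member_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Pick a member link with a simplified, hypothetical src-dst-ip hash.

    The result depends only on header fields, never on current link load,
    so every packet of a given flow lands on the same link.
    """
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % num_links


def per_link_load(flows: list[tuple[str, str, int]], num_links: int) -> list[int]:
    """Sum each flow's rate (Mbit/s) onto the member link its hash selects."""
    load = [0] * num_links
    for src, dst, mbps in flows:
        load[member_link(src, dst, num_links)] += mbps
    return load


# Hypothetical traffic: one large backup flow plus two smaller flows.
flows = [
    ("10.0.0.10", "10.0.1.20", 900),  # backup job
    ("10.0.0.12", "10.0.1.20", 200),
    ("10.0.0.11", "10.0.1.20", 100),
]
print(per_link_load(flows, num_links=2))
```

With these made-up numbers the two-link bundle carries 1200 Mbit/s in total (60% of 2 Gbit/s), yet the first member link is asked for 1100 Mbit/s, more than a single gigabit link can send.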

You've asked whether flow control might help, and it might, but with normal Ethernet you need to be very careful applying it, as all traffic on the port is suspended. Depending on your topology, a server with a dedicated NIC for backup could benefit from flow control on the port to that NIC, provided the uplink congestion is on the same switch.

Other options include tuning your buffers, or configuring the hosts that receive heavy bandwidth with a smaller TCP RWIN. Depending on other traffic, additional links in the EtherChannel might help (you might also want to check that the hashing algorithm is optimal for you). A 10 Gig uplink is another option.
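
One way to see why a smaller TCP receive window (RWIN) limits a heavy flow: a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is bounded by RWIN / RTT. Below is a minimal Python sketch of that bound. The 1 ms RTT is an assumed LAN value, not a measurement from this network, and the function names are illustrative.

```python
def max_tcp_throughput_mbps(rwin_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP flow's throughput: one receive window per round trip."""
    return rwin_bytes * 8 / (rtt_ms / 1000) / 1e6


def rwin_for_target_bytes(target_mbps: float, rtt_ms: float) -> int:
    """Receive window (bytes) that caps a flow at roughly target_mbps for the given RTT."""
    return int(target_mbps * 1e6 / 8 * rtt_ms / 1000)


RTT_MS = 1.0  # assumed LAN round-trip time

# 65535 bytes is the largest window possible without TCP window scaling.
print(f"65535-byte window: {max_tcp_throughput_mbps(65535, RTT_MS):.1f} Mbit/s ceiling")
print(f"window for ~400 Mbit/s: {rwin_for_target_bytes(400, RTT_MS)} bytes")
```

This is only a ceiling; real throughput also depends on loss, sender behaviour and the actual RTT, which changes as queues fill.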

patoberli Mon, 01/16/2012 - 05:31

Thanks for the info.

I just discovered that the other end of the cable (a Catalyst 6500E) also shows output drops. However, I don't know the equivalent platform statistics command there, as that one doesn't exist on the 6500.

6509R-1250#sh int g3/15
GigabitEthernet3/15 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 000b.462d.368e (bia 000b.462d.368e)
  Description: ** S3724G-2U01A-4 gig1/0/25 **
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 15/255, rxload 3/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is SX
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:11, output 00:00:02, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 1288
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 11965000 bits/sec, 6200 packets/sec
  5 minute output rate 62120000 bits/sec, 6088 packets/sec
     17451515214 packets input, 18915079346152 bytes, 0 no buffer
     Received 40841337 broadcasts (4052338 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     14312209777 packets output, 12476712248122 bytes, 0 underruns
     0 output errors, 0 collisions, 3 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
6509R-1250#sh int g3/16
GigabitEthernet3/16 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 000b.462d.368f (bia 000b.462d.368f)
  Description: ** S3724G-2U01A-4 gig1/0/26 **
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 34/255, rxload 3/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is SX
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:15, output 00:00:39, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 46510
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 12981000 bits/sec, 4600 packets/sec
  5 minute output rate 133964000 bits/sec, 11819 packets/sec
     119080352142 packets input, 172960867483510 bytes, 0 no buffer
     Received 66085579 broadcasts (14509584 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     70242487355 packets output, 19097647138927 bytes, 0 underruns
     0 output errors, 0 collisions, 3 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

6509R-1250#sh int po48
Port-channel48 is up, line protocol is up (connected)
  Hardware is EtherChannel, address is 000b.462d.368f (bia 000b.462d.368e)
  Description: ** S3712G-2U01A-4 **
  MTU 1500 bytes, BW 2000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 25/255, rxload 3/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is unknown
  input flow-control is off, output flow-control is off
  Members in this channel: Gi3/15 Gi3/16
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output never, output hang never
  Last clearing of "show interface" counters 04:04:16
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 179
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 24809000 bits/sec, 10794 packets/sec
  5 minute output rate 197332000 bits/sec, 18009 packets/sec
     198210564 packets input, 162088342046 bytes, 0 no buffer
     Received 95561 broadcasts (10788 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     225127684 packets output, 244383514438 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

The interface card is a fairly old WS-X6416-GBIC.

Could this still be bursty traffic, or are we maybe facing a physical issue (cabling, SFP, ...)?

Joseph W. Doherty Mon, 01/16/2012 - 05:46

Probably still bursty traffic.

I'm assuming the two interfaces shown are members of Po48. If they are, note the difference in drops between the two interfaces, and the difference in their input/output packet counts, especially output.
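
The comparison can be made concrete with the cumulative counters from the two 6509 member ports posted above. Both ports show counters never cleared, so these figures are long-term averages over the uptime. The Python below only does the arithmetic; it does not by itself explain why the drops happen.

```python
# Cumulative counters copied from the "show interfaces" output for Gi3/15 and
# Gi3/16 on the 6509 (counters never cleared, so they cover the whole uptime).
counters = {
    "Gi3/15": {"output_drops": 1288, "packets_output": 14312209777},
    "Gi3/16": {"output_drops": 46510, "packets_output": 70242487355},
}


def drop_ratio(c: dict[str, int]) -> float:
    """Fraction of output packets dropped on one interface."""
    return c["output_drops"] / c["packets_output"]


for name, c in counters.items():
    print(f"{name}: {drop_ratio(c):.2e} drops per packet sent")

traffic_ratio = counters["Gi3/16"]["packets_output"] / counters["Gi3/15"]["packets_output"]
drops_ratio = counters["Gi3/16"]["output_drops"] / counters["Gi3/15"]["output_drops"]
print(f"Gi3/16 sent {traffic_ratio:.1f}x the packets of Gi3/15 but had {drops_ratio:.1f}x the drops")
```

Gi3/16 sent roughly 4.9 times as many packets as Gi3/15 but recorded roughly 36 times as many drops, so the busier member link loses a disproportionate share of its traffic.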

PS:

On 6500 line cards, multiple ports often share an ASIC and other resources, so you can sometimes optimize performance by how you distribute busy and non-busy ports across the card.

patoberli Mon, 01/16/2012 - 06:22

Thanks for your information.

I guess I can't do much in that case, besides hoping that the users don't notice, until we complete our planned upgrade to 10 Gig.
