Cisco 3850 output drop on 10GE interface

satish.txt1
Level 1

I have a Cisco WS-C3850-48T switch connected to a Cisco ASR1006 over a 10GE interface, and I am seeing a lot of packet drops on the output interface when we hit 7 Gbps of traffic on it.

 

#sh int te1/1/4
TenGigabitEthernet1/1/4 is up, line protocol is up (connected)
  Hardware is Ten Gigabit Ethernet, address is 6c99.8962.38f4 (bia 6c99.8962.38f4)
  Description: Connected-to-ASR1006-10GE
  Internet address is 66.xx.xx.120/31
  MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
     reliability 242/255, txload 190/255, rxload 126/255
  Encapsulation ARPA, loopback not set
  Keepalive not set
  Full-duplex, 10Gb/s, link type is auto, media type is SFP-10GBase-LR
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output never, output hang never
  Last clearing of "show interface" counters 00:00:08
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 3365165
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 4955816000 bits/sec, 4491155 packets/sec
  5 minute output rate 7477101000 bits/sec, 5195199 packets/sec
     45558899 packets input, 6282134279 bytes, 0 no buffer
     Received 1 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 1 multicast, 0 pause input
     0 input packets with dribble condition detected
     52692763 packets output, 9499077983 bytes, 0 underruns
     3365165 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

We are running a stack, and here is the version:

 

Switch Ports Model              SW Version        SW Image              Mode
------ ----- -----              ----------        ----------            ----
*    1 56    WS-C3850-48T       03.06.04.E        cat3k_caa-universalk9 INSTALL
     2 56    WS-C3850-48T       03.06.04.E        cat3k_caa-universalk9 INSTALL

 

We are not running any QoS; everything is just default:

 

#show platform qos queue config tenGigabitEthernet 1/1/4
DATA Port:1 GPN:56 AFD:Disabled QoSMap:0 HW Queues: 8 - 15
  DrainFast:Disabled PortSoftStart:1 - 1080
----------------------------------------------------------
  DTS Hardmax   Softmax  PortSMin GlblSMin  PortStEnd
  --- --------  -------- -------- --------- ---------
 0   1  5   120  6   480  6   320   0     0   3  1440
 1   1  4     0  7   720  3   480   2   180   3  1440
 2   1  4     0  5     0  5     0   0     0   3  1440
 3   1  4     0  5     0  5     0   0     0   3  1440
 4   1  4     0  5     0  5     0   0     0   3  1440
 5   1  4     0  5     0  5     0   0     0   3  1440
 6   1  4     0  5     0  5     0   0     0   3  1440
 7   1  4     0  5     0  5     0   0     0   3  1440
 Priority   Shaped/shared   weight  shaping_step
 --------   ------------   ------  ------------
 0      0     Shared            50           0
 1      0     Shared            75           0
 2      0     Shared         10000          43
 3      0     Shared         10000           0
 4      0     Shared         10000           0
 5      0     Shared         10000           0
 6      0     Shared         10000         128
 7      0     Shared         10000           0

   Weight0 Max_Th0 Min_Th0 Weigth1 Max_Th1 Min_Th1 Weight2 Max_Th2 Min_Th2
   ------- -------  ------  ------  ------  ------  ------  ------ ------
 0      0     478       0       0     534       0       0     600       0
 1      0     573       0       0     641       0       0     720       0
 2      0       0       0       0       0       0       0       0       0
 3      0       0       0       0       0       0       0       0       0
 4      0       0       0       0       0       0       0       0       0
 5      0       0       0       0       0       0       0       0       0
 6      0       0       0       0       0       0       0       0       0
 7      0       0       0       0       0       0       0       0       0

 

I was reading this article, but I am not sure whether this is the real issue or something else: https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

 

Any suggestions?

13 Replies

Can you check the cable connection between the two ends?

 

Check if there is any kind of interference.

Both devices are sitting in the same rack and the cable run is 2 meters. I have never heard of interference in a fiber optic cable (it's light, and nothing can bend light except a black hole).

 

One more interesting thing: why am I only seeing drops during peak traffic, around 7 Gbps? When traffic is 4 or 5 Gbps I am not seeing many drops; very few, and sometimes none.

 

If it were a hardware issue, I should see drops every time, no matter how much traffic is flowing over the link.

Currently the load interval is showing the 5-minute average. Can you adjust your load interval to 30 seconds on the 3850? This will give you a more accurate reading of the output rate. Perhaps you are spiking or having microbursts that reach the 10G threshold.
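For reference, a minimal sketch of that change on the 3850 (interface name taken from your output; this only changes the averaging window, not forwarding behavior):

 interface TenGigabitEthernet1/1/4
  load-interval 30

With that set, the input/output rates in "sh int te1/1/4" average over 30 seconds instead of 5 minutes, so short bursts stand out more clearly.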

Also, I noticed you are using LR optics for a 2-meter connection. It's doubtful that this has any bearing, but check the light levels on either side to be sure: "sh interface t1/1/4 transceiver detail"

I have set a 30-second load-interval and am still seeing drops when traffic goes above 7 Gbps.

Here are the transceiver details:

#sh interface t1/1/4 transceiver detail
ITU Channel not available (Wavelength not available),
Transceiver is internally calibrated.
mA: milliamperes, dBm: decibels (milliwatts), NA or N/A: not applicable.
++ : high alarm, +  : high warning, -  : low warning, -- : low alarm.
A2D readouts (if they differ), are reported in parentheses.
The threshold values are calibrated.

                              High Alarm  High Warn  Low Warn   Low Alarm
          Temperature         Threshold   Threshold  Threshold  Threshold
Port       (Celsius)          (Celsius)   (Celsius)  (Celsius)  (Celsius)
--------- ------------------  ----------  ---------  ---------  ---------
Te1/1/4     30.1                75.0        70.0         0.0       -5.0

                              High Alarm  High Warn  Low Warn   Low Alarm
           Voltage            Threshold   Threshold  Threshold  Threshold
Port       (Volts)            (Volts)     (Volts)    (Volts)    (Volts)
---------  ---------------    ----------  ---------  ---------  ---------
Te1/1/4    3.27                  3.63        3.46        3.13       2.97

           Optical            High Alarm  High Warn  Low Warn   Low Alarm
           Transmit Power     Threshold   Threshold  Threshold  Threshold
Port       (dBm)              (dBm)       (dBm)      (dBm)      (dBm)
---------  -----------------  ----------  ---------  ---------  ---------
Te1/1/4     -1.3                 3.4         0.4        -8.1      -12.1

           Optical            High Alarm  High Warn  Low Warn   Low Alarm
           Receive Power      Threshold   Threshold  Threshold  Threshold
Port       (dBm)              (dBm)       (dBm)      (dBm)      (dBm)
-------    -----------------  ----------  ---------  ---------  ---------
Te1/1/4     -1.8                 3.4         0.4       -14.4      -18.3

Leo Laohoo
Hall of Fame
reliability 242/255

The issue is caused by the fibre optic cable.  The value should always be 255/255.

3365165 output errors, 0 collisions, 0 interface resets

Output drops are actually output errors.  

Can we see the "sh interface <PORT>" of the other end?

I thought reliability reflected the overall performance of the interface, not the fiber optic cable or interface hardware specifically.

 

Here is the interface output from the other end, the Cisco ASR1006:

 

asr1k#sh int TenGigabitEthernet0/0/0
TenGigabitEthernet0/0/0 is up, line protocol is up
  Hardware is SPA-1X10GE-L-V2, address is f4cf.e2ed.8700 (bia f4cf.e2ed.8700)
  Description: ***** Cisco 3850 - Te1/1/4 *****
  Internet address is 66.151.237.121/31
  MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 125/255, rxload 168/255
  Encapsulation ARPA, loopback not set
  Keepalive not supported
  Full Duplex, 10000Mbps, link type is force-up, media type is 10GBase-LR
  output flow-control is on, input flow-control is on
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:01, output 00:04:13, output hang never
  Last clearing of "show interface" counters 00:00:21
  Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 6623444000 bits/sec, 4721047 packets/sec
  5 minute output rate 4920536000 bits/sec, 4150917 packets/sec
     109837325 packets input, 18694537131 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 9 multicast, 0 pause input
     116161127 packets output, 27931574509 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     1 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

This is what I found on Google:

 

  • reliability 255/255: When the input and output errors increase, they affect the reliability counter. This indicates how likely it is that a packet can be delivered or received successfully. Reliability is calculated like this: reliability = number of packets / number of total frames. The value of 255 is the highest value, meaning that the interface is very reliable at the moment. The calculation above is done every 5 minutes.
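As a rough sanity check of that formula against my first 3850 output (only approximate, since the counter is a 5-minute exponential average):

 (52692763 - 3365165) / 52692763 = 0.936
 0.936 x 255 = ~239

That is in the same neighborhood as the 242/255 the interface reported, and since input errors and CRC are both zero, the degraded reliability tracks the output drops rather than anything on the receive side.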

The fault is nearest the 3850.  It's either the cable or the optical module.  

Why am I not seeing packet drops when traffic is low, under 5 Gbps?

I am only seeing drops when traffic goes above 7 Gbps.

I am here to help identify what the issue is and where the cause of the problem could potentially be.
Let us know whether replacing the fibre optic patch cable makes any improvement.

satish.txt1
Level 1

Traffic is now below 7 Gbps on the link, and I have seen zero packet loss for the last hour. I don't think it's a hardware or cable issue; it looks to me like an oversubscribed-interface issue. I think I need a bigger pipe here.

 

#sh int tenGigabitEthernet 1/1/4
TenGigabitEthernet1/1/4 is up, line protocol is up (connected)
  Hardware is Ten Gigabit Ethernet, address is 6c99.8962.38f4 (bia 6c99.8962.38f4)
  Description: rtr5a1-Te0/0/0
  Internet address is 66.151.237.120/31
  MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 144/255, rxload 90/255
  Encapsulation ARPA, loopback not set
  Keepalive not set
  Full-duplex, 10Gb/s, link type is auto, media type is SFP-10GBase-LR
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:04, output never, output hang never
  Last clearing of "show interface" counters 00:00:15
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)

Joseph W. Doherty
Hall of Fame
If you're breaking 70% utilization over 30 seconds, then it's very possible you're seeing congestion discards.

Buffer tuning may decrease your discard rate, although with the possible corresponding cost of increased latency. Much also depends on whether it's transient or long-term congestion.
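For example, the Cisco document linked earlier describes raising the shared-buffer ceiling with a global softmax multiplier. A minimal sketch, assuming your 03.06.04.E image supports the command (worth verifying, and note it applies box-wide, not per port):

 qos queue-softmax-multiplier 1200

Afterwards, re-run "show platform qos queue config tenGigabitEthernet 1/1/4"; the Softmax values should scale up, giving bursts more shared buffer before tail drop, at the cost of the added queueing latency mentioned above.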

Another approach is to adopt "smarter" flow and/or drop management, but most LAN switches tend to be too feature-poor to do so.

You are right. The whole day I got zero packet loss, but as soon as we hit 7+ Gbps I start getting discards or packet loss. Cisco has a good article about tuning buffers, but I think they also suggest it may have a bad impact, so it's better to increase your pipe or split traffic using a port-channel. I think on a 10GE pipe, 70% should be the cap, because there will always be spikes you can't see, so keeping 30% headroom is good; we shouldn't run at 100% utilization on 10GE.
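A minimal sketch of the port-channel option on the 3850 side, reusing the existing /31 from this thread; the second member port (Te2/1/4, on the other stack member) is hypothetical, and the ASR1006 end would need a matching bundle. Since this is a routed link, the bundle is layer 3:

 interface Port-channel1
  no switchport
  ip address 66.151.237.120 255.255.255.254
 !
 interface range TenGigabitEthernet1/1/4, TenGigabitEthernet2/1/4
  no switchport
  no ip address
  channel-group 1 mode active

One caveat: a port-channel balances per flow, so any single flow is still capped at 10 Gbps; it helps when the load is made up of many flows.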

Generally the "bad impact" of tuning buffers, if done "wrong" you can make things worse. If done "right", and if truly suitable for what's needed, it can also make a huge positive impact.

If you're in a position where you could easily add a port-channel, that's probably a less "risky" approach.

Where QoS and/or tweaking buffers becomes more germane is when increasing bandwidth also dramatically increases cost.