1857 Views · 5 Helpful · 3 Replies

Overrun Problem in 6509

cplanet
Level 1

Hi,

Would you please help with this problem?

I am seeing an increasing number of overruns on a ten-gigabit interface.

The hardware is a 6509 with a Sup720-3B, and the overrun counter is incrementing on a WS-X6716-10GE module.

The device connected to the WS-X6716-10GE is an EMC Isilon S200.

I believe the WS-X6716-10GE should be able to handle an EMC NAS device, and the load is not very heavy.

So I have no idea how to handle this, or why these overrun packets are being counted.

The interface output is as follows:

Core_TECH_S720#sh interfaces tenGigabitEthernet 2/9

TenGigabitEthernet2/9 is up, line protocol is up (connected)

  Hardware is C6k 10000Mb 802.3, address is f866.f220.1fa0 (bia f866.f220.1fa0)

  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,

     reliability 255/255, txload 1/255, rxload 1/255

  Encapsulation ARPA, loopback not set

  Keepalive set (10 sec)

  Full-duplex, 10Gb/s

  input flow-control is on, output flow-control is on

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input never, output 00:00:42, output hang never

  Last clearing of "show interface" counters 1d19h

  Input queue: 0/2000/28831/0 (size/max/drops/flushes); Total output drops: 0

  Queueing strategy: fifo

  Output queue: 0/40 (size/max)

  5 minute input rate 0 bits/sec, 0 packets/sec

  5 minute output rate 0 bits/sec, 0 packets/sec

     2401171227 packets input, 3567597556388 bytes, 0 no buffer

     Received 603 broadcasts (297 multicasts)

     0 runts, 0 giants, 0 throttles

     0 input errors, 0 CRC, 0 frame, 28831 overrun, 0 ignored

     0 watchdog, 0 multicast, 0 pause input

     0 input packets with dribble condition detected

     244519553 packets output, 20648786622 bytes, 0 underruns

     0 output errors, 0 collisions, 2 interface resets

     0 babbles, 0 late collision, 0 deferred

     0 lost carrier, 0 no carrier, 2523 PAUSE output

     0 output buffer failures, 0 output buffers swapped out

Would you please help with this issue?

Thanks

Hong

3 Replies

nkarpysh
Cisco Employee

Hi Hong,

Overruns are not the result of a problem with the link or device connected to the port where you see them. They are actually a sign of oversubscription on some other port.

For example:

Suppose there are several ingress ports: Gi2/9, Gi3/9, and Gi4/7 (the numbers are just examples), and a single egress port for those three: Gi1/1.

If all three ports receive traffic at up to 1 Gig and try to send it out of Gi1/1, they will oversubscribe Gi1/1. That port will signal the switch fabric, the fabric will identify the ingress ports causing the oversubscription, and it will send flow-control messages toward them to slow down. To slow down, those ports can only start buffering traffic on the line card each belongs to: Gi2/9 in the corresponding ASIC buffer on LC2, Gi3/9 on LC3, and Gi4/7 on LC4.

However, since traffic keeps arriving on the ingress ports, those buffers fill quickly and start dropping. Those drops are what you see as overruns.
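One quick way to confirm the counter is still incrementing is to re-run the show command with a standard IOS output filter (interface number taken from the output above):

Core_TECH_S720#show interfaces tenGigabitEthernet 2/9 | include overrun
     0 input errors, 0 CRC, 0 frame, 28831 overrun, 0 ignored

Repeating this a few minutes apart shows whether the overruns are historical or actively growing.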

So, to troubleshoot further, find the port whose output drops are growing and understand what traffic is going out of it. Consider adding another port and bundling the two in an EtherChannel to avoid the oversubscription.
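To locate the congested egress port, a filter like this lists every interface together with its total output drops (standard IOS output modifier; hostname from the original post):

Core_TECH_S720#show interfaces | include is up|Total output drops

Once the congested port is found, the EtherChannel suggestion might look like this sketch; the port numbers and channel-group ID are placeholders, and "mode active" selects LACP:

Core_TECH_S720(config)#interface range TenGigabitEthernet 1/1 - 2
Core_TECH_S720(config-if-range)#channel-group 1 mode active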

HTH,

Nik

The 6716 blade is a highly oversubscribed card. If I remember correctly, it is 4:1 oversubscribed. What that means is that four ports share a single ASIC connecting back to the fabric. The overruns occur because the other ports on the same ASIC prevent this port from sending all of its traffic to the fabric.
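The 4:1 figure follows from simple arithmetic, assuming (per the description above) that four 10G front ports share one 10G path to the fabric:

4 ports x 10 Gb/s = 40 Gb/s offered to the ASIC
40 Gb/s offered / 10 Gb/s toward the fabric = 4:1 oversubscription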

You may be able to reorganize what is plugged into which port, and perhaps balance the ASICs more evenly.

In the end, though, you need to make sure that the aggregate total of this port and the other three on the same ASIC never reaches (or exceeds) 10G. QoS *input* policies on all four of those ports might give you some opportunity to drop low-priority traffic to protect the critical NAS traffic.
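A minimal sketch of such an input policy, assuming the low-priority traffic can be matched by a (hypothetical) ACL named BULK; the class, policy names, and police rate are only examples:

class-map match-any LOW-PRIO
 match access-group name BULK
!
policy-map PROTECT-NAS
 class LOW-PRIO
  police 2000000000 conform-action transmit exceed-action drop
!
interface TenGigabitEthernet 2/9
 service-policy input PROTECT-NAS

The same policy would need to be applied on all four ports in the ASIC group for it to help.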

You could also consider a 6708 card, which is only 2:1 oversubscribed.

Or you could consider a 6704, which is 1:1 and thus not oversubscribed at all, and would be able to carry whatever the NAS sent.

The 6716 blade is probably not a good choice for a NAS, assuming it has high peaks of traffic. The 4:1 oversubscription is better suited to blade servers, which tend to hit their peaks infrequently.

Hello R. Van Valkenburgh,

Thank you for your kind advice on this issue. It was very helpful, and I appreciate your comments.

Have a good day.

Best Regards,

Hong-sung, So
