Ethernet output overruns

us10610
Level 4

Could someone explain what happens to a network flow that encounters an output overrun?

I had a report that a file copy between 2 servers in the same 6509 chassis was slow over the weekend. It took 14 hours where it usually takes 4. I see 32 overruns with 10 queue drops. It looks like a case where the NIC on the server couldn't keep up, but what effect does that have on overall network performance? Does flow control start chopping the window size? Is 32 a high number?

Thanks in advance,

Greg


balajitvk
Level 4

Hi,

Check the interface speed between the servers and the switch ports; there might be a mismatch.

The number under Overruns or Output errors shows the number of times that the receiver hardware was unable to hand received data to a hardware buffer because the input rate exceeded the receiver's ability to handle the data.

Rgs,

Jagdeep Gambhir
Level 10

If the capacity of the interface is exceeded, the frame that is currently being received is dropped and the overrun counter is incremented.

Each network interface on Cisco routers consists of a chipset for converting signals received in the media into bits of information, and a small packet buffer into which this information is stored before being copied into the I/O memory. On some interface types, this chipset and packet buffer cannot handle a long burst of frames. Such interfaces are meant to provide connectivity to a certain network type, and not to switch packets at line rate. The line rate of these interfaces is often higher than the switching capacity of the router. Therefore, building an interface that receives more traffic than the router can handle only increases the cost, without adding any real value to the router architecture.

In a small number of cases, the overrun counter may be incremented because of a software defect. However, in the majority of cases, it indicates that the receiving capability of the interface was exceeded. Nothing can be done on the router that reports overruns. If possible, the rate at which frames arrive should be controlled at the remote end of the connection. Otherwise, if the number of overruns is high, the hardware should be upgraded.

There are a number of troubleshooting steps you can take to find out what traffic is being processed and backing up the input queue. The section "Input Queue Drops" in the document linked below should give all the steps needed to find this.

http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a0080094791.shtml

However, the input queue drops may just be a result of the same bursty traffic that causes the overruns. Some traffic by nature just has to be processed through the input queue, and at those times it may be dropped.

Regards,

~JG
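To make the overrun mechanism described above concrete, here is a minimal sketch of a small fixed-size buffer that is filled by bursts and drained at a slower, fixed rate. It is a toy model, not how any Cisco hardware is actually implemented: the function name `simulate_buffer`, the tick-based timing, and all of the numbers are assumptions chosen for illustration.

```python
def simulate_buffer(arrivals, drain_per_tick, capacity):
    """Toy model of a finite packet buffer (hypothetical, for illustration).

    arrivals       -- frames arriving in each time tick
    drain_per_tick -- frames the downstream side can take per tick
    capacity       -- maximum frames the buffer can hold
    Returns (delivered, dropped, frames_left_in_buffer).
    """
    depth = 0
    delivered = 0
    dropped = 0
    for n in arrivals:
        # Accept as many frames as fit; the rest are lost (the "overrun").
        accepted = min(n, capacity - depth)
        dropped += n - accepted
        depth += accepted
        # Drain what the downstream side can handle this tick.
        sent = min(depth, drain_per_tick)
        depth -= sent
        delivered += sent
    return delivered, dropped, depth


# A three-tick burst at 10 frames/tick into an 8-frame buffer drained at 4/tick.
burst = simulate_buffer([10, 10, 10, 0, 0, 0], drain_per_tick=4, capacity=8)
# The same drain rate with a steady 4 frames/tick never overflows.
steady = simulate_buffer([4] * 6, drain_per_tick=4, capacity=8)
print("burst :", burst)
print("steady:", steady)
```

In this model the buffer only loses frames while the arrival rate exceeds the drain rate for longer than the buffer can absorb, which is the "long burst of frames" case described above.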

Does that mean that the port input queue received more data than the output queue can handle and has nothing to do with the servers NIC?

We're using 6748-GE-TX line cards so I am surprised to think that the card can't keep up. The 2 servers are in different line cards but are in the same chassis.

Thanks again.

Greg

Let's see the output from typing

show mod

and

show fabric

Thanks!! I attached the output as an attachment because I exceeded the max message length.

Your config is ideal: you are running in Compact mode and every module is fabric-enabled.

Have you checked for a speed/duplex mismatch between the switch and the servers?

Is this a new configuration, or was it working before without problems?

Both ports are configured as 1000/Full. The config is pretty new, about 4 weeks old. This data transfer is happening every weekend so I am trying to verify if this has happened recently or if this is the 1st time.

I'm trying to make sure that I understand before sending another email to the Data Center Team. I think I misinformed them earlier because I thought the output overruns were because the NIC on the server couldn't handle any more packets. I am now under the impression that the overruns occur when the switch receives packets on the ingress port faster than they can be sent on the egress port. Is this correct?

Can I see the output from typing

show int [mod/port] stat

show int [mod/port] status

from the affected ports

and

show version

You bet! (I'm only seeing overruns on gi7/48)

sh int gi3/48 stat

GigabitEthernet3/48

Switching path      Pkts In   Chars In   Pkts Out   Chars Out
Processor                 0          0      43131    19150099
Route cache               0          0          0           0
Distributed cache         0          0          0           0
Total                     0          0      43131    19150099

m-server-1#sh int gi3/48 status

Port      Name   Status      Vlan   Duplex   Speed    Type
Gi3/48           connected   157    a-full   a-1000   10/100/1000BaseT

m-server-1#

m-server-1#

m-server-1#sh int gi7/48 stat

GigabitEthernet7/48

Switching path      Pkts In   Chars In   Pkts Out   Chars Out
Processor                 0          0      43092    19132728
Route cache               0          0          0           0
Distributed cache         0          0          0           0
Total                     0          0      43092    19132728

m-server-1#sh int gi7/48 status

Port      Name   Status      Vlan   Duplex   Speed    Type
Gi7/48           connected   157    a-full   a-1000   10/100/1000BaseT

m-server-1#sh ver

Cisco Internetwork Operating System Software

IOS (tm) s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-VM), Version 12.2(18)SXF6, RELEASE SOFTWARE (fc1)

Technical Support: http://www.cisco.com/techsupport

Copyright (c) 1986-2006 by cisco Systems, Inc.

Compiled Tue 19-Sep-06 01:38 by tinhuang

Image text-base: 0x01020150, data-base: 0x01021000

ROM: System Bootstrap, Version 12.2(17r)S4, RELEASE SOFTWARE (fc1)

BOOTLDR:

m-server-1 uptime is 4 weeks, 2 days, 1 hour, 46 minutes

Time since m-server-1 switched to active is 4 weeks, 2 days, 1 hour, 46 minutes

System returned to ROM by reload (SP by reload)

System restarted at 12:55:32 EDT Sat Jun 30 2007

System image file is "disk0:/sys/s72033/base/s72033-ipservicesk9_wan-vm"

This product contains cryptographic features and is subject to United

States and local country laws governing import, export, transfer and

use. Delivery of Cisco cryptographic products does not imply

third-party authority to import, export, distribute or use encryption.

Importers, exporters, distributors and users are responsible for

compliance with U.S. and local country laws. By using this product you

agree to comply with applicable laws and regulations. If you are unable

to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:

http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to

export@cisco.com.

cisco WS-C6509-E (R7000) processor (revision 1.2) with 491520K/32768K bytes of memory.

Processor board ID SMG1044N3D1

SR71000 CPU at 600Mhz, Implementation 1284, Rev 1.2, 512KB L2 Cache

Last reset from s/w reset

Bridging software.

X.25 software, Version 3.0.0.

SuperLAT software (copyright 1990 by Meridian Technology Corp).

TN3270 Emulation software.

14 Virtual Ethernet/IEEE 802.3 interfaces

196 Gigabit Ethernet/IEEE 802.3 interfaces

4 Ten Gigabit Ethernet/IEEE 802.3 interfaces

1917K bytes of non-volatile configuration memory.

Configuration register is 0x2102

System is currently running from installed software

For further information use "show install running"
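For anyone who wants to compare the two ports' counters programmatically, the following is a rough sketch that parses the "Switching path" table pasted above. The function name `parse_switching_stats` and the field names are made up for this example, and the parser only assumes the five-column layout shown in this thread; other IOS versions may format the table differently.

```python
FIELDS = ("pkts_in", "chars_in", "pkts_out", "chars_out")

# Counter rows copied from the gi3/48 output in this thread.
SAMPLE = """\
GigabitEthernet3/48
Switching path Pkts In Chars In Pkts Out Chars Out
Processor 0 0 43131 19150099
Route cache 0 0 0 0
Distributed cache 0 0 0 0
Total 0 0 43131 19150099
"""


def parse_switching_stats(text):
    """Return {switching_path: {field: int}} for each counter row."""
    rows = {}
    for line in text.splitlines():
        # Split off the last four whitespace-separated columns; whatever
        # remains on the left is the (possibly multi-word) path name.
        parts = line.rsplit(None, 4)
        if len(parts) != 5 or not all(p.isdigit() for p in parts[1:]):
            continue  # header, interface name, or blank line
        rows[parts[0].strip()] = dict(zip(FIELDS, map(int, parts[1:])))
    return rows


stats = parse_switching_stats(SAMPLE)
total = stats["Total"]
print("packets out:", total["pkts_out"])
print("mean bytes per packet out:", total["chars_out"] / total["pkts_out"])
```

Note that every packet in this table is in the Processor row, so these counters most likely reflect traffic handled by the route processor rather than the hardware-switched file copy itself; that reading is an assumption about the platform, not something confirmed in this thread.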

Everything looks good at the switch. Since you mentioned it's only happening on one port and the config on both ports is identical, I recommend looking into Layer 1 between the switch and server. Replace cables as needed and investigate whether this was a one-time event or a chronic problem.

Thanks for your help.

Just to clarify, are output overruns caused by the ingress queue sending packets faster than the output queue hardware buffer can handle?

My first thought was that the server NIC hardware buffer was full, but then I read that the counter applies only to the switch port. If so, how would Layer 1 affect the performance of the egress queue?

Sorry for such a lame question.

Thanks,

Greg

These are packets leaving the switch so they are placed in the egress queue.

It sounds like the packets leaving the switch were too big, or there were too many of them, to fit in some queues; hence the overrun condition, which in turn caused some queue drops.

It seems the source server was sending packets faster than the destination could receive them.

Were both servers running under the same CPU/load conditions when the transfer was taking place?

The destination server is a shared resource and most likely was being accessed by end users during the transfer.

Would this explain the overrun?

Thanks,

Greg

Yes.
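As a rough sanity check on whether 32 overruns and 10 queue drops could account for the slowdown on their own, the sketch below compares the extra transfer time with a deliberately pessimistic stall estimate. It assumes every dropped frame stalls the flow for a full retransmission timeout, and the 1-second penalty is an assumed illustrative value, not something measured in this thread. Adding the two counters together is also pessimistic, since the same frame may be counted in both.

```python
def extra_seconds(slow_hours, normal_hours):
    """Extra wall-clock time of the slow transfer, in seconds."""
    return (slow_hours - normal_hours) * 3600.0


def worst_case_stall_seconds(drops, penalty_s):
    """Upper bound on stall time if each drop costs one full penalty."""
    return drops * penalty_s


# Figures from the original post: 14 hours instead of 4, 32 overruns, 10 queue drops.
extra = extra_seconds(14, 4)
# penalty_s=1.0 is an assumed per-drop stall, chosen to be pessimistic.
stall = worst_case_stall_seconds(32 + 10, penalty_s=1.0)
share = stall / extra

print(f"extra time: {extra:.0f} s")
print(f"worst-case stall from drops: {stall:.0f} s")
print(f"share of slowdown explained: {share:.2%}")
```

Under these assumptions the drops cover only a tiny fraction of the extra ten hours, which is consistent with load on the shared destination server, rather than the drops themselves, being the main factor.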
