07-30-2007 07:16 AM - edited 03-05-2019 05:34 PM
Could someone explain what happens to a network flow that encounters an output overrun?
I had a report that a file copy between 2 servers in the same 6509 chassis was slow over the weekend. It took 14 hours to copy where it usually takes 4. I see that there are 32 overruns with 10 queue drops. It looks like this is a case where the NIC on the server couldn't keep up, but what effect does that have on overall network performance? Does flow control start chopping the window size? Is 32 a high number?
Thanks in advance,
Greg
07-30-2007 07:30 AM
Hi,
Check the interface speed between the servers and the switch ports; there might be a mismatch.
The overruns count in the interface counters shows the number of times that the receiver hardware was unable to hand received data to a hardware buffer because the input rate exceeded the receiver's ability to handle the data.
Rgs,
07-30-2007 07:31 AM
If the capacity of the interface is exceeded, the frame that is currently being received is dropped and the overrun counter is incremented.
Each network interface on Cisco routers consists of a chipset for converting signals received in the media into bits of information, and a small packet buffer into which this information is stored before being copied into the I/O memory. On some interface types, this chipset and packet buffer cannot handle a long burst of frames. Such interfaces are meant to provide connectivity to a certain network type, and not to switch packets at line rate. The line rate of these interfaces is often higher than the switching capacity of the router. Therefore, building an interface that receives more traffic than the router can handle only increases the cost, without adding any real value to the router architecture.
In a small number of cases, the overrun counter may be incremented because of a software defect. However, in the majority of cases, it indicates that the receiving capability of the interface was exceeded. Nothing can be done on the router that reports overruns. If possible, the rate at which frames arrive should be controlled at the remote end of the connection. Otherwise, if the number of overruns is high, the hardware should be upgraded.
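To make the burst point concrete, here is a rough toy model in Python. It is only a sketch: the function name, the buffer size, and the drain rate are made up for illustration and do not describe how any particular Cisco interface is built. It shows that the same average load can produce overruns when it arrives as one burst into a small buffer.

```python
from collections import deque


def simulate_rx_buffer(arrivals, buffer_slots, drain_per_tick):
    """Toy model of a small receive buffer (all parameters are hypothetical).

    arrivals       -- frames arriving in each time tick
    buffer_slots   -- how many frames the buffer can hold
    drain_per_tick -- frames copied out of the buffer per tick

    Returns (frames_delivered, overruns).
    """
    buffer = deque()
    delivered = 0
    overruns = 0
    for frames_in in arrivals:
        for _ in range(frames_in):
            if len(buffer) < buffer_slots:
                buffer.append(1)
            else:
                # Buffer is full: the frame is dropped and counted.
                overruns += 1
        # Copy out as many frames as the drain rate allows this tick.
        for _ in range(min(drain_per_tick, len(buffer))):
            buffer.popleft()
            delivered += 1
    # Whatever is still buffered after the last tick is delivered later.
    delivered += len(buffer)
    return delivered, overruns


if __name__ == "__main__":
    steady = [2, 2, 2, 2, 2]    # 10 frames spread evenly
    burst = [10, 0, 0, 0, 0]    # the same 10 frames in one tick
    print("steady:", simulate_rx_buffer(steady, 4, 2))
    print("burst: ", simulate_rx_buffer(burst, 4, 2))
```

With a 4-frame buffer drained at 2 frames per tick, the steady pattern delivers all 10 frames with no overruns, while the single burst delivers only 4 and counts 6 overruns.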
There are a number of troubleshooting steps you can take to find out what traffic is being processed and backing up the input queue. The section "Input Queue Drops" in the document linked below should give all the steps needed to find this.
http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a0080094791.shtml
However, the input queue drops may just be a result of the same bursty traffic that causes the overruns. Some traffic by nature just has to be processed through the input queue, and at those times it may be dropped.
Regards,
~JG
07-30-2007 07:51 AM
Does that mean that the port input queue received more data than the output queue can handle, and that it has nothing to do with the server's NIC?
We're using 6748-GE-TX line cards so I am surprised to think that the card can't keep up. The 2 servers are in different line cards but are in the same chassis.
Thanks again.
Greg
07-30-2007 08:06 AM
Let's see the output from typing
show mod
and
show fabric
07-30-2007 08:12 AM
07-30-2007 08:36 AM
Your config is ideal, as you are running in Compact mode and every single module is fabric-enabled.
Have you checked for a speed/duplex mismatch between the switch and the servers?
Is this a new configuration, or was it working before without problems?
07-30-2007 09:00 AM
Both ports are configured as 1000/Full. The config is pretty new, about 4 weeks old. This data transfer is happening every weekend so I am trying to verify if this has happened recently or if this is the 1st time.
I'm trying to make sure that I understand, though, before sending another email to the Data Center Team. I think I misinformed them earlier because I thought the output overruns were because the NIC on the server couldn't handle any more packets. I am now under the impression that the overruns occur when the switch receives packets on the ingress port faster than they can be sent out the egress port. Is this correct?
07-30-2007 10:34 AM
Can I see the output from typing
show int [mod/port] stat
show int [mod/port] status
from the affected ports
and
show version
07-30-2007 10:44 AM
You bet! (I'm only seeing overruns on gi7/48)
sh int gi3/48 stat
GigabitEthernet3/48
Switching path      Pkts In  Chars In  Pkts Out  Chars Out
Processor                 0         0     43131   19150099
Route cache               0         0         0          0
Distributed cache         0         0         0          0
Total                     0         0     43131   19150099
m-server-1#sh int gi3/48 status
Port    Name  Status     Vlan  Duplex  Speed   Type
Gi3/48        connected  157   a-full  a-1000  10/100/1000BaseT
m-server-1#
m-server-1#
m-server-1#sh int gi7/48 stat
GigabitEthernet7/48
Switching path      Pkts In  Chars In  Pkts Out  Chars Out
Processor                 0         0     43092   19132728
Route cache               0         0         0          0
Distributed cache         0         0         0          0
Total                     0         0     43092   19132728
m-server-1#sh int gi7/48 status
Port    Name  Status     Vlan  Duplex  Speed   Type
Gi7/48        connected  157   a-full  a-1000  10/100/1000BaseT
m-server-1#sh ver
Cisco Internetwork Operating System Software
IOS (tm) s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-VM), Version 12.2(18)SXF6, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2006 by cisco Systems, Inc.
Compiled Tue 19-Sep-06 01:38 by tinhuang
Image text-base: 0x01020150, data-base: 0x01021000
ROM: System Bootstrap, Version 12.2(17r)S4, RELEASE SOFTWARE (fc1)
BOOTLDR:
m-server-1 uptime is 4 weeks, 2 days, 1 hour, 46 minutes
Time since m-server-1 switched to active is 4 weeks, 2 days, 1 hour, 46 minutes
System returned to ROM by reload (SP by reload)
System restarted at 12:55:32 EDT Sat Jun 30 2007
System image file is "disk0:/sys/s72033/base/s72033-ipservicesk9_wan-vm"
This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.
A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html
If you require further assistance please contact us by sending email to
cisco WS-C6509-E (R7000) processor (revision 1.2) with 491520K/32768K bytes of memory.
Processor board ID SMG1044N3D1
SR71000 CPU at 600Mhz, Implementation 1284, Rev 1.2, 512KB L2 Cache
Last reset from s/w reset
Bridging software.
X.25 software, Version 3.0.0.
SuperLAT software (copyright 1990 by Meridian Technology Corp).
TN3270 Emulation software.
14 Virtual Ethernet/IEEE 802.3 interfaces
196 Gigabit Ethernet/IEEE 802.3 interfaces
4 Ten Gigabit Ethernet/IEEE 802.3 interfaces
1917K bytes of non-volatile configuration memory.
Configuration register is 0x2102
System is currently running from installed software
For further information use "show install running"
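For anyone who wants to compare the two status lines above without eyeballing them, here is a small Python sketch. The helper names are hypothetical, and it assumes the Type column contains no spaces and that the Name column may be empty, as it is here. Note that both ports report a-full and a-1000; the a- prefix normally means the value was reached through autonegotiation.

```python
def parse_status_line(line):
    """Parse one data row of 'show interfaces status' (hypothetical helper).

    Assumes the last five whitespace-separated fields are
    Status, Vlan, Duplex, Speed, Type, and that Type has no spaces.
    Anything between the port and those fields is treated as the Name.
    """
    fields = line.split()
    port = fields[0]
    status, vlan, duplex, speed, port_type = fields[-5:]
    name = " ".join(fields[1:-5])
    return {
        "port": port,
        "name": name,
        "status": status,
        "vlan": vlan,
        "duplex": duplex,
        "speed": speed,
        "type": port_type,
    }


def link_settings_match(a, b):
    """Return True when two parsed ports report the same duplex and speed."""
    return a["duplex"] == b["duplex"] and a["speed"] == b["speed"]


if __name__ == "__main__":
    gi3 = parse_status_line("Gi3/48 connected 157 a-full a-1000 10/100/1000BaseT")
    gi7 = parse_status_line("Gi7/48 connected 157 a-full a-1000 10/100/1000BaseT")
    print(gi3["duplex"], gi3["speed"], gi7["duplex"], gi7["speed"])
    print("settings match:", link_settings_match(gi3, gi7))
```

This only compares what the two switch ports report. It says nothing about how the server NICs at the other end of each link are configured, which still has to be checked on the servers.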
07-30-2007 12:28 PM
Everything looks good at the switch. Since you mentioned it's only happening on one port and the config on both ports is identical, I recommend looking into Layer 1 between the switch and the server. Replace cables as needed and investigate whether this was a one-time event or a chronic problem.
07-30-2007 12:42 PM
Thanks for your help.
Just to clarify, are output overruns caused by the ingress queue sending packets faster than the output queue hardware buffer can handle?
My 1st thought was that the server NIC hardware buffer was full, but then I read that the counter applies only to the switch port. If this is so, how would Layer 1 affect the performance of the egress queue?
Sorry for such a lame question.
Thanks,
Greg
07-31-2007 07:16 AM
These are packets leaving the switch so they are placed in the egress queue.
It sounds like the packets leaving the switch were too big, or there were too many of them to fit in some of the queues, hence the overrun condition, which in turn caused some queue drops.
It seems the source server was sending packets faster than the destination could receive them.
Were both servers running under the same CPU/load conditions when the transfer was taking place?
08-01-2007 05:18 AM
The destination server is a shared resource and most likely was being accessed by end users during the transfer.
Would this explain the overrun?
Thanks,
Greg
08-01-2007 07:10 AM
Yes.