We have a 6509 running IOS 12.2(33)SXJ.
We have two WS-X6516-GE-TX modules, a WS-X6516A-GBIC, and a WS-X6748-GE-TX with a WS-F6700-CFC daughtercard.
Our supervisor is a WS-SUP720-3B.
We are experiencing packet loss for everything connected to the WS-X6748-GE-TX blade. Right now we don't have any production devices on that blade because of this packet loss.
Has anyone encountered the same problem?
This switch was previously running hybrid and is now running native IOS. However, I can't recall whether we had this packet loss before.
Do I need to update the firmware of the card or the daughtercard (if that is possible; I can't say I've done it before)?
I read the 12.2SX release notes.
It seems the ROMMON on the WS-F6700-CFC daughtercard was not up to date. I updated it to 12.2(18r)S1 as the release notes suggested. However, that did not resolve my problem: I'm still experiencing packet loss for devices connected to this blade.
Right now the blade is in slot 9 of our 6509. I could put it in slot 1, 2, or 3. Would that change anything?
The 6748 module has 2 x 20 Gbps connections to the switch fabric and 48 10/100/1000 Mbps ports. So in theory you can oversubscribe this module, but it is unlikely: you would need more than 40 ports, or more specifically more than 20 ports per port group, transmitting at 1 Gbps simultaneously.
Just to clarify the port-group point: the 6748 has 2 port groups:
group1 = ports 1 - 24
group2 = ports 25 - 48
Each port group has access to its own 20 Gbps connection to the switch fabric.
So if more than 20 connected devices in one port group each transmit 1 Gbps simultaneously, you do have oversubscription. But as I say, this is highly unlikely.
Moving the module to a different slot in the 6509 should make no difference, as each slot provides a maximum of 40 Gbps.
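To put numbers on the port-group argument, here is a small sketch of the arithmetic, using only the figures from the post (24 ports per group, 1 Gbps per port, one 20 Gbps fabric channel per group). The function names are illustrative, not from any Cisco tool.

```python
# Figures from the post: 48 ports split into two port groups of 24,
# each group served by one 20 Gbps fabric channel.
PORTS_PER_GROUP = 24
PORT_SPEED_GBPS = 1
FABRIC_CHANNEL_GBPS = 20

def oversubscription_ratio(ports: int, port_speed_gbps: float, channel_gbps: float) -> float:
    """Worst-case offered load of one port group divided by its fabric capacity."""
    return (ports * port_speed_gbps) / channel_gbps

def ports_at_line_rate_before_congestion(port_speed_gbps: float, channel_gbps: float) -> int:
    """How many ports in a group can send at line rate before the channel is full."""
    return int(channel_gbps // port_speed_gbps)

ratio = oversubscription_ratio(PORTS_PER_GROUP, PORT_SPEED_GBPS, FABRIC_CHANNEL_GBPS)
print(f"per-group oversubscription: {ratio:.1f}:1")
print("ports per group at line rate before the channel fills:",
      ports_at_line_rate_before_congestion(PORT_SPEED_GBPS, FABRIC_CHANNEL_GBPS))
```

With these inputs the ratio is 1.2:1, and congestion only starts once more than 20 ports in the same group send at line rate at the same time.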
Is there any possibility you have enabled QoS but not tuned the buffers accordingly? Where is the packet loss, i.e. ingress to the ports or egress from the ports?
1. Post the "sh interface
2. Can you also post "sh interface
QoS could be the problem, I guess. Before, the switch had this command: mls qos
While this command was on the switch, we experienced packet loss and increased ping latency.
Then we removed the command. We still had packet loss, but no longer had the delay.
Is there a document that could help us configure QoS for this blade?
Here's a show interface:
We have the problem on all the ports of the 6748.
As for the uptime, the 6500 was upgraded this weekend, so about 4 days.
GigabitEthernet9/1 is up, line protocol is up (connected)
Hardware is C6k 1000Mb 802.3, address is 0016.c810.75c0 (bia 0016.c810.75c0)
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseT
input flow-control is off, output flow-control is on
Clock mode is auto
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:38, output hang never
Last clearing of "show interface" counters never
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 5
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 51000 bits/sec, 14 packets/sec
185714 packets input, 64272078 bytes, 0 no buffer
Received 2873 broadcasts (0 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
4769209 packets output, 2225784348 bytes, 0 underruns
0 output errors, 0 collisions, 4 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
show interfaces gigabitEthernet 9/1 counters errors
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
Gi9/1 0 0 0 0 0 2
Port Single-Col Multi-Col Late-Col Excess-Col Carri-Sen Runts Giants
Gi9/1 0 0 0 0 0 0 0
Port SQETest-Err Deferred-Tx IntMacTx-Err IntMacRx-Err Symbol-Err
Gi9/1 0 0 0 0 0
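To answer the ingress-versus-egress question from this kind of output, the relevant numbers sit on the "Input queue" line. The sketch below parses that line from the output above; it assumes the line layout shown here and is not a general IOS parser. Here input drops are 0 and total output drops are 5 (with 2 OutDiscards), so the few drops the counters do record are on egress.

```python
import re

# The queue line from the "show interface" output above.
SHOW_INT = """\
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 5
"""

def queue_drops(text: str) -> dict:
    """Return input-queue drops and total output drops from 'show interface' text."""
    m = re.search(r"Input queue: (\d+)/(\d+)/(\d+)/(\d+) .*Total output drops: (\d+)", text)
    if m is None:
        raise ValueError("input/output queue line not found")
    # Group 3 is the 'drops' field of size/max/drops/flushes.
    return {"input_drops": int(m.group(3)), "output_drops": int(m.group(5))}

print(queue_drops(SHOW_INT))
```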
I'm also seeing output drops, but mine are toward a Linux server on a WS-X6748-GE-TX blade. The drops occur when the server reads from a NAS that has a 10 Gb connection. Unfortunately, the application handles the drops poorly and does not support rate limiting. Also, the server and the NAS are on the same subnet, so implementing Layer 3 QoS is not an option.
Is there a good workaround for this scenario? Would flow control help? Or should I look into increasing the buffer size of queue 1?
#show int gig7/27
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 261639
# show queueing int gig7/27
Packets dropped on Transmit:
BPDU packets: 0
queue dropped [cos-map]
1 261639 [0 1 ]
2 0 [2 3 4 ]
3 0 [6 7 ]
4 0 [5 ]
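Reading the transmit-drop table per queue shows where the drops land. The sketch below parses the table pasted above (it assumes exactly this "queue dropped [cos-map]" layout). It confirms that all 261639 drops are in queue 1, which carries CoS 0 and 1, and that this matches the total output drops reported by "show int".

```python
import re

# "Packets dropped on Transmit" table from the "show queueing int gig7/27" output above.
QUEUEING = """\
  queue dropped [cos-map]
  1 261639 [0 1 ]
  2 0 [2 3 4 ]
  3 0 [6 7 ]
  4 0 [5 ]
"""

def tx_drops_by_queue(text: str) -> dict:
    """Map transmit queue number -> (drop count, list of CoS values mapped to it)."""
    result = {}
    # The header row starts with "queue", so it does not match the leading \d+.
    for m in re.finditer(r"^\s*(\d+)\s+(\d+)\s+\[([\d ]*)\]", text, re.MULTILINE):
        queue, dropped, cos = int(m.group(1)), int(m.group(2)), m.group(3).split()
        result[queue] = (dropped, [int(c) for c in cos])
    return result

for queue, (dropped, cos) in tx_drops_by_queue(QUEUEING).items():
    print(f"queue {queue}: {dropped} drops, CoS {cos}")
```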
There are a few possible reasons for this kind of problem:
- Pure oversubscription: several ports, or a higher-speed port, sending traffic out of a single lower-speed port. The link can't send everything and starts to drop.
- Inefficient QoS tuning
- The remote side sending flow-control pause frames because it can't handle traffic that fast
- A hardware problem
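The first cause fits the 10 Gbps NAS to 1 Gbps server case above: during a burst the egress buffer fills at the difference between the two rates. The sketch below estimates how long a buffer lasts. The 1 MB buffer size is an assumption for illustration only, not the module's specification; check the data sheet for the real per-port transmit buffer.

```python
def time_to_fill_ms(buffer_bytes: int, ingress_gbps: float, egress_gbps: float) -> float:
    """Milliseconds until an egress buffer overflows when ingress exceeds egress."""
    excess_bps = (ingress_gbps - egress_gbps) * 1e9
    if excess_bps <= 0:
        return float("inf")  # egress keeps up; the buffer never fills
    return buffer_bytes * 8 / excess_bps * 1000

# ASSUMPTION: 1 MB per-port transmit buffer, for illustration only.
ASSUMED_BUFFER_BYTES = 1_000_000

# 10 Gbps NAS bursting toward a 1 Gbps server port.
print(f"{time_to_fill_ms(ASSUMED_BUFFER_BYTES, 10, 1):.2f} ms")
```

Under that assumption the buffer is exhausted in under a millisecond of sustained 10 Gbps burst, which is why larger buffers alone only delay the drops rather than remove them.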
I would recommend checking them in order, starting with the first. If you suspect drops, first understand what traffic is leaving that port and where it enters the switch. Check whether oversubscription is happening, keeping in mind the module architecture and its internal oversubscription limits. Check output drops on the interface with the "show int" command.
For the second point: if you suspect QoS, first try disabling QoS globally during a maintenance window and see whether that improves the situation. If it does, you can then troubleshoot QoS further.
Third: check "show int" and see whether the pause counter is incrementing. If it is, look for the problem on the remote side.
Fourth: try moving the link to another port on the same line card (LC), to a port on a different ASIC on the same LC, and to a different LC, and note how the drops behave. You can make good decisions based on that.
Please don't hesitate to open a TAC case for this kind of problem so it can be verified in more detail. Each situation can be very different, so one common approach does not work well for all of them.
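For the third and fourth checks, what matters is whether a counter is incrementing, not its absolute value. A minimal sketch of comparing two snapshots of interface counters taken a few minutes apart; the counter names and values below are hypothetical examples, not real output.

```python
def incrementing(before: dict, after: dict) -> dict:
    """Return only the counters that increased between two snapshots, with the delta."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] > before[name]}

# Hypothetical snapshots of a few "show int" counters, taken some minutes apart.
snap1 = {"total_output_drops": 5, "pause_input": 0, "crc": 0}
snap2 = {"total_output_drops": 42, "pause_input": 0, "crc": 0}

print(incrementing(snap1, snap2))
```

Repeating the same comparison after each port move shows whether the drops follow the cable, the port, the ASIC, or the line card.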
We solved our problem.
For us it seems to have been a hardware problem: we contacted TAC and they replaced it without any issue.
We have not experienced the problem since.
Today we moved our links from ports 1-20 to ports 25-44. It works!
We tested the ports with only one link at a time, from a PC to the blade: we moved the cable from port to port and pinged from the PC to the Cisco and vice versa. On ports 1 to 24 we saw packet loss:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.15.150, timeout is 2 seconds:
Success rate is 60 percent (3/5), round-trip min/avg/max = 1/2/4 ms
Ports 25 to 48 work fine without loss.
I saw in tcpdump that all pings from the Cisco reach the PC, but the Cisco didn't see the replies from the PC.
13:42:09.919744 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 0, length 80
13:42:09.919758 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 0, length 80
13:42:09.921342 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 1, length 80
13:42:09.921349 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 1, length 80
13:42:11.920571 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 2, length 80
13:42:11.920582 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 2, length 80
13:42:11.921051 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 3, length 80
13:42:11.921058 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 3, length 80
13:42:11.921456 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 4, length 80
13:42:11.921462 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 4, length 80
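The capture can be checked mechanically by pairing each echo request with its reply by ICMP id and sequence number. The sketch below runs on the ten capture lines above and finds no unanswered request, so the PC did reply to all five pings; the 3/5 success rate on the switch means the replies were lost after leaving the PC, on the way back into the switch.

```python
import re

# The tcpdump capture taken on the PC, copied from above.
CAPTURE = """\
13:42:09.919744 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 0, length 80
13:42:09.919758 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 0, length 80
13:42:09.921342 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 1, length 80
13:42:09.921349 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 1, length 80
13:42:11.920571 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 2, length 80
13:42:11.920582 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 2, length 80
13:42:11.921051 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 3, length 80
13:42:11.921058 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 3, length 80
13:42:11.921456 IP 192.168.15.149 > 192.168.15.150: ICMP echo request, id 90, seq 4, length 80
13:42:11.921462 IP 192.168.15.150 > 192.168.15.149: ICMP echo reply, id 90, seq 4, length 80
"""

PATTERN = re.compile(r"IP (\S+) > (\S+): ICMP echo (request|reply), id (\d+), seq (\d+)")

def unanswered(capture: str) -> list:
    """Return sorted (id, seq) pairs seen as requests with no matching reply."""
    requests, replies = set(), set()
    for _src, _dst, kind, ident, seq in PATTERN.findall(capture):
        key = (int(ident), int(seq))
        (requests if kind == "request" else replies).add(key)
    return sorted(requests - replies)

print("requests without a reply on the PC side:", unanswered(CAPTURE))
```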
This is my test config:
description 6748 test
ip address 192.168.15.149 255.255.255.252
switchport access vlan 7
switchport mode access
Mod Ports Card Type Model
--- ----- -------------------------------------- ------------------
1 16 SFM-capable 16 port 1000mb GBIC WS-X6516-GBIC
2 16 SFM-capable 16 port 1000mb GBIC WS-X6516-GBIC
3 16 SFM-capable 16 port 1000mb GBIC WS-X6516-GBIC
4 16 16 port 1000mb MTRJ ethernet WS-X6416-GE-MT
5 8 CEF720 8 port 10GE with DFC WS-X6708-10GE
6 2 Supervisor Engine 720 (Active) WS-SUP720-3B
7 24 24 port 100FX Multi mode WS-X6324-100FX-MM
8 48 SFM-capable 48 port 10/100/1000mb RJ45 WS-X6548-GE-TX
9 48 CEF720 48 port 10/100/1000mb Ethernet WS-X6748-GE-TX
Why do ports 1 to 24 show packet loss?
The problem might be related to load on the ASICs serving ports 1-24: some of the other links may already be carrying traffic at line rate. Oversubscription on this module is 1.2:1, meaning 12 ports share a 10G ASIC, so if they all send traffic at line rate, you will have drops.
Also, a bad port or NIC can't be ruled out, so check whether moving the link to another port within the first 24 also solves the problem. That would help narrow it down to a hardware problem on a single port, on a group of ports and their Rohini ASIC, or on the Janus ASIC serving the group of 24 ports.
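To plan the port moves, it helps to know which ports share an ASIC. The sketch below uses the group sizes from the reply (12 ports per port ASIC, 24 ports per fabric-side ASIC) and assumes the ports are grouped contiguously (1-12, 13-24, and so on); that ordering is an assumption, so confirm it against Cisco documentation before relying on it.

```python
PORTS = 48
PORTS_PER_PORT_ASIC = 12     # from the reply: 12 ports share one 10G port ASIC
PORTS_PER_FABRIC_ASIC = 24   # from the reply: 24 ports per fabric-side ASIC

def asic_for_port(port: int) -> tuple:
    """Return (port-ASIC index, fabric-ASIC index), 1-based, ASSUMING contiguous grouping."""
    if not 1 <= port <= PORTS:
        raise ValueError(f"port must be 1..{PORTS}, got {port}")
    return ((port - 1) // PORTS_PER_PORT_ASIC + 1,
            (port - 1) // PORTS_PER_FABRIC_ASIC + 1)

# Ports that showed loss in the test (1-24) versus ports that were clean (25-48).
for port in (1, 12, 13, 24, 25, 48):
    print(port, asic_for_port(port))
```

Under that assumption, ports 1-24 span two port ASICs but only one fabric-side ASIC, so loss on all of them (and none on 25-48) points more toward the shared 24-port group than toward a single 12-port ASIC.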