Cisco Support Community

Debugging Input drops

Steps to debug Input Packet drops:

Introduction:

In this article, we will discuss the common reasons for input packet drops and how the drops could be captured in general.

This document is an ongoing effort to explain input drops and how to capture them. If anything is still unclear after reading it, please leave a comment on the article and answers will be provided.

Core Issue:

 

Why do input drops occur?

 

A packet can be dropped at the ingress of an interface for several reasons. There are roughly 500-800 drop counters, depending on the line card type and NP generation. The command show drops verbose location <> lists all the relevant drops. In general, a specific NP input drop can be captured with “show drops verbose location <> | include <counter name>”. The counter name is found by identifying the type of drop at the time it occurred.

Some of the common drop reasons at ingress are:

1) Out of buffers: Several buffers within the NP itself can overflow or underflow. For example, if either the Traffic Manager (TM) ingress or egress queue buffer runs out of buffers, the TM complains, which can lead to NP issues and result in packet drops.

E.g. show drops verbose location 0/1/CPU0 | include BUFF_EXCD
Mon Feb 10 18:32:50.249 EDT
PUNT_DIAGS_RX_BUFF_EXCD 0
PUNT_DIAGS_RX_BUFF_EXCD 0
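
When the filtered output is long, it can help to post-process a saved copy off-box and keep only the counters that have incremented. The following is a minimal sketch, assuming the output consists of "NAME value" lines as in the example above. The function names are hypothetical, and summing repeated counter names is an assumption, not documented behaviour.

```python
import re

# Matches lines shaped like "COUNTER_NAME   123"; timestamps and rulers are skipped.
COUNTER_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]+)\s+(\d+)\s*$")


def parse_drop_counters(text):
    """Return {counter_name: value}, summing repeated names (an assumption)."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            name, value = match.group(1), int(match.group(2))
            counters[name] = counters.get(name, 0) + value
    return counters


def nonzero_drops(counters):
    """Keep only the counters that have actually incremented."""
    return {name: value for name, value in counters.items() if value > 0}


# Sample text copied from the example above.
sample_output = """\
Mon Feb 10 18:32:50.249 EDT
PUNT_DIAGS_RX_BUFF_EXCD 0
PUNT_DIAGS_RX_BUFF_EXCD 0
"""

counters = parse_drop_counters(sample_output)
print(counters)                 # {'PUNT_DIAGS_RX_BUFF_EXCD': 0}
print(nonzero_drops(counters))  # {}
```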

2) Link issue between PHY & NP: This could be due to SERDES errors. The root cause could be either a hardware or a software issue.

e.g. NP-MAC--------------PHY----------------Optics

NP-MAC is the first entry point for data. The following CLI provides MAC level Ingress and egress statistics. 

show controller tengig <> stat

Further ingress drops may happen in NP Ucode. 

The following CLI may provide more details. Since many ports are mapped to one NP, it may not give per-port ucode drop statistics.

show controller np counters all location <>

3) NP back pressure: This is typically an oversubscription case. E.g. if the ingress LC sends 30 Gbps of traffic toward a 10 Gbps egress, the FIA might backpressure the NP on the ingress LC, and hence we see ingress drops.

e.g.  show drops verbose location <> | include <counter_name>
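
The oversubscription arithmetic in this example can be sketched as follows. This is a simplified steady-state estimate that ignores buffering and bursts. The function name is hypothetical and the figures are the ones quoted in the text.

```python
def oversubscription(offered_gbps, egress_gbps):
    """Return (excess_gbps, fraction_of_offered_load_that_cannot_be_delivered)."""
    excess = max(0.0, offered_gbps - egress_gbps)
    fraction = excess / offered_gbps if offered_gbps > 0 else 0.0
    return excess, fraction


# 30 Gbps offered toward a 10 Gbps egress, as in the example in the text.
excess, fraction = oversubscription(30, 10)
print(f"{excess:.0f} Gbps cannot be delivered ({fraction:.0%} of offered load)")
# 20 Gbps cannot be delivered (67% of offered load)
```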

4) NP Lockup: NP lockups can occur in many situations. An NP lockup happens when the microcode has a bug, or when a low-level resource specific to the NP, such as the TM, ICFD, or NP engine, gets blocked. If the HW detects that the NP has stopped forwarding, the condition is termed an NP lockup.

In general, when an NP lockup happens, a syslog message containing NP-DIAG is raised. Soon after this, the local ping fails because the NP is locked up, and the LC automatically reloads, which results in heavy traffic loss.

    • LC/0/3/CPU0: Mar 14: 09:07:22:122 : prm_server_ty: NP-DIAG health monitoring failure on NP3

There is no CLI as such to detect an NP lockup; the user has to monitor for and observe the situation described above.

 

5) Unrecognized upper-level protocol: This could also be termed “Input drops other”. For example, when a router receives a new LSP, it floods this LSP to its neighbors, except the neighbor that sent the new LSP. On point-to-point links, the neighbors acknowledge the new LSP with a PSNP, which holds the LSP ID, sequence number, checksum, and remaining lifetime. When the acknowledgment PSNP is received from a neighbor, the originating router stops sending the new LSP to that particular neighbor, although it may continue to send the new LSP to other neighbors that have not yet acknowledged it.

Symptoms: Layer 3 packets, e.g. 64-byte ISIS PSNPs, are counted as "Input drop other", though they are not really dropped.

Conditions: ISIS router with links configured as point-to-point.

Workaround: This problem is merely cosmetic. The PSNPs are still processed; they are just reported incorrectly due to an accumulation discrepancy in the software.

6) RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT should not be in total drops: The counter RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT should not be included in the calculation for total drops seen under an interface.

 

Ex)

RP/0/RSP0/CPU0:BEL1MN#sh int te 0/6/0/0

Tue Jan 31 10:39:39.047 ARG

TenGigE0/6/0/0 is up, line protocol is up

...

  Last clearing of "show interface" counters never

  30 second input rate 962000 bits/sec, 702 packets/sec

  30 second output rate 10000 bits/sec, 19 packets/sec

     519481709 packets input, 587232457343 bytes, 19412 total input drops

 

 

RP/0/RSP0/CPU0:BEL1MN#sh controller np counters np4 location 0/6/cpu0

Tue Jan 31 10:39:52.210 ARG

 

                Node: 0/6/CPU0:

----------------------------------------------------------------

 

Show global stats counters for NP4, revision v3

 

Read 44 non-zero NP counters:

Offset  Counter                                          FrameValue   Rate (pps)

----------------------------------------------------------------------------------------------------------------

...

 447    RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT          19411        0

 

There are 19411 drops for RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT and 19412 total drops for the interface.

 

This was reproduced in the lab, and there is a one-to-one increase in the NP counter and the total drops counter.

 

This confuses customers, who think they have a traffic loss issue, and it impedes troubleshooting when trying to identify the source of the drops. The steps to identify the issue are as follows:

0)      Clear both the interface counters and the NP counters.

1)      show interface xxx: check whether the generic input drop counter is non-zero. If it is, continue.

2)      show controller np ports all location yyy: this displays which NP the interface in question is on.

3)      show controller np counters npz location yyy, where z is the NP number shown in step 2.

In the output, find RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT, RESOLVE_INGRESS_DROP_CNT and RESOLVE_EGRESS_DROP_CNT.

 

If the generic input drop count matches the delta (RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT - RESOLVE_EGRESS_DROP_CNT), the ingress drops are due to a loop; find the cause of the loop.

If the input drop count is more than the delta, other input drops also exist; look for other DROP counters under the NP.

If the delta is 0, the input drops are NOT due to the loop condition; look for other DROP counters under the NP.
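
The three-way comparison above can be sketched as a small helper. This is only an illustration of the decision logic as written in this article. The function name and the returned labels are hypothetical, and the sample RESOLVE_EGRESS_DROP_CNT value of 0 is an assumption, since that counter does not appear in the output shown earlier.

```python
def classify_input_drops(input_drops, vpls_reflection_drops, resolve_egress_drops):
    """Apply the comparison from the steps above to one set of counter readings.

    All three values should be read after clearing counters (step 0).
    """
    if input_drops == 0:
        return "no input drops"
    delta = vpls_reflection_drops - resolve_egress_drops
    if delta == 0:
        return "not due to loop: check other NP DROP counters"
    if input_drops == delta:
        return "loop: find the cause of the loop"
    if input_drops > delta:
        return "loop plus other drops: check other NP DROP counters"
    # The article does not cover input drops below the delta.
    return "inconclusive: re-clear counters and read again"


# 19412 and 19411 are taken from the outputs above; 0 is an assumed value.
print(classify_input_drops(19412, 19411, 0))
# loop plus other drops: check other NP DROP counters
```

In the example above the interface and NP counters were read a few seconds apart, so a small mismatch between them may simply reflect the timing of the two commands.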

 

7) l2-tcam Invalid DA Drops

8) Controller drops: runts, FCS, aborts, FIFO overflows, giants

9) Unknown DMAC or dot1q vlan

 

CLI to capture packet loss.

Before beginning to debug traffic issues, please clear all counters and start afresh.

Clear Interface counters

RP/0/RSP0/CPU0:ROSH06_jetfire#clear counters all

Clear "show interface" counters on all interfaces [confirm]

RP/0/RSP0/CPU0:ROSH06_jetfire#

Clear NP counters

RP/0/RSP0/CPU0:ROSH06_jetfire#clear controller np counters all

Clear Fabric counters

To clear FIA counters on LC and RSP:

RP/0/RSP0/CPU0:ROSH06_jetfire#clear controller fabric fia location 

To clear all fabric crossbar counters:

RP/0/RSP0/CPU0:ROSH06_jetfire#clear controller fabric crossbar-counters location 

To clear bridge counters on LC

Check all the relevant traffic counters

After clearing counters, start traffic pattern that caused the drop.

Check the counters at the input interface: This is the first place to check. The packet count increments each time the command is executed, and a drop can be identified by comparing the input and output packet rates.

RP/0/RSP0/CPU0:ROSH06#show interfaces tenGigE 0/1/0/0

Thu Jan  1 01:10:01.908 UTC

TenGigE0/1/0/0 is up, line protocol is up

  Interface state transitions: 1

  Hardware is TenGigE, address is 001e.bdfd.1736 (bia 001e.bdfd.1736)

  Layer 2 Transport Mode

  MTU 1514 bytes, BW 10000000 Kbit

     reliability 255/255, txload 0/255, rxload 0/255

  Encapsulation ARPA,

  Full-duplex, 10000Mb/s, LR, link type is force-up

  output flow control is off, input flow control is off

  loopback not set,

  Maintenance is enabled,

  ARP type ARPA, ARP timeout 04:00:00

  Last clearing of "show interface" counters never

  5 minute input rate 0 bits/sec, 0 packets/sec

  5 minute output rate 0 bits/sec, 0 packets/sec

     0 packets input, 0 bytes, 0 total input drops

     0 drops for unrecognized upper-level protocol

     Received 0 broadcast packets, 0 multicast packets

              0 runts, 0 giants, 0 throttles, 0 parity

     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort

     0 packets output, 0 bytes, 0 total output drops

     Output 0 broadcast packets, 0 multicast packets

     0 output errors, 0 underruns, 0 applique, 0 resets

     0 output buffer failures, 0 output buffers swapped out.
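
Comparing two saved snapshots of this output makes the increments explicit. The following is a minimal sketch, assuming the "packets input ... total input drops" line has the format shown above. The function names are hypothetical, and the second snapshot line is invented purely for illustration.

```python
import re

# Matches the input statistics line of "show interfaces", as shown above.
INPUT_LINE = re.compile(r"(\d+) packets input, (\d+) bytes, (\d+) total input drops")


def input_stats(show_interface_text):
    """Return (packets_input, total_input_drops) from one snapshot."""
    match = INPUT_LINE.search(show_interface_text)
    if match is None:
        raise ValueError("no 'packets input' line found")
    packets, _bytes, drops = map(int, match.groups())
    return packets, drops


def input_delta(before_text, after_text):
    """Return (packets_delta, drops_delta) between two snapshots."""
    packets_before, drops_before = input_stats(before_text)
    packets_after, drops_after = input_stats(after_text)
    return packets_after - packets_before, drops_after - drops_before


before = "     0 packets input, 0 bytes, 0 total input drops"         # from the output above
after = "     1000 packets input, 64000 bytes, 12 total input drops"  # hypothetical snapshot

print(input_delta(before, after))  # (1000, 12)
```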

Check NPU counters

show controllers np counters all location 

Fields of interest in NPU counters from data path standpoint:

800 PARSE_ENET_RECEIVE_CNT      -- Num of packets received from external interface

970 MODIFY_FABRIC_TRANSMIT_CNT  -- Num of packets sent to fabric

801 PARSE_FABRIC_RECEIVE_CNT    -- Num of packets received from fabric

971 MODIFY_ENET_TRANSMIT_CNT    -- Num of packets sent to external interface
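
These four counters can be compared pairwise to see where packets go missing inside the NP. The sketch below assumes the "Offset Counter FrameValue Rate" line format shown in the NP counter output earlier in this article. The function names and the sample values are hypothetical. A non-zero gap does not necessarily mean drops, since punted or locally generated packets may also contribute.

```python
import re

# Matches "offset  COUNTER_NAME  frame_value  rate" lines of the NP counter output.
NP_COUNTER_LINE = re.compile(r"^\s*(\d+)\s+([A-Z][A-Z0-9_]+)\s+(\d+)\s+(\d+)\s*$")


def parse_np_counters(text):
    """Return {counter_name: frame_value} for one NP."""
    counters = {}
    for line in text.splitlines():
        match = NP_COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(3))
    return counters


def datapath_gaps(counters):
    """Return (ingress_gap, egress_gap) in packets; missing counters count as 0."""
    ingress_gap = (counters.get("PARSE_ENET_RECEIVE_CNT", 0)
                   - counters.get("MODIFY_FABRIC_TRANSMIT_CNT", 0))
    egress_gap = (counters.get("PARSE_FABRIC_RECEIVE_CNT", 0)
                  - counters.get("MODIFY_ENET_TRANSMIT_CNT", 0))
    return ingress_gap, egress_gap


# Hypothetical values, for illustration only.
sample = """\
 800    PARSE_ENET_RECEIVE_CNT          1000        10
 970    MODIFY_FABRIC_TRANSMIT_CNT       990        10
 801    PARSE_FABRIC_RECEIVE_CNT         500         5
 971    MODIFY_ENET_TRANSMIT_CNT         500         5
"""

print(datapath_gaps(parse_np_counters(sample)))  # (10, 0)
```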

The following CLI, filtered on "Ingress", displays the drop counters of whatever type the user is interested in, e.g. GRE, l2vpn, mpls etc.

The first form shows only counters that have a non-zero value:

show drops np all loc <0/5/CPU0>| inc Ingress

e.g.

Sat Feb 1 14:22:05.158 UTC

Node: 0/5/CPU0:

----------------------------------------------------------------

NP 0 Drops:

----------------------------------------------------------------

MODIFY_PUNT_REASON_MISS_DROP 1

----------------------------------------------------------------

NP 1 Drops:

----------------------------------------------------------------

MODIFY_PUNT_REASON_MISS_DROP 1   
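
Since this output is grouped per NP, a saved copy can be split into one table per NP for comparison. The following is a minimal sketch, assuming the "NP <n> Drops:" section headers and "NAME value" lines shown above. The function name is hypothetical.

```python
import re

NP_HEADER = re.compile(r"^\s*NP\s+(\d+)\s+Drops:\s*$")
COUNTER_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]+)\s+(\d+)\s*$")


def parse_drops_per_np(text):
    """Return {np_number: {counter_name: value}}."""
    per_np = {}
    current_np = None
    for line in text.splitlines():
        header = NP_HEADER.match(line)
        if header:
            current_np = int(header.group(1))
            per_np[current_np] = {}
            continue
        counter = COUNTER_LINE.match(line)
        if counter and current_np is not None:
            per_np[current_np][counter.group(1)] = int(counter.group(2))
    return per_np


# Sample text copied from the example above.
sample = """\
Node: 0/5/CPU0:
----------------------------------------------------------------
NP 0 Drops:
----------------------------------------------------------------
MODIFY_PUNT_REASON_MISS_DROP 1
----------------------------------------------------------------
NP 1 Drops:
----------------------------------------------------------------
MODIFY_PUNT_REASON_MISS_DROP 1
"""

print(parse_drops_per_np(sample))
# {0: {'MODIFY_PUNT_REASON_MISS_DROP': 1}, 1: {'MODIFY_PUNT_REASON_MISS_DROP': 1}}
```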

show drops np all verbose loc <> | inc INGRESS

show drops verbose | include INGRESS

*******************************************************************

GRE_ING_DECAP_P2_GRE_KEY_PRESENT_DROP                        0              

GRE_ING_DECAP_P2_GRE_SEQ_PRESENT_DROP                        0              

GRE_ING_DECAP_P2_GRE_NONZERO_VER_DROP                        0              

GRE_ING_DECAP_P2_GRE_NONZERO_RSVD0_DROP                      0              

GRE_ING_DECAP_P2_PROT_UNSUPPORTED                            0              

GRE_ING_DECAP_P2_NESTED_GRE_DROP                             0              

GRE_ING_DECAP_P2_CLNS_NO_ISIS_DROP                           0              

GRE_ING_DECAP_P2_NO_UIDB_DROP                                0         

**************************************

Commands to monitor packet counts at various places:

Ingress interface

Ex: show interfaces gigabitEthernet 0/6/0/25

show controller gigabitEthernet 0/6/0/25 internal 

Ingress NP

Ex: show controllers np counters np1 location 0/6/CPU0 

Ingress NP fabric counters

Ex: show controllers np fabric-counters tx np1 location 0/6/CPU0 

Ingress bridge

Ex: show controllers fabric fia bridge stats location 0/6/CPU0 

Ingress FIA

Ex:

show controllers fabric fia stats location 0/6/CPU0

show controllers fabric fia drops ingress location 0/6/CPU0

show controllers fabric fia q-depth location 0/6/CPU0 

Ingress crossbar

Ex:

show controllers fabric crossbar statistics instance 0 location 0/6/CPU0

show controllers fabric crossbar statistics instance 1 location 0/6/CPU0
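
When the same checks are repeated for different ports, the list above can be generated from a few parameters. The sketch below only fills the interface, NP and location into the command strings listed above. The function name and its parameters are hypothetical, and the crossbar instance numbers default to the two shown in the examples.

```python
def ingress_path_commands(interface, np, location, crossbar_instances=(0, 1)):
    """Return [(stage, [commands])] in ingress order, using the CLIs listed above."""
    return [
        ("Ingress interface", [
            f"show interfaces {interface}",
            f"show controller {interface} internal",
        ]),
        ("Ingress NP", [
            f"show controllers np counters {np} location {location}",
        ]),
        ("Ingress NP fabric counters", [
            f"show controllers np fabric-counters tx {np} location {location}",
        ]),
        ("Ingress bridge", [
            f"show controllers fabric fia bridge stats location {location}",
        ]),
        ("Ingress FIA", [
            f"show controllers fabric fia stats location {location}",
            f"show controllers fabric fia drops ingress location {location}",
            f"show controllers fabric fia q-depth location {location}",
        ]),
        ("Ingress crossbar", [
            f"show controllers fabric crossbar statistics instance {i} location {location}"
            for i in crossbar_instances
        ]),
    ]


# Values taken from the examples above.
for stage, commands in ingress_path_commands("gigabitEthernet 0/6/0/25", "np1", "0/6/CPU0"):
    print(stage)
    for command in commands:
        print("  " + command)
```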

Version history
Revision #: 1 of 1
Last update: 02-18-2014 11:56 AM