After reading the packet troubleshooting guide for the ASR9000, there may still be some open questions as to precisely which packets were hitting a given drop counter.
Until today we didn't have a good capability to capture packets outside the PARSE stage of the NPU (and even that required heavy engineering commands).
With the Packet Capture capability that will come in XR 4.3.1 (spring of 2013) you will be able to capture packets for a variety of counters.
At a glance
What is this all about: in a few quick words, the key things you need to know to get started with the packet capture capability.
What is it
It allows you to capture packets that are subject to a particular counter (not necessarily a drop counter) as the packet traverses the various stages of processing inside the NPU.
How does it work
Once a packet is determined to match a counter that is set as a trap, the packet is sent to the CPU for formatting and display.
The output is in hexadecimal format, but this is easy to convert into a format Wireshark can read for evaluation and printing (discussed later).
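One way to get that hexadecimal output into Wireshark is Wireshark's own text2pcap tool; alternatively, a few lines of Python can wrap a hex string in a minimal libpcap file directly. This is a sketch under assumptions: the hex string below is a placeholder, not real NP capture output, and it assumes the capture is an Ethernet frame (pcap linktype 1).

```python
import binascii
import struct

def hex_to_pcap(hex_str: str, outfile: str) -> None:
    """Wrap a single packet (given as a hex string) in a minimal
    libpcap file that Wireshark can open directly."""
    pkt = binascii.unhexlify(hex_str.replace(" ", "").replace("\n", ""))
    with open(outfile, "wb") as f:
        # pcap global header: magic, version 2.4, timezone 0, sigfigs 0,
        # snaplen 65535, linktype 1 (Ethernet)
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
        # per-packet record header: ts_sec, ts_usec, caplen, origlen
        f.write(struct.pack("<IIII", 0, 0, len(pkt), len(pkt)))
        f.write(pkt)

# Placeholder bytes standing in for the hex dump printed by the capture.
hex_to_pcap("ffffffffffff00000000000108060001", "capture.pcap")
```

The resulting capture.pcap opens in Wireshark like any ordinary capture file.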
Note that a captured packet will be DROPPED!
That means that if you trigger on a counter such as PARSE_FABRIC_RECEIVE (packets received from the fabric for processing in the egress direction), those packets could potentially have been forwarded. Because of the capture, we are diverting the packet from the normal forwarding path for display, and we are not reinjecting it back.
So be careful when selecting a particular counter to trigger on.
When triggering on a true drop counter, this is obviously not an issue.
How to use
With one simple command you can enable the packet capture:
monitor np counter COUNTER_NAME npN location 0/X/CPU0
We'll discuss the precise COUNTER_NAME and NPU values separately.
Note: In some XR releases the NP reset after the execution of "monitor np counter" is optional. We strongly recommend always selecting the reset option after running the monitor.
While this packet capture is GREAT and something we have all been waiting for for a long time, you need to be aware of the following limitations:
1) After the captures have been made, up to the number of packets specified (N), or when you exit the capture mode, the NPU needs to be reset. This is a simple internal reset operation of the NPU to free the resources used for the capture, but during this fast reset the NPU will not be forwarding. You should expect a forwarding loss of about 50 msec.
This happens regardless of whether any packets have been captured: every time you quit or exit the capture mode, this reset WILL occur, and you'll be warned before starting the capture that it is going to happen.
2) Nothing in life is free, and neither is this. When you capture a large number of packets on a counter that is very active, the CPU will be busier than normal. We recommend NOT using this via the console, but only via Telnet or SSH connections.
3) When using this facility, make sure you exit it properly before closing your Telnet connection. As a good practice, if you have an exec timeout configured, it is recommended to disable it while the capture facility is running. If your exec session dies while a capture is enabled, the system will not drop out of capture mode!
To recover without reloading the linecard:
- Step 1: Issue another "monitor np counter" command, then press Ctrl-C quickly to send a kill signal that causes the monitor to detach from the NP.
- Step 2: Issue a third "monitor np counter" command, then press Ctrl-C right away to trigger a Fast Reset that cleans up.
4) This feature is for Typhoon-based linecards ONLY. There is no plan to support it on Trident linecards.
5) Not all NP counters are supported; for instance, PUNT counters can't be enabled for capture (but we have SPP/NETIO debugs for those anyway). If a counter is not supported, the CLI will return a message telling you so.
6) The CLI option "noreset" should not be used, as it is for internal development only. Using this option will leave the system in an undefined state with potentially leaked buffers, and you may see suboptimal performance.
7) The maximum number of captured packets is 100, but for slow-speed interfaces (e.g. 1G) you should not capture more than 20. (This is because the capture buffers are shared packet buffers, which are in turn shared by multiple interfaces.)
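To put limitation 1 in perspective, a quick back-of-the-envelope calculation shows roughly how much traffic a ~50 msec reset outage represents on a loaded link. The link rate and average packet size below are illustrative assumptions, not figures from the platform documentation:

```python
def packets_lost(link_bps: float, avg_pkt_bytes: int, outage_s: float) -> int:
    """Rough estimate of packets dropped during a forwarding outage,
    assuming the link runs at full rate with a fixed average packet size."""
    pkts_per_sec = link_bps / (avg_pkt_bytes * 8)
    return int(pkts_per_sec * outage_s)

# Illustrative only: a 10 Gbps link, 500-byte average packets, 50 ms reset.
print(packets_lost(10e9, 500, 0.05))  # → 125000
```

In other words, on a busy 10G link the reset is far from invisible, which is why the CLI warns you and asks for confirmation before starting.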
Detailed step-by-step how-to guide
Step 1: Complaint of data loss
Determine the interface that is currently experiencing the loss.
Step 2: Correlate interface to NPU
From the packet troubleshooting guides you may remember that you first need to link the interface to the NPU via the command:
show controller np ports all location 0/X/CPU0
whereby X is the slot that holds the interface in question.
Step 3: Attempt to identify the counter that is associated with the traffic loss
Knowing the NP that is used for forwarding traffic on this interface, we can view the NP counters with the following command:
show controllers np counters npY location 0/X/CPU0
Where X is the same slot ID as before and Y is the NP number that we found via the command in step 2.
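When scanning that counter list for the one matching the observed loss, it often helps to take two snapshots of the output a few seconds apart and look at the deltas: an actively incrementing drop counter is usually the culprit. A minimal sketch of that triage, where the snapshot dictionaries stand in for parsed counter output (the counter names are examples from this article; the exact output format varies by release):

```python
def counter_deltas(before: dict, after: dict) -> dict:
    """Return only the counters that incremented between two snapshots,
    sorted so the fastest-growing counter comes first."""
    deltas = {name: after[name] - before.get(name, 0)
              for name in after
              if after[name] - before.get(name, 0) > 0}
    return dict(sorted(deltas.items(), key=lambda kv: -kv[1]))

# Hypothetical values; real names/values come from the NP counter output.
snap1 = {"PARSE_FABRIC_RECEIVE": 1000, "DROP_IN_UIDB_DOWN": 50}
snap2 = {"PARSE_FABRIC_RECEIVE": 1400, "DROP_IN_UIDB_DOWN": 950}
print(counter_deltas(snap1, snap2))  # DROP_IN_UIDB_DOWN leads with 900
```

Here DROP_IN_UIDB_DOWN grew by 900 while PARSE_FABRIC_RECEIVE grew by 400, pointing at the drop counter as the one to capture on.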
Step 4: Capturing the packets associated with a counter
Now that we know the counter we're interested in from Step 3, we can enable the packet capture facility and capture some of those packets. For example, let's assume the drop counter is DROP_IN_UIDB_DOWN and the associated NPU is np2 on a linecard located in slot 1.
The command to use is:
monitor np counter DROP_IN_UIDB_DOWN np2 loc 0/1/CPU0
Step 5: Confirm the warning
Warning: A mandatory NP reset will be done after monitor to clean up.
This will cause ~50ms traffic outage. Links will stay Up.
Proceed y/n [y] > y
The capture will proceed and the system will respond with a line similar to the one below:
Monitor DROP_IN_UIDB_DOWN on NP2 ... (Ctrl-C to quit)