Unexplainable Packet Loss

Boyan Sotirov
Level 1

So here's the problem. I was notified of packet drops, so I started by checking the interface error counters: nothing. I also checked the switching fabric: nothing. That's why I decided to investigate further.

The setup is the following (topology diagram attached):

1. The server is connected over a 1 Gbps port to a Cisco 4948 switch.

2. The 4948 is connected over a 10G port to a Cisco 7609 (Router 2). On Router 2, all four 10G ports are configured as Layer 2 ports, and from there a connection is made to the other 7609 (Router 1).

3. Router 1 receives the traffic over a 10G port (the direct link on the topology diagram), performs NAT, and routes the packets over an L2 interconnect to another location.

4. In the other location there's an intermediate router that just routes the packets over the public Internet. 

5. The packet is received at my home router where I do a capture.

I decided to add ACLs matching the ICMP traffic to see what is received and what is sent. On Router 1 I put ACLs in both directions on interface Vlan 1 (all TenGig ports are configured as L2 and participate in VLAN 1), and another set of inbound/outbound ACLs on the router's 1 Gbps WAN interface. In parallel I started a packet capture on my home router. The ICMP test was 1000 packets of 256 bytes each, sent at 500 ms intervals. The results were quite stunning: as it turns out, the bulk of the lost traffic is being lost on the LAN! And all of that traffic flows over 10G links, which are loaded at 2 Gbps at most.
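
To make the comparison between observation points explicit, here is a minimal Python sketch that takes the packet counts seen at each point along the path and reports the loss per segment. The function name `loss_per_segment` and all counter values are hypothetical and only illustrate the arithmetic; they are not measurements from this network.

```python
from typing import List, Tuple


def loss_per_segment(points: List[Tuple[str, int]]) -> List[Tuple[str, int, float]]:
    """Given ordered (observation point, packets seen) pairs, return
    (segment name, packets lost, loss percent) for each consecutive pair."""
    segments = []
    for (name_a, seen_a), (name_b, seen_b) in zip(points, points[1:]):
        lost = seen_a - seen_b
        # Loss is expressed relative to what entered the segment.
        pct = 100.0 * lost / seen_a if seen_a else 0.0
        segments.append((f"{name_a} -> {name_b}", lost, pct))
    return segments


# Hypothetical counter values, NOT taken from the thread.
counters = [
    ("server sent", 1000),
    ("Router 1 Vlan1 in", 960),
    ("Router 1 WAN out", 960),
    ("home router capture", 955),
]

for segment, lost, pct in loss_per_segment(counters):
    print(f"{segment}: {lost} lost ({pct:.1f}%)")
```

With numbers like these, the segment with the largest drop is the one to look at first, which is how the LAN side stood out in my test.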

What I cannot understand is the following:

1. Why can't I capture the lost packets? As I wrote above, there are no errors on the interface counters and no fabric errors. All error counters are zero. Also, the systems do not seem to be loaded at all.

2. My assumption is that the traffic is lost on Router 1 before it is processed by the L3 engine. But how can I check that?

3. Another option is that the traffic does not even arrive at Router 1, but then where does it go? And how can I check that? The MAC tables are in order and point to the proper outbound ports.

So in general, what am I missing here? And where can I find those lost packets?

Regards,

Boyan

2 Replies

Boyan Sotirov
Level 1

I noticed recently that during a continuous ping session from a server to the default gateway (the router that is the VRRP master and does NAT), the ICMP response times from the router vary enormously: anywhere from 0.5 ms to 200 ms. Occasionally there are drops too.

So if my suspicion is correct, ICMP packets addressed to the router itself are handled with very low priority, and this might explain the behavior.

That's why I tested with servers on an outside network, so that all of the traffic now passes through the router and is NATed. The response times are much more consistent, but I still occasionally get packet loss. Could this be caused by the NAT? How can I check that?
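
To compare the two ping tests on equal terms, a small Python sketch like the one below can summarize a run: loss percentage plus minimum, median, and maximum response time. The function name `summarize_rtts` and the sample values are hypothetical; a lost packet is represented as `None`.

```python
import statistics
from typing import Dict, List, Optional


def summarize_rtts(samples: List[Optional[float]]) -> Dict[str, float]:
    """Summarize ping results. Each sample is an RTT in ms, or None if lost."""
    replies = [s for s in samples if s is not None]
    lost = len(samples) - len(replies)
    summary = {
        "sent": len(samples),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
    }
    if replies:
        # RTT statistics only make sense when at least one reply came back.
        summary["min_ms"] = min(replies)
        summary["median_ms"] = statistics.median(replies)
        summary["max_ms"] = max(replies)
    return summary


# Hypothetical samples, NOT captured from this network.
print(summarize_rtts([0.5, 0.7, None, 200.0, 1.0]))
```

A large gap between the median and the maximum with little loss points to delayed replies rather than dropped ones, which is consistent with the router deprioritizing ICMP addressed to itself.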

Boyan Sotirov
Level 1

So this has been tough. With the help of a more experienced friend this was finally solved.

It turned out to be a combination of:

- residual configurations

- imprecise configurations. A wrong subnet configuration was causing routing loops in the network.

- old equipment still connected and not disabled. In fact, this old router was still running BGP and OSPF sessions! WOW!
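
For anyone chasing a similar problem: one kind of subnet mistake that is easy to check offline is an unintended overlap between configured prefixes. The sketch below uses Python's standard `ipaddress` module for that. The function name `find_overlaps` and the prefixes are made up for illustration, and overlap is only one possible form of a wrong subnet configuration, not necessarily the one we had.

```python
import ipaddress
from itertools import combinations
from typing import List, Tuple


def find_overlaps(prefixes: List[str]) -> List[Tuple[str, str]]:
    """Return every pair of prefixes that overlap.

    strict=True makes ip_network raise ValueError when host bits are set
    (e.g. 10.0.0.1/24), which also catches a common typo.
    """
    nets = [ipaddress.ip_network(p, strict=True) for p in prefixes]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


# Hypothetical prefixes, NOT the real configuration.
configured = ["10.0.0.0/24", "10.0.0.128/25", "192.168.1.0/24"]
for a, b in find_overlaps(configured):
    print(f"overlap: {a} and {b}")
```

Some overlaps are intentional (a summary route and its more specific routes, for example), so each reported pair still needs to be judged by hand.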
