50% packet loss to connected routes

kfulton · ‎01-28-2010

Greetings,

I'm running a pair of 6509s with a single SUP2/PFC2/MSFC2 each, IOS (tm) c6sup2_rp Software (c6sup2_rp-JK2SV-M), Version 12.1(27b)E4, RELEASE SOFTWARE (fc3).

I'm running BGP to two external ISPs and between the two routers, and OSPF for the loopback addresses between them. I'm encountering a sporadic problem where I sometimes see exactly 50% packet loss (every other packet) to random connected routes. There seems to be no rhyme or reason to when this starts, and I'm not finding anything in the logs either. I've checked the ARP table, CEF, routes, etc., with no success.

When the problem is occurring, I can reach all addresses on the affected interface from the router itself, but if I try to reach them from another interface on the router, I get exactly 50% loss to ALL addresses on that interface, including the router's own IP address on it. Similarly configured ports on the same module show no problems at all. (The 50% loss occurs with all TCP, UDP and ICMP traffic. I've also removed ip icmp rate-limit to rule that out.)
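To make "every other packet" concrete, here is a minimal Python sketch of how a ping result string could be checked for strictly alternating loss rather than random loss. The function name `loss_pattern` and the sample string are illustrative assumptions, not output captured from these routers; the only convention borrowed from IOS is that `!` marks a reply and `.` a timeout.

```python
def loss_pattern(replies):
    """Summarize ping results, where True is a reply and False is a timeout."""
    total = len(replies)
    lost = replies.count(False)
    # Strictly alternating: no two neighbouring probes share an outcome.
    alternating = total > 1 and all(a != b for a, b in zip(replies, replies[1:]))
    return {
        "loss_pct": 100.0 * lost / total if total else 0.0,
        "alternating": alternating,
    }


# Hypothetical sample in IOS ping notation: '!' = reply, '.' = timeout.
sample = "!.!.!.!.!."
print(loss_pattern([c == "!" for c in sample]))
```

A result of 50% loss with `alternating` set to True points at something deterministic in the forwarding path, whereas 50% loss without the alternating pattern would look more like congestion or errors.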

The oddest thing is that if I add a default route, the problem goes away, even if I remove the default route immediately afterward! Have I encountered an IOS bug? Any suggestions of things to look at would be greatly appreciated.

Thanks

sachinraja · ‎01-28-2010

Hi Kevin

Can you please tell us more about where you are running OSPF and BGP? Where are you trying to ping when you get the 50% drops? Can you give us a schematic, or the show ip route output for the destinations where the drops occur? It could be a routing issue or something else that is blocking packets in one direction.

Raj

kfulton · ‎01-28-2010

Hi Raj,

Thanks for the reply.

The two 6509s are connected via an EtherChannel trunk. One of the VLANs is configured for routing between them. OSPF is configured on this VLAN for the BGP loopback addresses only. They are iBGP peers (using the loopback addresses) for both internal and external routes.

While the problem is occurring, the 50% packet loss happens even from other directly connected interfaces. For example, I can ping from x.x.61.4 to any other connected or external route without any problem, but get 50% loss to x.x.60.73 (the interface address), x.x.60.74, etc.

-----

Config snippets:

interface FastEthernet4/1
 description trunk to firewall
 speed 100
 duplex full
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 666
 switchport trunk allowed vlan 400-499
 switchport mode trunk
 no cdp enable

interface FastEthernet4/8
 ip address .60.73 255.255.255.248
 ip access-group fa-4-8-in in
 no ip unreachables
 speed 100
 duplex full
 no cdp enable

ip access-list extended fa-4-8-in
 permit ip .60.72 0.0.0.7 any

interface Vlan402
 ip address .61.1 255.255.255.0

-----

When the problem is occurring, the route looks the same as it usually does:

#sho ip route .60.73
Routing entry for .60.72/29
Known via "connected", distance 0, metric 0 (connected, via interface)
Redistributing via bgp
Advertised by bgp
Routing Descriptor Blocks:
* directly connected, via FastEthernet4/8
Route metric is 0, traffic share count is 1

-----

Thanks

sachinraja · ‎01-28-2010

Hi Kevin

What is connected on Fa4/8? Are you saying you get packet drops from 6509-1 to the directly connected address x.x.60.74? I see the local configuration for Fa4/8 as:

interface FastEthernet4/8
 ip address .60.73 255.255.255.248
 ip access-group fa-4-8-in in

ip access-list extended fa-4-8-in
 permit ip .60.72 0.0.0.7 any

Why is the ACL configured?

Can you explain more about not seeing the 50% drops once you added the default route? Did you add it on the 6500 pointing to the firewall? Are you getting drops to directly connected interfaces? Do you have any diagrams?

Raj

kfulton · ‎01-28-2010

Hi Raj,

Fa4/8 is connected to a customer's router device. I used 4/8 as an example, but this problem has happened to different interfaces, randomly. The ACL is there as a sanity check to make sure the customer is not sending traffic from a phony source address.
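As a rough illustration of what that ACL line matches: a wildcard mask is the bitwise inverse of a subnet mask, so 0.0.0.7 covers the same eight addresses as 255.255.255.248. The Python sketch below uses 192.0.2.72 from the documentation range as a stand-in, since the real prefix is masked here, and `wildcard_to_network` is a hypothetical helper, not anything from IOS.

```python
import ipaddress


def wildcard_to_network(base, wildcard):
    """Convert an ACL base address plus a contiguous wildcard mask to a network."""
    # A wildcard mask is the bitwise inverse of the subnet mask.
    wc = int(ipaddress.IPv4Address(wildcard))
    netmask = ipaddress.IPv4Address(wc ^ 0xFFFFFFFF)
    # Non-contiguous wildcard masks are rejected here with a ValueError.
    return ipaddress.IPv4Network(f"{base}/{netmask}")


# 192.0.2.72 is a placeholder for the masked x.x.60.72 prefix.
net = wildcard_to_network("192.0.2.72", "0.0.0.7")
print(net)                                        # the /29 the ACL permits
print(ipaddress.IPv4Address("192.0.2.74") in net)  # source inside the /29
print(ipaddress.IPv4Address("192.0.2.80") in net)  # source outside the /29
```

Because the access list has only that one permit entry, any source outside the /29 falls through to the implicit deny at the end of the list.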

The static default route that I add and remove to "fix" the problem is pointing to my upstream ISP's router. This was discovered completely by accident. When we're experiencing this issue, if I enter these commands the problem disappears:

-----

conf t

ip route 0.0.0.0 0.0.0.0

no ip route 0.0.0.0 0.0.0.0

exit

-----

Yes, these drops occur even from one directly connected interface to another directly connected interface.

I don't have a diagram handy but can create a crude one if it would help.

Thanks

sachinraja · ‎01-28-2010

Thanks Kevin

I really don't see any correlation between a default route pointing out of your network and packet drops inside your LAN on a directly connected segment. If you can give us a very rough diagram, it would help us look at your issue.

If it is a directly connected segment, do you have dual connectivity to the endpoint? Even if CPU, memory, or interface errors were the cause, the drops would be random, not exactly 50% with every alternate packet dropping. We need to look at load-balanced segments (EtherChannel, dual NICs, etc.) and see if we can correlate the problem with any of these.
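To show why load balancing is the first suspect, here is a toy Python model, not a diagnosis of this network: if packets alternate between two paths and one of them silently drops traffic, the result is exactly the every-other-packet pattern described. The path names and the `simulate` function are made up for illustration.

```python
from itertools import cycle


def simulate(paths, probes):
    """Send `probes` packets round-robin over `paths`; True means delivered."""
    chooser = cycle(paths)
    return [next(chooser)["works"] for _ in range(probes)]


# Two hypothetical equal-cost paths, one of which black-holes traffic.
paths = [
    {"name": "path-a", "works": True},
    {"name": "path-b", "works": False},
]
results = simulate(paths, 10)
print("".join("!" if ok else "." for ok in results))  # prints !.!.!.!.!.
```

Random causes such as CPU load or interface errors would not produce this strict alternation, which is why it is worth checking for anything that could give the router two forwarding choices for the same destination.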

Raj

kfulton · ‎01-28-2010

Hi Raj,

I whipped up a quick and dirty diagram.

I have no explanation for why adding and removing the default route works. IP route table corruption perhaps?

Anyway, this problem will randomly appear on any of our directly connected interfaces on either one of our routers. In the example I gave you, if fa4/8 is exhibiting the behavior, then we see exactly 50% loss from ALL destinations, and for the sake of ruling out BGP, we test from our directly connected firewall. If we ping (or send any other kind of traffic) from x.x.61.4 to x.x.60.74 (the customer's device) or even x.x.60.73 (the router's interface address), we see every other packet being lost. However, if we ping directly from router 1, there is no packet loss at all to either x.x.61.4 or x.x.60.74.

So far, every one of the affected destinations has been a directly connected interface with no EtherChannel or load balancing involved at all.

Thanks

sachinraja · ‎01-28-2010

Hi Kevin

Can you give us the configs of the ports connecting to the firewall and the customer router? Are the core switches running HSRP between themselves? Check the ARP entries for x.x.60.74 and x.x.61.4 and confirm that the MAC address table is forwarding them to the right port. Also check that there are no duplicate ARP entries for these IP addresses; if there are, you can clear the ARP table entries and then try pinging again. Do you also see drops if you ping the firewall IP (x.x.61.4 or .3) from the customer router?

Also, are there any QoS configurations on the ports? Sometimes QoS gives ICMP packets the lowest priority, and you can see drops like this. Do let us know.

Raj