Re: High EIGRP Pending routes

sonic31ss · 01-10-2009

Hello All,

Most of the documentation I can find on "Pending routes" from show ip eigrp interfaces comes directly from Cisco and is not that helpful.

Their definition is "Number of routes in the packets sitting in the transmit queue waiting to be sent."

The reason for this question is that we have an EIGRP meltdown on our core routers (6500s - Sup720) about every three months. We created scripts to collect more data from the router at 5-minute intervals, and the early indication that EIGRP was in trouble was that 52 out of approximately 300 EIGRP interfaces had as many as 8000 pending routes. The remainder of the interfaces had zero pending routes. The IP routing table is approximately 4000 routes.
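A check like the one described above could be sketched in Python as follows. This is only a minimal sketch: the sample text, interface names and numbers are made up for illustration, the column layout is assumed to follow the usual show ip eigrp interfaces output (which can differ between IOS releases), and the function names are hypothetical.

```python
import re

# Illustrative sample only: made-up interfaces and counters, laid out in the
# assumed column order of "show ip eigrp interfaces".
SAMPLE = """\
IP-EIGRP interfaces for process 100
                    Xmit Queue   Mean   Pacing Time   Multicast    Pending
Interface    Peers  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Tu101          1        0/12       45       0/10          50        8000
Tu102          1        0/0        30       0/10          50           0
Gi1/1          1        0/3        2        0/1           50        4100
"""

# One data row: interface, peers, xmit queue (un/reliable), mean SRTT,
# pacing time (un/reliable), multicast flow timer, pending routes.
ROW = re.compile(
    r"^(?P<intf>\S+)\s+(?P<peers>\d+)\s+\d+/\d+\s+\d+\s+\d+/\d+\s+\d+\s+"
    r"(?P<pending>\d+)\s*$"
)

def parse_eigrp_interfaces(text):
    """Return {interface: pending_routes} for every data row in the output."""
    pending = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header lines do not match and are skipped
            pending[match.group("intf")] = int(match.group("pending"))
    return pending

def flag_interfaces(pending, table_size):
    """Return interfaces whose pending count exceeds the routing table size."""
    return sorted(intf for intf, count in pending.items() if count > table_size)

if __name__ == "__main__":
    counts = parse_eigrp_interfaces(SAMPLE)
    print(flag_interfaces(counts, table_size=4000))
```

Using the routing table size as the threshold is an arbitrary choice for this sketch; a lower threshold would give an earlier warning at the cost of more noise.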

The CPU utilization was 34% at 5 sec with 11% consumed by EIGRP PDM.

Five minutes later the CPU was at 100% and Pending routes were as high as 100,000 on some of the EIGRP interfaces. All interfaces with EIGRP neighbors had tens of thousands of pending routes.

This core router (Core_01) has another core router attached (Core_02) and it did not record a high number of Pending routes.

There was no indication in the syslog of any significant topology change leading up to or during this event.

So my questions to the group are:

1. Does anyone have a better definition of Pending routes?

2. How often are the counters from show ip eigrp interface updated by the IOS?

3. Does 8000 Pending routes seem too high considering that there are only 4000 routes in the IP routing table?

Thanks in advance

Giuseppe Larosa · 01-10-2009

Hello Jeffrey,

The fact that 8000 pending routes are waiting to be sent out an interface while the total number of routes in the EIGRP domain is 4000 may mean that multiple copies of the same update packets are in the queue.

Or rather, it can mean you have two EIGRP neighbors out that interface, each waiting for 4000 routes.

During the initial phase of the problem, do you see a Q count greater than 0 in sh ip eigrp neighbors for the neighbors out the interfaces with non-zero pending routes?

The two should be related.

DUAL relies on all updates being delivered to all neighbors reliably, in order, and in a timely manner.

EIGRP can send an update the first time as multicast on a LAN and retransmit it as unicast, so the router has to track, for each valid neighbor, which updates have been acknowledged by that neighbor.

The Ack carries a sequence number, and the transmit window is 1.

The time to wait before switching from multicast to unicast is given by the multicast flow timer.

The interval between subsequent unicast retransmissions is given by the RTO (retransmission timeout).
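To illustrate why a transmit window of 1 can let a per-neighbor queue grow when acknowledgments are slow, here is a toy stop-and-wait model in Python. It is not EIGRP's actual implementation: the tick-based timing, the arrival rate and the ack delay are all hypothetical parameters chosen for the illustration.

```python
def simulate_stop_and_wait(arrivals_per_tick, ack_delay_ticks, ticks):
    """Toy model of a window-1 reliable queue toward a single neighbor.

    Each tick, `arrivals_per_tick` packets are queued. Only one packet may be
    outstanding; it is acknowledged `ack_delay_ticks` ticks after it is sent.
    Returns the number of packets still waiting (excluding the one in flight)
    at the end of each tick.
    """
    waiting = 0
    ack_due = None  # tick at which the in-flight packet is acknowledged
    depths = []
    for tick in range(ticks):
        waiting += arrivals_per_tick
        if ack_due is not None and tick >= ack_due:
            ack_due = None  # ack received, the window opens again
        if ack_due is None and waiting > 0:
            waiting -= 1    # send the next packet
            ack_due = tick + ack_delay_ticks
        depths.append(waiting)
    return depths

if __name__ == "__main__":
    # Acks keep up with arrivals: the queue stays empty.
    print(simulate_stop_and_wait(1, 1, 5))
    # Acks take three ticks: the backlog grows steadily.
    print(simulate_stop_and_wait(1, 3, 6))
```

In this model the backlog grows whenever packets arrive faster than one per ack delay, regardless of how lightly loaded the link itself is.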

So I would try to verify, by comparing

sh ip eigrp neighbors

sh ip eigrp interfaces

whether there is a correlation between the counters.
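One way to make that comparison on collected data is sketched below in Python. It assumes the two outputs have already been parsed into simple structures; the function names, interface names and addresses are hypothetical, and the addresses come from a documentation range.

```python
from collections import defaultdict

def correlate(pending_by_intf, neighbors):
    """Pair each interface's pending routes with the summed Q count of its neighbors.

    pending_by_intf: {interface: pending_routes}
    neighbors: iterable of (neighbor_address, interface, q_count)
    Returns {interface: (pending_routes, total_q_count)}, sorted by interface.
    """
    q_by_intf = defaultdict(int)
    for _addr, intf, q_count in neighbors:
        q_by_intf[intf] += q_count
    interfaces = set(pending_by_intf) | set(q_by_intf)
    return {
        intf: (pending_by_intf.get(intf, 0), q_by_intf.get(intf, 0))
        for intf in sorted(interfaces)
    }

def mismatches(report):
    """Return interfaces where exactly one of the two counters is non-zero."""
    return [intf for intf, (p, q) in report.items() if (p > 0) != (q > 0)]

if __name__ == "__main__":
    # Made-up values for illustration only.
    pending = {"Tu101": 8000, "Tu102": 0, "Tu103": 500}
    neighbors = [
        ("192.0.2.1", "Tu101", 100),
        ("192.0.2.2", "Tu102", 0),
        ("192.0.2.3", "Tu103", 0),
    ]
    report = correlate(pending, neighbors)
    print(report)
    print(mismatches(report))
```

Interfaces returned by mismatches() are the ones where the two counters disagree and that may deserve a closer look.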

If I understood correctly, all this happens once every 3 months.

You say you have 300 EIGRP interfaces.

How many EIGRP neighbors are pointing to the core router?

Do all EIGRP neighbor routers really need to receive all 4000 routes, or are some of them already configured as stub routers?

Have you also implemented some form of route filtering (distribute-lists and so on) that can increase the work for the router?

Do you see any SIA issues in the log?

Once every 3 months is quite a long time for a routing protocol.

Given the complexity of the problem I would open a Service Request to TAC.

I would also consider a network design review to see if it is possible to reduce the load, in terms of number of neighbors, on these core routers, maybe by introducing another level in the hierarchy.

I would also use, where possible, the EIGRP stub feature on remote site routers and manual summarization, in order to reduce the number of neighbors that need to receive all the routes.

And maybe the most important question: is it always the same core router that experiences the EIGRP crisis, or do they alternate in this role?

If both routers serve the same set of EIGRP neighbors and it is always the same one that has problems, I would point to that device.

Compare IOS releases, memory, and I/O memory; if everything matches, I would consider replacing the supervisor.

If they alternate in having the problem, the cause may be external.

Hope to help

Giuseppe

sonic31ss · 01-10-2009

Hi Giuseppe,

Thanks for the very detailed response. Lots of good points and questions.

The interfaces that had pending routes had only 1 neighbor each; most of the 300 interfaces have only one neighbor. We also collect sh ip eigrp neighbor in the same script. Yes, the Q counts on the neighbors are non-zero, but at most there are 100+ in the queue.

There are roughly 350 neighbors on each of the core routers. Most of the interfaces were configured with ip summary-address eigrp statements that significantly reduce the number of routes the EIGRP neighbors receive. I have to admit that we were not consistent in applying the summary-address statements, and a few neighbors that did not need the full IP routing table were receiving it. The other neighbors were receiving a few hundred routes on average.

There are also distribute-lists applied to the interfaces. I should point out that the majority of interfaces are GRE Tunnels.

Yes, after the CPU is at 100% we see a significant number of SIAs. But EIGRP does not behave well when the CPU is that high.

No, it is not always the same core router. There are two core routers at the location where this last happened, and the problem is not always on the same one. There are pairs of core routers at other locations and the problem has happened on other core routers, just not as frequently as at this location.

Thank you,

Jeff

Giuseppe Larosa · 01-11-2009

Hello Jeff,

>>

No, it is not always the same core router. There are two core routers at the location where this last happened, and the problem is not always on the same one. There are pairs of core routers at other locations and the problem has happened on other core routers, just not as frequently as at this location.

I would open a Service Request and perform a network design review.

Hope to help

Giuseppe

mszeftawy · 01-10-2009

Hi Sonic

Most probably the pending routes are the cause of the high CPU usage, and not the other way around.

I mean that a large queue of EIGRP updates can increase CPU utilization.

And the main cause of this issue is congested links.

I think you should increase the EIGRP bandwidth percentage on the interfaces that have pending routes, using ip bandwidth-percent eigrp AS, or try to reduce the EIGRP routing table as much as possible to reduce the bandwidth used for EIGRP updates.

sonic31ss · 01-11-2009

Hello mszeftawy,

The interfaces in question were not under load prior to or during this event.

Best regards,

Jeff