
High CPU Utilization on EIGRP process on a 7206VXR

gsanin
Level 1

Hi,

I have a 7206VXR (NPE-G1) with 1 GB of memory running the 12.3(15a) IOS code. I have about 215 frame relay subinterfaces with EIGRP enabled. I have noticed that when a single PVC goes down and then comes back up, the IP-EIGRP: PDM process goes to about 50% CPU. We had a problem when 20 PVCs went down: when they came back up, the CPU went to 100%, making the router virtually unusable. I've searched for bugs but couldn't find anything.

CPU utilization for five seconds: 52%/2%; one minute: 14%; five minutes: 6%

PID   Runtime(ms)   Invoked     uSecs   5Sec     1Min     5Min    TTY   Process
165   15661024      372055930   42      49.87%   10.77%   3.03%   0     IP-EIGRP: PDM
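(For anyone who wants to watch the same counters, the output above is from show processes cpu; filtering on the process name makes it easier to follow during a flap:)

show processes cpu | include EIGRP
! or, if your IOS supports the keyword, sorted by 5-second utilization:
show processes cpu sorted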

Any help would be greatly appreciated.

Thanks,

Gus.

4 Replies

m.mcconnell
Level 1

The problem is probably related to the sheer number of adjacencies EIGRP needs to maintain. When a PVC goes down, EIGRP has to drop the adjacency, and when the PVC comes back up it has to rebuild it - all while maintaining the other 214 adjacencies plus everything else EIGRP normally does. That is known to cost a lot of CPU cycles. If you get a flapping PVC or two, the router can end up effectively dead until the flapping stops.

You might want to call TAC to see if there are any bugs related to the problem you are experiencing, as not all bugs are published in the public Bug Toolkit.

If there are no known bugs, you may want to consider converting the WAN to RIP v2 and then redistributing that into EIGRP.

I know, I know - you're probably thinking, "What? RIP???" I have done a couple of large-scale WAN designs, and when there is a large number of WAN circuits RIP v2 becomes the answer. EIGRP, OSPF and BGP cannot maintain that many adjacencies, while RIP v2 does not form adjacencies at all - the routing updates are simply sent via multicast, which results in much lower CPU and memory utilization on the routers at the central site. Also, RIP v2 is classless.
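Roughly, the split looks like this - the AS number and the networks here are just placeholders for your own addressing, and the seed metrics are examples:

! RIP v2 toward the frame relay subinterfaces (no adjacencies to maintain)
router rip
 version 2
 network 10.0.0.0
 redistribute eigrp 100 metric 2
 no auto-summary
!
! EIGRP stays on the LAN/core side and learns the WAN routes from RIP
router eigrp 100
 network 192.168.1.0 0.0.0.255
 redistribute rip metric 1544 20000 255 1 1500
 no auto-summary

The EIGRP seed metric there is bandwidth, delay, reliability, load and MTU; the RIP side just needs a hop count.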

-Mark

ruwhite
Level 7

Are these remotes stub or non-stub? How many routes are you sending to each one?

Most likely, if they aren't stubs and you're sending a number of routes to each one, this is normal behaviour, based on my experience with various networks and what we've seen in lab testing. You can get some idea of what's going on by looking through the EIGRP event log on the hub router to see how much EIGRP processing it does when you lose a single PVC, or a lot of them, on this interface.
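If you want to check, the stub configuration on the remotes and the event log on the hub look something like this (the AS number is a placeholder):

! on each remote - tells the hub it never needs to query this router
router eigrp 100
 eigrp stub connected summary
!
! on the hub - shows what EIGRP was doing while the PVCs bounced
show ip eigrp events
show ip eigrp neighbors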

:-)

Russ.W

gsanin
Level 1

Thank you all so much for your input.

There are a few things I wanted to add to this posting that I hope can shed some light on the problem:

1. We suspect it is an IOS bug because we've had the same type of design for over 2 years now, but the IOS was upgraded about 3 months ago. We opened a ticket with Cisco, but they believe it is the design.

2. Currently, the hub routers send just 5 routes to the remote sites, and the remote sites send only 2 routes back to the hubs. So I don't believe this should cause any problems; there isn't a lot of routing exchange going on.

3. We did some testing by making all the interfaces at the hub EIGRP passive and then making them non-passive again (a sketch of the toggle is after this list). All 215 EIGRP adjacencies came right up within 3 seconds, and I checked the CPU utilization: it was at about 15%.

4. We did notice on a different test that when a single PVC went down and came back up, the EIGRP process spiked to about 53%. This is what led us to believe it is an IOS bug: the PVC bouncing and EIGRP re-establishing the adjacency seem to trigger it.

5. We are planning to go back to the old IOS and do some more testing to confirm that it is in fact an IOS bug.
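For the test in point 3, the toggle is roughly the standard passive-interface knob, something like this (the AS number is a placeholder for ours):

router eigrp 100
 passive-interface default
! ...then bring all 215 adjacencies back up at once:
router eigrp 100
 no passive-interface default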

I will keep you posted.

Thanks again.

Gus.

lrian
Level 1

It's been a long time since I've used frame relay, so things might have changed... but we had a similar problem last century.

The problem was that there is [was??] only one broadcast queue on the main interface, shared by all subinterfaces. Add in the fact that we hadn't configured the bandwidth on the subinterfaces, so they all defaulted to 1.5 Mb - which meant EIGRP defaulted to using up to 768 Kb for routing updates, since EIGRP will use up to 50% of the interface bandwidth by default. We had some PVCs flap and EIGRP never recovered.

The fix was adding the

frame-relay broadcast-queue

command on the main interface and configuring the bandwidth on the subinterfaces to the CIR.
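Put together, it looked something like this - the queue numbers and the 64K CIR are just examples from memory, size them to your own circuits:

interface Serial1/0
 ! broadcast-queue <size in packets> <byte-rate per second> <packet-rate per second>
 frame-relay broadcast-queue 128 256000 36
!
interface Serial1/0.101 point-to-point
 ! bandwidth = CIR in Kbps, so EIGRP's default pacing (50% of bandwidth) matches reality
 bandwidth 64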

Regards,

Lee
