Re: CPU pegged for 30 minutes - What could cause?

ebrandertsa · ‎11-22-2010

We had an incident where our four core 6513s pegged their CPUs at the same time causing havoc with routing. It lasted about half an hour. I was not able to get the results of a "sh proc cpu" at the time of the incident.

Supposedly nothing was changed, either to start or resolve the issue. I don't believe that for a second, but right now I have no way to confirm it one way or the other. The logs were not helpful. They show nothing before the incident, and nothing useful during the incident except repeated EIGRP neighbor changes. We suspect those occurred because the EIGRP messages were not flowing properly due to the pegged CPU.

SNMP was hit or miss at the time but it looked like a lot of traffic was happening on all the VLAN 1 assigned ports.

The 6500's are configured in a chain, with A connected to B, connected to C, connected to D. There is no mesh or loop (different story). However we do have some access switches that are connected to both A and B. MST spanning-tree is in use.

My question to you folks is what are some of the possible causes for such a thing?

Jon Marshall · ‎11-22-2010

Eric

Were the access switches running high CPU as well?

Obvious first thought is some kind of L2 loop in your network which is overloading the switches.
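
For reference, one way to look for evidence of a loop or STP instability after the fact is to check the topology change counters on each switch. This is only a sketch, and it assumes the IOS release in use supports output filtering with "| include":

```
show spanning-tree detail | include occurr|from|is exec
```

For each spanning-tree instance this should list the number of topology changes, how long ago the last one occurred, and the port it was received from. A recent change on a port facing the dual-homed access switches would be a place to start looking.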

Apart from that, one particular instance I have come across was when our server guys were building a VM and, as soon as the port was brought up, the switch, a 6500, literally died on its feet. We never had the time to actually get to the bottom of it as it was in our DC, but I suspect this could have been related to STP as well.

Jon

ebrandertsa · ‎11-22-2010

Thanks for the reply.

I'm afraid I don't know the answer on the access switch CPUs... I waited too long and now I don't have any CPU history from that time.
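
For next time, a minimal sketch of what could be captured while the CPU is still high, assuming the IOS release on the switches supports these forms of the command:

```
show processes cpu sorted
show processes cpu history
```

The first line of the "sorted" output reports the five-second utilization as two percentages, total and interrupt-level, which helps separate a busy process from traffic being punted to the CPU. The "history" output graphs the last 60 seconds, 60 minutes, and 72 hours, so it has to be collected within three days of the event.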

I've gone down the avenue of a layer-2 loop and just can't find how this would happen on its own. Your VMware suspect could also be suspect here; however, the systems guys insist there were no changes.

I did discover that I'm having a MAC address problem, maybe related to our firewall, that's causing every packet on VLAN 1 to be sent to every port on VLAN 1. Though I doubt that's the cause of this problem, I bet it had a part in making the problem worse (that issue is discussed separately at https://supportforums.cisco.com/thread/2053940).

Jon Marshall · ‎11-22-2010

Eric

Your VMware suspect could also be suspect here however the systems guys insist there were no changes.

Your systems guys may be more honest than ours, but ours swore blind they weren't making changes until the network engineer actually turned up at the DC to find them building the machine.

Jon