Strange CPU spikes

Dec 5th, 2008

We run IOS (cat4500-ENTSERVICESK9-M), Version 12.2(40)SG on Cisco 4510Rs.

Occasionally I see CPU spikes of up to 80% for a few minutes, and sometimes loss of connectivity to the layer 3 addresses on the switches.

In the past it has caused brief losses of layer 3 connectivity, but usually it's just a monitoring abnormality and doesn't cause any noticeable problems.

Does anyone have any ideas what this could be? Traffic seems normal at this time.

Is there any additional logging or debugging I can enable to try to isolate the problem?

Or are there any obvious known bugs with the (cat4500-ENTSERVICESK9-M), Version 12.2(40)SG IOS?

Help would be much appreciated.

sachinraja Fri, 12/05/2008 - 09:01

Hello Mike

Most of this troubleshooting has to be done live. In any case, have a look at the logs on the switch (show log) and check for any traffic bursts on the network. It could be a problem with STP, broadcasts, etc. Put the switch in monitoring mode and enable informational logging if you haven't done so already. Use an external logging (syslog) server, since the switch's internal memory will not hold much. Use the "logging" commands to achieve this.

The best approach is to go around the network (logically) and check whether there are any open loops. Try to harden your devices! Make sure you enable simple things like bpduguard, loopguard, etc. You are the best judge here.

For bugs in 12.2(40), refer to the release notes on CCO.

All the best..

Raj

glen.grant Fri, 12/05/2008 - 18:11

If you can clear the counters while it is happening, you might get a better idea of who is doing what and why. It could be almost anything; I have seen things like people sending unauthorized multicasts basically bury a Sup 720 before.

scottmac Sat, 12/06/2008 - 06:23

You don't really specify any typical period for the occurrences, but I believe it's probably "garbage collection" ...

That's the process of evaluating the various processes' memory usage, stack usage, (non-interface) buffer usage & such, then "collecting" memory that is allocated but not used (from the "malloc" - Memory Allocation process in the language / compiler).

It's very CPU intensive for short times and is usually done as a process triggered by inactivity, or polled based on time (then checked for utilization to make sure it doesn't drive the system off a cliff).

Failure to garbage collect properly is one of the causes of a "memory leak" where memory is allocated, but never recovered at the termination of the process, so next time there's less, then it happens again (and again, and again) until there's no more usable memory left.

That's my guess, and I'm sticking to it ...

Good Luck

Scott

mikedelafield Wed, 12/10/2008 - 02:53

hi

Thanks for your replies, though I'm now convinced it's some rogue multicast device.

I've started monitoring all ports and VLAN traffic, and at the exact time of the spikes I'm seeing massive multicast traffic within one of the VLANs.

The source address was 169.254.6.5, which is strange as it's a failed-DHCP address. Also present were 168.254.255.255 and the multicast address 239.255.255.250.

The total traffic output was 1.2gb to various ports over the space of 10 seconds or so.

can anyone shed any light on this?

thanks.

Ramprasad Pr Wed, 12/10/2008 - 04:36

If the DHCP server does not respond properly, the DHCP client picks a random IP in the 169.254.x.x range, mask 255.255.0.0.

239.255.255.250 is the SSDP (Simple Service Discovery Protocol) multicast address.

Check the DHCP configuration.
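
As a quick sanity check on the three addresses seen in the capture, here is a minimal Python sketch using the standard ipaddress module. The classify helper is a made-up name for illustration and is not part of any Cisco tooling; it only tells you which well-known range each address falls in.

```python
import ipaddress

def classify(addr: str) -> str:
    """Return a rough label for an IPv4 address (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    if ip.is_multicast:
        # 224.0.0.0/4; 239.255.255.250 is the address SSDP uses
        return "multicast"
    if ip.is_link_local:
        # 169.254.0.0/16, self-assigned when DHCP gets no answer
        return "link-local"
    return "other"

# Addresses reported earlier in this thread
for addr in ("169.254.6.5", "168.254.255.255", "239.255.255.250"):
    print(addr, "->", classify(addr))
```

Note that 168.254.255.255, exactly as posted, falls in neither range, so it is worth double-checking that octet in the capture.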

HTH

Ram

mikedelafield Wed, 12/10/2008 - 05:20

Thanks. I know what the 169.254 range represents; I am more concerned about how an address in this range could be generating 1.2Gb of data in such a short interval.
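
To put the figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the total was measured over about 10 seconds, and since it is not clear whether "1.2Gb" means gigabytes or gigabits, it works out both. The port count of 8 is purely a hypothetical value to illustrate that multicast flooded to several ports gets counted once per egress port, so the source itself may be sending much less than the total.

```python
def rate_mbps(amount_giga: float, seconds: float, bits_per_unit: float) -> float:
    """Average rate in Mbit/s for amount_giga units sent over the given seconds."""
    return amount_giga * bits_per_unit / seconds / 1e6

GIGABYTE_IN_BITS = 8e9   # if the counter was in gigabytes
GIGABIT_IN_BITS = 1e9    # if the counter was in gigabits

if_bytes = rate_mbps(1.2, 10, GIGABYTE_IN_BITS)
if_bits = rate_mbps(1.2, 10, GIGABIT_IN_BITS)

# Hypothetical: the same multicast stream flooded out of 8 ports in the VLAN
egress_ports = 8
print(f"aggregate if gigabytes: {if_bytes:.0f} Mbit/s")
print(f"aggregate if gigabits:  {if_bits:.0f} Mbit/s")
print(f"per-source estimate (gigabytes, {egress_ports} ports): "
      f"{if_bytes / egress_ports:.0f} Mbit/s")
```

Under those assumptions, a single host on a 100 Mbit/s or gigabit port could plausibly account for the total once replication across ports is taken into account.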
