I have a WS-C4506, with WS-X4013+ , and "cat4000-i9k91s-mz.122-25.EWA5.bin".
Some time ago I noticed that the processor was periodically reaching 90-100% of its capacity.
See image attached.
I wrote a script to run several commands over time and found that SNMP ENGINE is the process that spikes during these peaks.
I had implemented the SNMP commands described in a Cisco doc, excluding some ARP and routing information from SNMP, without positive results.
Can anyone help me understand the reason for that?
I will post more information later.
What is being polled on the switch during these spikes? Adding views arbitrarily will not help. If the problem is with certain objects, you will need to figure out which ones they are, then decide whether you can restrict them.
To find out which objects are being polled, you can place a sniffer (or sniffers) on inbound ports, or use "debug snmp packets" (which will add additional load to the switch).
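For example, something like the following (command names from memory; verify them on your IOS version, and keep the debug window short since it adds its own CPU load):

```
! Send debug output to the current vty session
terminal monitor
! Log each SNMP packet, including the OIDs in GET/GET-NEXT requests
debug snmp packets
! ... wait for a spike, note which OIDs appear repeatedly ...
! Then turn all debugging back off
undebug all
```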
Tks for your reply,
I've just turned off SNMP for debugging.
If it stays without peaks, I will re-enter the communities and progressively add the rest back to see if any of it is causing this.
I'll update this thread as soon as I get any results.
Please feel free to offer other ideas.
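When re-adding things one at a time, it could look something like this (community string and receiver address are placeholders, not from this network):

```
! Step 1: read-only community for one poller under test
snmp-server community MONITOR-RO ro
! Step 2: re-enable one trap/inform receiver at a time
snmp-server host 192.0.2.10 version 2c MONITOR-RO
```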
As I said, the problem is most likely caused by some manager polling certain SNMP objects. Once you re-enable the community strings, the spikes are likely to come back. You will need to figure out which objects are being polled.
I placed Ethereal monitoring all ports on the Catalyst...
At first I couldn't filter the capture for only SNMP packets, so the application blew up... too much information... lol, 25 MB in almost 10 seconds...
Finally, after some hours learning how to work with it, I could see that CiscoWorks is sending some GET-NEXT requests.
Here are some that I captured while a spike was happening.
It repeats the GET-NEXT but with a suffix of .333 and so on up to .342:
.333 + .334 + .335 + .336 + .337 + .338 + .339 + .340 + .341 + .342
As far as I could understand from the customer, CiscoWorks has a scheduled time defined to collect information from network devices, not every 4 hours.
I put the following "excluded" entries in the SNMP view, and I'll wait for a good surprise, I hope.
snmp-server view cutdown sysUpTime excluded
snmp-server view cutdown ccmHistoryEventEntry excluded
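Note that, in my experience with IOS views, a view also needs an "included" baseline for the exclusions to be meaningful, and it only takes effect once it is attached to a community string (community name below is a placeholder):

```
! Baseline: include the whole tree, then carve out exclusions
snmp-server view cutdown iso included
snmp-server view cutdown ccmHistoryEventEntry excluded
! The view does nothing until bound to a community
snmp-server community MYCOMMUNITY view cutdown ro
```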
I doubt that will help. We typically don't see CPU hogs caused by the CISCO-CONFIG-MAN-MIB. How quickly are you seeing these polls? This may be indicative of a loop problem trying to get the config from this device.
I attached a screen capture of "show proc cpu history" in the 1st message, and we can see the period is about 4 hours.
Unless I reconfigure the communities, it maintains this pattern.
If I re-configure the communities, it takes a short time (10-20 mins) for the 1st spike to appear, and then it maintains the 4-hour period.
It happened again.
It isn't those excluded objects. :(
What if you apply an ACL to your community string that only permits the CiscoWorks server to poll the switch? Do the spikes continue to occur? I can't imagine that CiscoWorks alone is sending 12 MB of SNMP data per second.
Why 12 MB / sec ?
access-list 11 remark SNMP ACL
access-list 11 permit 172.20.1.112
access-list 11 deny any log
I was thinking of applying this.
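To make access-list 11 take effect, it has to be attached to the community string, e.g. (community name is a placeholder):

```
! Only hosts permitted by access-list 11 may use this community
snmp-server community MYCOMMUNITY ro 11
```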
Actually, it was 25 MB, because I was monitoring 2 ports, as I have 2 SNMP machines.
CiscoWorks in one interface, directly connected.
SolarWinds is on other equipment, reached through a trunk port connecting to a WS-C6006.
Not all of the captured trace is SNMP traffic.
1st - Applied the ACL filtering out the IP of SolarWinds. I'll wait for a result.
2nd - If it is not SolarWinds, I will apply it to CiscoWorks.
Tks for the idea.
I implemented the ideas and got the following results:
ACL filtering out SolarWinds: the spikes continue.
ACL filtering out CiscoWorks: flat CPU usage.
As I can't just turn off CiscoWorks, I now have to understand what type of information CiscoWorks is collecting, since the "show logging" output gives me the idea that CW sends some requests not only every 4 hours, but also every 5 minutes.
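To quantify what CiscoWorks is pulling, a few show commands can be sampled before and after a spike (exact availability depends on the IOS version, so treat this as a sketch):

```
! Cumulative SNMP counters - compare GET-NEXT counts across a spike
show snmp
! Correlate with the CPU spike timeline
show processes cpu history
! During a spike, SNMP ENGINE should appear near the top
show processes cpu sorted
```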
See the attached images.
Argh, this seems like a never-ending story!!!