High CPU utilisation on switch when User Tracking acquisition is run

Jun 30th, 2010

We are running LMS 3.1 with Campus Manager 5.1.3 on a Solaris platform.

When a User Tracking minor or major acquisition is run against our Catalyst 6509e switches with Sup720 supervisor cards, CPU utilisation spikes into the high 90s (percent).

Is this normal, or should remedial action be taken?

Sven Hruza Thu, 07/01/2010 - 02:17

Hi,

I saw the same thing during a full snmpwalk of a 6509 switch, at the point where the ARP and MAC tables were requested.

But I don't know of any way to limit the CPU used by SNMP requests.

The only workaround some customers use is to exclude the OIDs for those tables so that the management station cannot read them.

But for user tracking this is not a solution...I know....

Sven

clark.ford Thu, 07/01/2010 - 03:33

Hi Sven

Thanks for your reply.

We have ten 6509e's running different flavours of 12.2(18) and 12.2(33).

All of them, to a greater or lesser extent, report spikes during the User Tracking acquisition, but the switches running 12.2(18)SXF11 seem to be the worst affected and report CPU utilisation of 96 - 97%.

What was the peak utilisation you achieved?

It would be handy if someone from Cisco could reassure me that this is a normal reaction to this process running and that no remedial work is required on the switches or the CiscoWorks platform.

Thanks again

Sven Hruza Sun, 07/04/2010 - 23:50

Same with my devices... we run 12.2(18)SXF11, too.

The utilization we measured was above 95%.

Some information from Cisco would be great, yes :-)

Sven

Joe Clarke Sun, 07/04/2010 - 23:53

We were seeing some similar behavior at CiscoLive! this year in our network.  However, I was unable to capture the "show stack" output of the SNMP ENGINE process when the spike was occurring.  If you can, get the output of "show stack PID" (where PID is the process ID of SNMP ENGINE) when this problem is occurring.  Then post that output as well as the output of "show ver".
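Since the spike is short, it may help to have the PID ready before it happens. Below is a minimal Python sketch, not an official tool, that picks the SNMP ENGINE PID out of saved "show processes cpu" text so the matching "show stack PID" command can be typed quickly. The sample output, the PID 265, and the function name find_pid are all made up for illustration; the real values on your switch will differ.

```python
# Illustrative "show processes cpu" output; the PIDs and figures are made up.
SAMPLE_OUTPUT = """\
CPU utilization for five seconds: 97%/2%; one minute: 35%; five minutes: 20%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1           0         3          0  0.00%  0.00%  0.00%   0 Chunk Manager
 265     1234567   7654321        161 85.12% 20.10% 10.05%   0 SNMP ENGINE
"""


def find_pid(output, process_name):
    """Return the PID of the named process, or None if it is not listed."""
    for line in output.splitlines():
        # Process rows have 8 whitespace-separated columns followed by the
        # process name, which may itself contain spaces ("SNMP ENGINE").
        fields = line.split(None, 8)
        if len(fields) == 9 and fields[0].isdigit():
            if fields[8].strip() == process_name:
                return int(fields[0])
    return None


if __name__ == "__main__":
    pid = find_pid(SAMPLE_OUTPUT, "SNMP ENGINE")
    if pid is not None:
        # Print the command to run on the switch during the spike.
        print(f"show stack {pid}")
```

The header and summary lines are skipped because their first field is not a number, so only real process rows are compared against the name.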

clark.ford Mon, 07/12/2010 - 02:54

The cpu spike lasts only a few seconds so it will be difficult to run the command at that precise time, but I will give it a try.

The switches affected are all vital production devices, so could the switch be adversely affected by the action of running this command during the spike, or is it quite harmless?

Please find attached the show ver requested.

Cheers

clark.ford Mon, 07/12/2010 - 04:28

Joe, here is the show stack output as requested. I took a snapshot of the cpu at the same time.

I ran the command before the spike, during the spike and after the spike and received the same output each time.

Was that to be expected?

Cheers

Joe Clarke Mon, 07/12/2010 - 13:34

This is an idle stack trace.  The process isn't doing anything.  You may need to run show stack multiple times during the spike.  One of the stack outputs should have substantially more frames.

Joe Clarke Tue, 07/13/2010 - 20:35

Well, someone is definitely polling the BRIDGE-MIB.  That's the main thing UT does, so that explains the stack.  How many CAM entries does this switch have?  What other problems are you seeing relating to this?  The reason I ask is that SNMP is a low-priority process, and will run until completion unless a higher-priority process needs the CPU.  If you have a large CAM table, you can see spikes when the BRIDGE-MIB is queried, but that may not be problematic.
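To give a rough feel for why CAM table size matters here, the Python sketch below estimates how many SNMP requests a walk of the BRIDGE-MIB forwarding table (dot1dTpFdbTable, which has three columns: address, port and status) might take. It assumes one object instance per GETNEXT request, or several per GETBULK request; whether UT polls all three columns, and which request type it uses, are assumptions rather than anything confirmed in this thread. The entry counts are hypothetical.

```python
import math


def estimate_walk_requests(cam_entries, columns=3, max_repetitions=1):
    """Rough number of SNMP requests needed to walk a table.

    columns=3 matches dot1dTpFdbTable (address, port, status).
    max_repetitions=1 models a plain GETNEXT walk; larger values model
    GETBULK returning several instances per request (an assumption).
    The few extra requests that detect the end of the table are ignored.
    """
    if cam_entries < 0 or columns < 1 or max_repetitions < 1:
        raise ValueError("counts must be non-negative and divisors positive")
    instances = cam_entries * columns
    return math.ceil(instances / max_repetitions)


if __name__ == "__main__":
    # Hypothetical CAM table sizes, for comparison only.
    for entries in (500, 8000, 32000):
        getnext = estimate_walk_requests(entries)
        getbulk = estimate_walk_requests(entries, max_repetitions=10)
        print(f"{entries} entries: ~{getnext} GETNEXT or ~{getbulk} GETBULK requests")
```

The point of the estimate is only that the work grows linearly with the number of CAM entries, which is why a large table can produce a visible spike even when nothing is wrong.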

clark.ford Wed, 07/14/2010 - 01:02

Joe, here is the output of the show mac-address count.

This investigation began because another 6509e running the same s/w version maxed out at 100% CPU utilisation, which, as you can imagine, caused a major loss of service.

When we investigated what had caused this, we noticed a number of spikes that we have identified as the UT acquisition process running against the switches.

I have opened a TAC case (614705211) to investigate why the switch ran at 100% utilisation, and the only lead I have so far is the UT process.

What I would like is an assurance that the MIBs being polled by CiscoWorks could not be implicated in the switch maxing out, but I guess that's not going to happen....or is it?

Cheers
