Ask the Expert: Troubleshooting Tools to Analyze High CPU Utilization Issues on Cisco Catalyst 6500 Series Switches

Jan 17th, 2012
With Souvik Ghosh

Welcome to the Cisco Support Community Ask the Expert conversation. This is an opportunity to learn and ask questions about advanced troubleshooting tools for debugging high CPU utilization issues on Cisco Catalyst 6500 Series Switches with Cisco expert Souvik Ghosh. You can ask questions about troubleshooting a Cisco Catalyst 6500 running in native mode with a Cisco Catalyst 6500 Series Supervisor Engine 720 or Cisco Catalyst 6500 Supervisor Engine 32, a Cisco Catalyst 6500 running in native mode with a Cisco Catalyst 6500 Series Supervisor Engine 2, and a Cisco Catalyst 6500 running in hybrid mode.

Souvik Ghosh is a customer support engineer at the Cisco Technical Assistance Center in Bangalore, India. He has three and a half years of experience in LAN switching technologies, and LAN switching products such as the Cisco Catalyst 6500, 4500, 3750, and 2960 Series Switches are his areas of expertise. He has been involved in various escalation requests from India, Singapore, and Australia and currently works as a technical lead for the LAN switching team in Bangalore, India. He holds CCNP and CCIP certifications.

Ivan Petrov Thu, 01/19/2012 - 13:47
Hi Souvik,

I am seeing high CPU utilization on my 7600 router due to the IP Input process. Can you point me to a document that can help me troubleshoot this? I am running IOS 12.4.

Also, can you tell me under which circumstances the IP Input process would show a high percentage of CPU usage?



Souvik Ghosh Thu, 01/19/2012 - 17:55
Hi Ivan,

High CPU utilization due to IP Input is most likely caused by normal data packets hitting the CPU. You can collect the output of "debug netdr cap rx" followed by "sh netdr cap" (both commands are safe to run in a production network) to find out which packets are hitting the CPU, and then look for a trend in those packets. If you need further assistance, please attach the "show netdr cap" output along with the "show tech" output from the switch.
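Looking for a trend usually means counting which source IP, destination IP, or ingress interface shows up most often among the punted packets. The Python sketch below assumes you have already transcribed one record per packet from "show netdr cap" (by hand or with your own parser); the field names (`src_ip`, `dst_ip`, `src_if`) and the sample values are hypothetical and are not netdr output fields.

```python
from collections import Counter

# Hypothetical records: one dict per punted packet, with values transcribed
# from "show netdr cap". Field names are illustrative labels only.
captured = [
    {"src_ip": "10.1.1.5", "dst_ip": "10.2.2.9", "src_if": "Gi3/1"},
    {"src_ip": "10.1.1.5", "dst_ip": "10.2.2.7", "src_if": "Gi3/1"},
    {"src_ip": "10.1.1.8", "dst_ip": "10.2.2.9", "src_if": "Gi3/2"},
]


def top_values(records, field, n=3):
    """Return the n most frequent values of `field` with their counts."""
    return Counter(r[field] for r in records).most_common(n)


if __name__ == "__main__":
    # Print the most common value per field; a dominant source IP or
    # interface is a good starting point for further investigation.
    for field in ("src_ip", "dst_ip", "src_if"):
        print(field, top_values(captured, field))
```

With this sample, 10.1.1.5 and Gi3/1 each appear twice, so they would be the first candidates to check.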

BTW, are you sure you are running 12.4 on the 7600?



Ivan Petrov Fri, 01/20/2012 - 11:34
Thank you for your advice; I will try this. I actually meant to say 12.2SRB.



burleyman Fri, 01/20/2012 - 12:32
I have 4500s and 6500s in my environment and would like to ask a few things.

What would you consider a normal range for the CPU utilization to run in?

At what level of utilization should you start taking action to prevent further degradation?

Now, let's say I have a switch that suddenly starts running at 80% to 100% for an extended period. What are some of the first commands I can run to help find the problem? What debug commands can I run that will not harm the flow of data on production switches? And what commands should I NOT run during production hours?

Thanks, Mike

Souvik Ghosh Fri, 01/20/2012 - 18:37
Hi Mike,

The normal average CPU utilization of the 4500 switches is around 30-40%, and that of the 6500 is between 0-15%. However, the CPU utilization depends on the nature of the applications your network supports. Some application traffic cannot be forwarded in the supervisor hardware and needs software forwarding. If the average CPU utilization on a 6500 switch is above 15%, find out which traffic or process is consuming the CPU cycles; if that traffic is required in the network, then that level becomes your benchmark CPU utilization.

Now, if you see that the average CPU utilization is well above your benchmark, that is when you need to troubleshoot the cause. The tools available for troubleshooting a CPU utilization problem depend on the supervisor engine (SUP) and the OS you are running. These tools are discussed in the PDF available at the following link.
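The benchmark comparison described above can be sketched in Python. This assumes you have transcribed CPU percentages (for example, readings from "sh proc cpu history") into a list ordered oldest first; the `margin` that counts as "well above" the benchmark is an assumed value, not a Cisco-defined threshold, and the sample numbers are hypothetical.

```python
def check_against_benchmark(samples, benchmark, margin=10):
    """Compare CPU samples (percent) with a benchmark.

    samples:   CPU percentages, oldest first (hypothetical transcription
               of "sh proc cpu history" readings)
    benchmark: the normal CPU level you measured for this switch
    margin:    assumed number of points that counts as "well above"

    Returns (average, index of first sample above benchmark + margin or
    None, whether the average itself is above benchmark + margin).
    """
    if not samples:
        raise ValueError("need at least one sample")
    average = sum(samples) / len(samples)
    threshold = benchmark + margin
    first_high = next((i for i, s in enumerate(samples) if s > threshold), None)
    return average, first_high, average > threshold


if __name__ == "__main__":
    avg, first_high, investigate = check_against_benchmark(
        [12, 14, 13, 41, 45, 47], benchmark=15)
    print(f"average={avg:.1f}% first high sample={first_high} "
          f"investigate={investigate}")
```

Here the average is about 28.7%, sample index 3 is the first one above the 25% threshold, and the result suggests investigating. Knowing roughly when the jump started makes it easier to correlate with recent network changes.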



Souvik Ghosh Sat, 01/21/2012 - 18:01
Hi Slava,

I am not sure if this is the right forum for your question. Please post it in the "security" forum.



burleyman Mon, 01/23/2012 - 05:48
Here is the info for my SUP.

Supervisor Engine 720 10GE (Active)    VS-S720-10G

MSFC3 Daughterboard         VS-F6K-MSFC3

MSFC3 Daughterboard         VS-F6K-MSFC3

I looked through the document and it was good, but could you explain what you would do first when you see the CPU go above the baseline?

What commands would you run, and what would I look for to help find the problem?

What debug commands would be helpful and what do I look for in the output?

What debug commands can be run during production and which should you not run till after hours?

If I span the CPU what should I look for to find the problem?


Souvik Ghosh Mon, 01/23/2012 - 22:36
Hi Mike,

Since you have a SUP720 in your 6500 chassis, you have more options for troubleshooting a high CPU utilization issue than with the older SUPs. Here are the steps you can try to start troubleshooting the problem.

-> Issue the command "sh proc cpu history" to find out the average CPU utilization and how long it has been high. Try correlating this with any recent changes in the network.

-> If the average CPU utilization is above the benchmark, issue the command "show proc cpu sort | e 0.00" and look at the line that reports CPU utilization. You will see something like this:

Switch#show proc cpu sort | e 0.00

CPU utilization for five seconds: 17%/10%; one minute: 18%; five minutes: 18%

Here 17% is the total CPU utilization and 10% is the utilization due to interrupt switching, so in the above output the CPU utilization due to process switching is 17 - 10 = 7%. Here is the difference between process switching and interrupt switching:

Process switching - CPU used by IOS processes such as "eigrp process", "ospf process", etc.

Interrupt switching - CPU used to forward normal data packets.
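The arithmetic above (total minus interrupt equals process switching) can be applied mechanically to the summary line. The Python sketch below parses the line format shown in this thread; the function name `split_cpu` is hypothetical, and it only handles the five-second/one-minute/five-minute summary line, not the per-process table.

```python
import re

# Matches the summary line printed by "show proc cpu sort"; \s* tolerates
# the extra spaces seen in some outputs in this thread.
CPU_LINE = re.compile(
    r"CPU utilization for five seconds:\s*(\d+)%/(\d+)%;"
    r"\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def split_cpu(line):
    """Split the five-second figure into interrupt and process switching."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt, one_min, five_min = (int(g) for g in m.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        # Process-switching share = total minus interrupt switching.
        "process_5s": total - interrupt,
        "one_min": one_min,
        "five_min": five_min,
    }


if __name__ == "__main__":
    sample = ("CPU utilization for five seconds: 17%/10%; "
              "one minute: 18%; five minutes: 18%")
    print(split_cpu(sample))
```

For the sample line, `process_5s` is 7, matching the calculation in the reply. A high `interrupt_5s` points toward capturing punted packets; a high `process_5s` points toward the specific IOS process.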

If the CPU utilization is due to an IOS process, then the troubleshooting is specific to that process. For example, if the EIGRP process is consuming the CPU cycles, you need to check the routing protocol and see whether there is a routing loop, an EIGRP neighbor in SIA (stuck-in-active) state, etc. If the CPU utilization is due to interrupt switching, then we need to capture the packets that are hitting the CPU and look for a trend in those packets, such as source IP, destination IP, source interface, or source MAC. The following steps will help you capture the packets hitting the CPU.

-> debug netdr cap rx << this is safe to run in a production network on 12.2(18)SXF and later code.

-> show netdr cap << to see the packets punted to the CPU.

-> You can also take an inband CPU SPAN to find out which packets are punted to the CPU. This is also safe to run in the network.

-> Issue the command "show interface | i line|drops" and find out whether any interface has "input queue" drops. Interfaces with input queue drops are the ones sending packets toward the CPU. The input queue is the software queue where a packet waits before it can be processed by the CPU. If more packets are waiting in the queue than it can hold, the excess packets are tail-dropped.
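Scanning the filtered "show interface" output for input queue drops can be automated. The Python sketch below assumes the usual IOS layout, where each interface starts with "<name> is ..., line protocol is ..." and the queue line reads "Input queue: size/max/drops/flushes"; the sample output and the function name `input_queue_drops` are hypothetical.

```python
import re

# "<name> is <status>, line protocol is <status>" starts each interface.
IF_LINE = re.compile(r"^(\S+) is .*line protocol is")
# "Input queue: size/max/drops/flushes" as printed by "show interface".
INQ_LINE = re.compile(r"Input queue:\s*(\d+)/(\d+)/(\d+)/(\d+)")

# Hypothetical output of "show interface | i line|drops".
SAMPLE = """\
GigabitEthernet3/1 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
GigabitEthernet3/2 is up, line protocol is up (connected)
  Input queue: 75/75/1843/0 (size/max/drops/flushes); Total output drops: 0
Vlan10 is up, line protocol is up
  Input queue: 0/75/12/0 (size/max/drops/flushes); Total output drops: 0
"""


def input_queue_drops(output):
    """Return {interface: input queue drops} for interfaces with drops > 0."""
    drops = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        m = IF_LINE.match(line)
        if m:
            current = m.group(1)  # remember which interface we are in
            continue
        m = INQ_LINE.search(line)
        if m and current is not None:
            _size, _max, dropped, _flushes = (int(g) for g in m.groups())
            if dropped > 0:
                drops[current] = dropped
    return drops


if __name__ == "__main__":
    # List interfaces with the most input queue drops first.
    for name, count in sorted(input_queue_drops(SAMPLE).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {count} input queue drops")
```

Because interface counters accumulate until they are cleared, comparing two snapshots taken a few minutes apart shows which interfaces are still dropping, rather than which ones dropped at some point in the past.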

You can find more detail regarding your question in the webcast video recording, which will be uploaded shortly to our support forum.



Francisco Macias Wed, 01/25/2012 - 09:53
Hi Souvik,

I wonder what can cause the Per-minute Jobs counters to go up?

I have the following from my 6500:

switch#sh proc cpu sorted

CPU utilization for five seconds:  50%/12%; one minute: 28%; five minutes: 30%

PID Runtime(ms)   Invoked      uSecs    5Sec   1Min   5Min TTY Process

  38   374422416   1732046     219083 29.47%   3.53%  2.31%   0 Per-minute Jobs 

123  29738384121937605086       9234   4.39%  5.37%  5.48%   0 IP Input        

- Paco

DCASW01#sh proc cpu sorted

CPU utilization for five seconds: 53%/15%;  one minute: 30%; five minutes: 27%

PID Runtime(ms)   Invoked      uSecs   5Sec   1Min    5Min TTY Process

  38   374549292   1732206     216240 29.12%   3.54%  2.38%   0 Per-minute Jobs 

Souvik Ghosh Wed, 01/25/2012 - 22:53
Hi Francisco,

"Per-minute Jobs" is a background process that runs on Cisco switches and routers and performs the following tasks once a minute:

analyzes stack usage

announces low stacks

executes registered one_minute jobs

Do you see constantly high CPU utilization on the "Per-minute Jobs" process? Could you please provide the output of the "show proc cpu hist" command? What version of the code are you running? What is the memory utilization? How many routes do you have on this switch?



parulpatel6 Thu, 01/26/2012 - 13:46
I need to know the recommended uptimes for all router and switch models. What is the manufacturer-recommended reboot period? Please provide an answer or point me to the right resource for these details.

Thank you


