High CPU utilization in Catalyst chassis-based switch

Unanswered Question
Aug 16th, 2010

Hello -

I'm showing > 85% CPU utilization on a Catalyst 4506 switch running IOS software version 12.2(50)SG1. Here is a piece of the output from the 'sh proc cpu' command where the highest percentage seems to be - the Cat4k Mgmt process:

PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
  45           0         2          0  0.00%  0.00%  0.00%   0 IP Host Track HA
  46           0         2          0  0.00%  0.00%  0.00%   0 ARP HA          
  47           0         1          0  0.00%  0.00%  0.00%   0 IP Admission HA 
  48    167272544 3234148452         51 13.11% 14.33% 13.89%   0 Cat4k Mgmt HiPri
  49   3915408916 2238287380       1749 61.83% 59.51% 61.14%   0 Cat4k Mgmt LoPri
  50     1259004 149618759          8  0.00%  0.00%  0.00%   0 Galios Reschedul
  51           0         1          0  0.00%  0.00%  0.00%   0 IOS ACL Helper  
  52        4284    202167         21  0.00%  0.00%  0.00%   0 BACK CHECK      
  53           8         2       4000  0.00%  0.00%  0.00%   0 rf task         
  54           0         1          0  0.00%  0.00%  0.00%   0 RF High Priority
  55           0       160          0  0.00%  0.00%  0.00%   0 Net Input       
  56     9073052  12073706        751  0.00%  0.05%  0.07%   0 Compute load avg
  57           0         1          0  0.00%  0.00%  0.00%   0 CHKPT rcv MSG pr
  58           0         2          0  0.00%  0.00%  0.00%   0 cpf_process_msg_
  59           0         1          0  0.00%  0.00%  0.00%   0 cpf_process_ipcQ
  60           0         1          0  0.00%  0.00%  0.00%   0 AggMgr Process  
  61         176     11255         15  0.00%  0.00%  0.00%   0 Transport Port A
  62      731224  13454850         54  0.00%  0.00%  0.00%   0 HC Counter Timer
  63           0         1          0  0.00%  0.00%  0.00%   0 SFF8472         
  64           0         2          0  0.00%  0.00%  0.00%   0 Ethernet OAM Pro
  65           0         1          0  0.00%  0.00%  0.00%   0 PPPOE IA        
  66           0         2          0  0.00%  0.00%  0.00%   0 REP Topology cha
--More--

At present there are only a handful (less than a dozen) of active users on this switch, as I'm in the process of re-patching the data closet. Does anybody have an idea why this would suddenly occur when almost no one is using it?

Thanks!

Jon Koelker

Oyster River School District

Durham, NH

jon.koelker Mon, 08/16/2010 - 11:05

No problem, Mike - thanks for the response! And thanks for the command tip, too.

              jk

jon.koelker Mon, 08/16/2010 - 11:23

Hey, guys -

I checked out the link you posted, and it gave me some helpful troubleshooting tips. Unfortunately, I'm not sure how to interpret what I'm seeing in some of it. For instance, in the 'sh platform health' output, I can see these high percentages:

K2CpuMan Review       30.00  63.09     30    117  100  500   83  83   55  41493:31
K2AccelPacketMan: Tx  10.00  12.52     20      1  100  500   14  14   10  14752:51

But I don't know what those processes are or what would cause them to suck up a lot of CPU cycles. Any ideas?

Also, it doesn't appear that spanning tree is the problem based on this output:

HSSW02#sh proc cpu | e 0.00
CPU utilization for five seconds: 89%/7%; one minute: 88%; five minutes: 87%
PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
  36   620773720  87669413       7080  1.67%  1.66%  1.67%   0 IDB Work        
  39    36305440  48474019        748  0.15%  0.10%  0.10%   0 Per-Second Jobs 
  48    167536064 3234360965         51 14.00% 13.80% 13.86%   0 Cat4k Mgmt HiPri
  49   3917345848 2240582550       1748 65.11% 63.70% 63.27%   0 Cat4k Mgmt LoPri
  79     3637832  52735083         68  0.07%  0.04%  0.05%   0 UDLD            
  90    19061564 122895740        155  0.07%  0.11%  0.11%   0 CDP Protocol    
103   220095552 107861677       2040  0.23%  0.24%  0.23%   0 Spanning Tree   
138         332       216       1537  0.47%  0.07%  0.05%   1 SSH Process     
197      7220768 1352374523          5  0.07%  0.07%  0.07%   0 RADIUS          
HSSW02#

So I don't think I have a data loop issue.

Also, when I pulled packet statistics with the 'sh platform cpu packet statistics' command, I see this:

Total packet queues 16

Packets Received by Packet Queue

Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Esmp                        7629336606       234       239       202        190
L2/L3Control                 749818192      1024      1041       880        745
Host Learning                  8698879         0         0         0          0
L3 Fwd Low                     9620901         3         0         0          0
L2 Fwd Low                      542793         0         0         0          0

Looks like a lot of control packets (BPDUs?) are being forwarded for processing. Does that seem right to you?
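For what it's worth, one way to check whether those control packets reflect recent topology churn (a sketch based on Cisco's STP troubleshooting notes; the output is per VLAN and will vary) is:

```
HSSW02#show spanning-tree detail | include ieee|occurr|from|is exec
```

Each VLAN line reports the number of topology changes, when the last one occurred, and the interface it came from; a port whose change count is climbing rapidly is a good loop suspect.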

Thanks again for your help!!

                                    jk

manish arora Mon, 08/16/2010 - 11:32

Can you please post the complete output of "sh platform cpu packet statistics"?

thanks

Manish

jon.koelker Mon, 08/16/2010 - 11:40

Sure - here it is. Thanks again!

HSSW02#sh platform cpu packet statistics
Packets Dropped In Hardware By CPU Subport (txQueueNotAvail)

CPU Subport  TxQueue 0       TxQueue 1       TxQueue 2       TxQueue 3
------------ --------------- --------------- --------------- ---------------
           0               0               0               0          537136
           2               0              24               0               0


RkiosSysPacketMan:
Packet allocation failures: 0
Packet Buffer(Software Common) allocation failures: 0
Packet Buffer(Software ESMP) allocation failures: 0
Packet Buffer(Software EOBC) allocation failures: 0
Packet Buffer(Software SupToSup) allocation failures: 0
IOS Packet Buffer Wrapper allocation failures: 0

Packets Dropped In Processing Overall

Total                5 sec avg 1 min avg 5 min avg 1 hour avg
-------------------- --------- --------- --------- ----------
             2272354         0         0         0          0

Packets Dropped In Processing by CPU event

--More--
Event             Total                5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Input Acl                      2270909         0         0         0          0
SA Miss                           1373         0         0         0          0

Packets Dropped In Processing by Priority

Priority          Total                5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Normal                             808         0         0         0          0
Medium                            1445         0         0         0          0
High                           2270173         0         0         0          0

Packets Dropped In Processing by Reason

Reason             Total                5 sec avg 1 min avg 5 min avg 1 hour avg
------------------ -------------------- --------- --------- --------- ----------
SrcAddrTableFilt                   1358         0         0         0          0
L2DstDrop                            29         0         0         0          0
NoDstPorts                          381         0         0         0          0
NoFloodPorts                    2270586         0         0         0          0

Total packet queues 16


Packets Received by Packet Queue

Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Esmp                        7629520214       217       233       202        190
L2/L3Control                 750603037       956      1066       882        764
Host Learning                  8698961         0         0         0          0
L3 Fwd Low                     9621193         1         0         0          0
L2 Fwd Low                      542809         0         0         0          0

Packets Dropped by Packet Queue

Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
L2/L3Control                   1571521         0         0         0          0
Host Learning                    32031         0         0         0          0

manish arora Mon, 08/16/2010 - 12:00

From the output, it appears that most of the CPU cycles are being used to stop an STP loop. You should check your design to see if you have caused a loop somewhere in your LAN, since you mentioned earlier that you were in the middle of making cable changes. Enable the various STP guard features and diagnose the STP states on the switches further.
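For example, a minimal sketch of the guard configuration (the interface range is hypothetical - apply portfast/bpduguard only to end-host ports, never to uplinks):

```
! Loop guard protects against unidirectional-link loops on non-designated ports
Switch(config)# spanning-tree loopguard default
! On access ports facing end hosts only:
Switch(config)# interface range gigabitethernet 2/1 - 48
Switch(config-if-range)# spanning-tree portfast
Switch(config-if-range)# spanning-tree bpduguard enable
```

With bpduguard enabled, a port that receives a BPDU (e.g. from an accidentally looped patch cable) is err-disabled instead of forming a loop.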

Thanks

Manish

rahurao Mon, 08/16/2010 - 12:11

Hi Jon,

From the output these 2 processes are using up your CPU the most:

48    167536064 3234360965         51 14.00% 13.80% 13.86%   0 Cat4k Mgmt HiPri
49   3917345848 2240582550       1748 65.11% 63.70% 63.27%   0 Cat4k Mgmt LoPri

Checked the same and found some bugs:

http://cdetsweb-prd.cisco.com/apps/goto?identifier=CSCsy32312

CSCsy32312    Cat4k Mgmt LoPri - High CPU utilisation bcos of K5L3Unicast Adj Tabl

Not sure if the bug matches your IOS version.

This traffic needs to be checked while the CPU is high. You can use the built-in CPU debug commands, which are SAFE to use even when the CPU is high:

Switch#debug platform packet all receive buffer ------------> Enables the debug.

Switch#show platform cpu packet buffered ---------------> Shows the packets captured after enabling the debug.

Switch#debug platform packet all count --------------> Enables per-interface packet counts.

Switch#show platform cpu packet statistics ----------> Shows packets received on a per-interface basis.



With the above two sets of debugs, we can conclusively identify which CPU-bound traffic is seen most on a specific interface, and the source of that traffic. So if the CPU is high, this will give us some statistics on the source of the traffic.

Reference links:

http://www.cisco.com/en/US/products/hw/switches/ps663/products_tech_note09186a00804cef15.shtml#tool2

HTH

Rahul

jorge.calvo Tue, 08/17/2010 - 01:26

Hello,

Those two processes indicate that a lot of traffic is being software-forwarded by the CPU.

Could you please enable "terminal monitor" on the switch and look for:

1. %C4K_EBM-4-HOSTFLAPPING messages in the log

2. Broadcast traffic level on the switch interfaces.
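For example (a sketch; the interface name is hypothetical, and the log check assumes logging buffered is enabled):

```
HSSW02#terminal monitor
HSSW02#show logging | include C4K_EBM-4-HOSTFLAPPING
HSSW02#show interfaces gigabitethernet 2/1 | include broadcasts
```

Host-flapping messages would point at a MAC address being learned on two ports at once (a classic loop symptom), while a rapidly climbing broadcast counter on an interface points at where the flooded traffic is entering.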

Cheers
