High CPU usage on 4500

Dec 23rd, 2008

Hello everyone,

Hoping someone may be able to help me track down this issue. Starting two weeks ago, one of our 4507Rs went from an average of 45-50% CPU usage to being pegged at 95-98%.

We really hadn't noticed it until we started looking at the graphs. Performance still seems to be OK. There is no packet loss while connected to the system, users aren't complaining, etc.

While troubleshooting this issue, I've implemented some of the recommendations regarding "no ip redirects" and "no ip unreachables" on all our SVIs. I also found the document that describes the basic troubleshooting process on the 4500 and followed it, but still to no avail.

I've attached the outputs of the "show platform health", "show proc cpu", "show spanning-tree summary totals".

I've also done a SPAN of the traffic going to the CPU and captured it with Wireshark. It's mostly UDP traffic going through. I do see some multicast traffic coming from our Call Managers.

Any thoughts on where else I could look for this strange issue?

Thanks.

-- Dominique

Dominique Demore Tue, 12/23/2008 - 08:23

Hi Mike,

Thanks for the response. Yes, I did mirror the CPU traffic to a SPAN port as recommended. What I saw was a lot of UDP traffic (part of our video encoding/decoding). The other packets I saw were from our STP instances. However, there are only 333 STP instances running in PVST+. That number is under the maximum for STP (unless I'm mistaken).

The processes that are running the highest are the following:

K2CpuMan Review 30.00 63.45 30 45 100 500 76 82 64 77691:40

K2AccelPacketMan: Tx 10.00 25.21 20 1 100 500 27 29 22 14754:05

These 2 make up 88.66% of the Total CPU usage.
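As a quick check of that arithmetic, here is a minimal Python sketch that pulls the percentages out of the two lines above and adds them up. It assumes the first two numeric columns of "show platform health" are the target and actual %CPU, and that every row ends in ten numeric fields; the helper name parse_health_line is made up for this example.

```python
def parse_health_line(line):
    """Split one 'show platform health' row into (name, target %, actual %).

    Assumption: the row ends in ten numeric fields, and the first two of
    those are the target and actual %CPU. Process names may contain spaces.
    """
    tokens = line.split()
    name = " ".join(tokens[:-10])
    target = float(tokens[-10])
    actual = float(tokens[-9])
    return name, target, actual


# The two rows quoted in the post above.
lines = [
    "K2CpuMan Review 30.00 63.45 30 45 100 500 76 82 64 77691:40",
    "K2AccelPacketMan: Tx 10.00 25.21 20 1 100 500 27 29 22 14754:05",
]

rows = [parse_health_line(line) for line in lines]
total_actual = sum(actual for _, _, actual in rows)

for name, target, actual in rows:
    print(f"{name}: target {target:.2f}%, actual {actual:.2f}%")
print(f"combined actual: {total_actual:.2f}%")
```

Both processes are running well above their targets, which is what points at packets being handled by the CPU rather than in hardware.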

I've attached the outputs of "show platform cpu stat" and "show platform cpu buff".

burleyman Tue, 12/23/2008 - 08:37

I had a similar issue on my 6500 switch and did the following. Please post the outputs.

Issue the following commands to identify kinds of traffic being sent to Route Processor (CPU) for software switching instead of hardware switching:

- "show interface | inc line|size/max/drop|minute" <- look for interfaces with odd input queue behavior and large number of output drops

After identifying one or two interfaces, issue the "show buffer" commands below to see what's waiting in the buffers to be punted to the CPU. Look for a pattern; specifically, make note of:

1) Source IP Address

2) Destination IP Address

3) TTL (e.g. TTL=1)

- "show buffer input-interface [vlan/interface] packet"

- "show buffer input-interface [vlan/interface] header"
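If it helps with spotting the pattern, here is a rough Python sketch that decodes the three fields listed above from a raw IPv4 header and tallies repeats. It assumes you have already copied the IP header bytes out of the buffer dump as hex; the exact layout of that dump varies, so locating the start of the IP header is left to you. The sample bytes and addresses are made up for illustration.

```python
import ipaddress
import struct
from collections import Counter


def parse_ipv4_header(hex_bytes):
    """Return (source, destination, ttl, protocol) from an IPv4 header in hex."""
    raw = bytes.fromhex("".join(hex_bytes.split()))
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # Fixed 20-byte IPv4 header in network byte order:
    # version/IHL, TOS, total length, ID, flags/fragment offset,
    # TTL, protocol, checksum, source address, destination address.
    (_ver_ihl, _tos, _length, _ident, _frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return (str(ipaddress.IPv4Address(src)),
            str(ipaddress.IPv4Address(dst)),
            ttl, proto)


# Hypothetical sample: UDP (protocol 17) from 10.1.1.10 to 10.2.2.20 with TTL=1.
sample = "4500 0054 0000 4000 0111 0000 0a01 010a 0a02 0214"
parsed = parse_ipv4_header(sample)
print(parsed)

# Tally (source, destination, ttl) across several captured headers.
headers = [sample, sample]
pattern = Counter(parse_ipv4_header(h)[:3] for h in headers)
for (src, dst, ttl), count in pattern.most_common():
    print(f"{count} x {src} -> {dst} ttl={ttl}")
```

A single source/destination pair dominating the tally, or many packets arriving with a very low TTL, is the kind of pattern to look for.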

Mike

Dominique Demore Tue, 12/23/2008 - 11:25

Looking at the output of "show interface | inc line|size/max/drop|minute", nothing seems strange. There are no output drops; they are all 0.

SVIs are 0/75/0/0

Physical are 0/2000/0/0
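For anyone scanning a long list of interfaces, here is a small Python sketch of the same check. It assumes the usual IOS wording of the line ("Input queue: size/max/drops/flushes; Total output drops: N"); the function names and the 80% "nearly full" threshold are my own choices for the example, not anything from Cisco.

```python
import re

# Matches the input-queue line as it normally appears in "show interface".
QUEUE_RE = re.compile(
    r"Input queue: (\d+)/(\d+)/(\d+)/(\d+) \(size/max/drops/flushes\); "
    r"Total output drops: (\d+)"
)


def parse_queue_line(line):
    """Return the queue counters from one line as a dict, or None if no match."""
    match = QUEUE_RE.search(line)
    if match is None:
        return None
    size, maximum, drops, flushes, out_drops = map(int, match.groups())
    return {"size": size, "max": maximum, "drops": drops,
            "flushes": flushes, "output_drops": out_drops}


def looks_odd(q, fill_threshold=0.8):
    """Flag a queue with drops, flushes, output drops, or a nearly full input queue."""
    nearly_full = q["max"] > 0 and q["size"] / q["max"] >= fill_threshold
    return bool(q["drops"] or q["flushes"] or q["output_drops"] or nearly_full)


# The two values reported above, wrapped in the usual line format.
svi = parse_queue_line(
    "  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0")
physical = parse_queue_line(
    "  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0")

print("SVI odd?", looks_odd(svi))
print("Physical odd?", looks_odd(physical))
```

With the values above, neither queue is flagged, which matches what I saw by eye.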

I've attached the output of the show command for your review.

Thanks.

Dominique Demore Tue, 12/23/2008 - 08:25

Hi Edison,

Nope, no Access Points have been added. The version I'm running is:

>show ver

Cisco Internetwork Operating System Software

IOS (tm) Catalyst 4000 L3 Switch Software (cat4000-I9S-M), Version 12.2(20)EW, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)

Technical Support: http://www.cisco.com/techsupport

Copyright (c) 1986-2004 by cisco Systems, Inc.

Compiled Wed 02-Jun-04 18:48 by hqluong

Image text-base: 0x00000000, data-base: 0x011F88F4

ROM: 12.1(12r)EW

Dagobah Revision 93, Swamp Revision 28

ON-SUD0-C4507R-001 uptime is 24 weeks, 7 hours, 42 minutes

System returned to ROM by reload

System restarted at 04:44:55 EDT Tue Jul 8 2008

System image file is "bootflash:cat4000-i9s-mz.122-20.EW.bin"

cisco WS-C4507R (MPC8245) processor (revision 7) with 524288K bytes of memory.

Processor board ID FOX0728017K

MPC8245 CPU at 333Mhz, Supervisor IV

Last reset from Reload

29 Virtual Ethernet/IEEE 802.3 interface(s)

176 FastEthernet/IEEE 802.3 interface(s)

52 Gigabit Ethernet/IEEE 802.3 interface(s)

403K bytes of non-volatile configuration memory.

Configuration register is 0x2102

The second 4507R, which runs in parallel with this one, has an identical setup. It's running at a constant 45-50%.

Thanks.

Edison Ortiz Tue, 12/23/2008 - 08:56

You mentioned the trace showed lots of UDP traffic, which is video traffic.

If you terminate that traffic, does the CPU go back to its normal level?

Any reason why this traffic is hitting the CPU and it isn't hardware switched?

__

Edison.

Dominique Demore Tue, 12/23/2008 - 10:22

I'd like to turn this traffic off; however, if I do, I'll have a lot of TV customers upset at me :).

As for why this traffic isn't being hardware switched, I'll need to investigate. I can't think of a reason it would need to go through the CPU.

peterbe Tue, 12/23/2008 - 17:22

Do you have access lists? Check the TCAM to make sure they are running in hardware.

show platform software acl input summary interface partial

show platform hardware acl statistics utilization brief

burleyman Wed, 12/24/2008 - 04:02

I am at the end of what I know. Like I said, I just had a similar issue on my 6509s, and the commands I had you check helped me find my issue: a PC was causing a broadcast storm, and the switch's CPU spiked when trying to process all that traffic. If you have any type of support, you might try opening a TAC case. Also, like an earlier post said, it could be a bug. I will keep picking my brain. Also go over any changes that may have occurred in the environment, even as small as adding a PC, like in my case.

Mike

Dominique Demore Tue, 12/30/2008 - 07:53

Hi Everyone,

Just wanted to send an update. Earlier this morning we seem to have solved the issue. It turned out that a routing loop was causing it.

I would like to thank everyone for the help in troubleshooting this issue.

Thanks.

-- Dominique

Dominique Demore Tue, 12/30/2008 - 08:09

It was a static route between our two 4507Rs. Remember the UDP traffic I mentioned earlier? It should have clued me in to the issue. For some reason, a static route had been placed on the first 4507 (the high-CPU one) pointing to the second 4507.

Since the second one didn't have any routing information for the network in question, it was trying to send the traffic back to the first 4507 as its default gateway.

I'm not sure why it was ever built that way.
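To illustrate what was happening, here is a toy Python model of the loop. The router names "A" and "B" and the prefix 10.50.0.0/16 are placeholders, not our real configuration: A holds the stray static route pointing at B, and B only has a default route back to A. Each hop decrements the TTL, so a packet bounces between the two switches until the TTL runs out.

```python
import ipaddress


def lookup(table, dst):
    """Longest-prefix match over (network, next_hop) pairs; None if no route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def forward(tables, start, dst, ttl):
    """Follow next hops until the TTL reaches 0 or the packet leaves the model."""
    path = [start]
    router = start
    while ttl > 0:
        hop = lookup(tables[router], dst)
        if hop is None or hop not in tables:
            break  # no route, or handed to something outside this model
        ttl -= 1
        router = hop
        path.append(router)
    return path, ttl


tables = {
    # A: the stray static route for the network points at B.
    "A": [(ipaddress.ip_network("10.50.0.0/16"), "B")],
    # B: no specific route, so the default route sends it back to A.
    "B": [(ipaddress.ip_network("0.0.0.0/0"), "A")],
}

path, ttl_left = forward(tables, "A", "10.50.1.1", ttl=8)
print(" -> ".join(path))
print("TTL left:", ttl_left)
```

In this model the packet never leaves the pair of switches; it just ping-pongs until the TTL is exhausted. A steady stream of video UDP doing that would multiply the traffic between the two boxes, and my understanding is that packets whose TTL expires typically have to be handled by the CPU rather than in hardware.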
