High CPU on Catalyst 6509

v.c.bodenstab
Level 1

Hi all,

We are using a couple of 6509s on our distribution layer. For about 48 hours, one of these systems has been generating HighUtilization alerts in our CiscoWorks LMS 3.0 (specifically DFM 3.0.2). The alerts indicate that one of the CPUs is very busy (97% all the time). DFM does not clearly state which CPU this is, so I've been doing some troubleshooting with NET-SNMP. These are the commands I've been running:

$ snmpwalk -v3 -a SHA -A xxx -u xxx -l authNoPriv -E 800000090300000BBE574F01 cat6509sw enterprises.9.9.109.1.1.1.1

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.1 = INTEGER: 3017

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 3001

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.3 = INTEGER: 0

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.3.1 = Gauge32: 0

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.3.2 = Gauge32: 10

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.3.3 = Gauge32: 97

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.4.1 = Gauge32: 1

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.4.2 = Gauge32: 7

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.4.3 = Gauge32: 97

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.5.1 = Gauge32: 1

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.5.2 = Gauge32: 7

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.5.3 = Gauge32: 97

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.6.1 = Gauge32: 0

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.6.2 = Gauge32: 10

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.6.3 = Gauge32: 97

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.1 = Gauge32: 1

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 7

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.3 = Gauge32: 97

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.1 = Gauge32: 1

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.2 = Gauge32: 7

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.3 = Gauge32: 97

As you can see, one of the CPUs is at 97%. Now I want to find out which one. The first two are no problem:

$ snmpwalk -v3 -a SHA -A xxx -u xxx -l authNoPriv -E 800000090300000BBE574F01 cat6509sw entPhysicalDescr.3017

ENTITY-MIB::entPhysicalDescr.3017 = STRING: CPU of Routing Processor

$ snmpwalk -v3 -a SHA -A xxx -u xxx -l authNoPriv -E 800000090300000BBE574F01 cat6509sw entPhysicalDescr.3001

ENTITY-MIB::entPhysicalDescr.3001 = STRING: CPU of Switching Processor

So far, so good. What about the third one?

$ snmpwalk -v3 -a SHA -A xxx -u xxx -l authNoPriv -E 800000090300000BBE574F01 cat6509sw entPhysicalDescr.0

entPhysicalDescr.0: Unknown Object Identifier (Index out of range: 0 (entPhysicalIndex))

So, I've got one CPU (probably the one on the PFC3 on our SUP720 board) that is showing 97% usage. But its entPhysicalIndex is 0? I don't see this on our other 6509s. Furthermore, how can I investigate what this CPU is so busy with? A 'sh proc cpu' doesn't help, because it shows me the info from the MSFC CPU. BTW: I don't have any noticeable network problems.

Any tips are much appreciated! Cheers,

Vincent

1 Accepted Solution

Hi Vincent,

At the risk of going slightly off topic, I'd just like to add a little something here. The supervisor has two CPUs on board: one on the route processor (RP or MSFC) and one on the switch processor (SP or PFC). When you issue a 'show proc cpu' in IOS on the 6k, you are pulling the CPU stats from the RP, which handles most of the things we would ordinarily think about: routing protocol updates, process-switched packets, telnet/SSH sessions, and higher-level functions of the device.

To pull the CPU stats from the SP you'd issue a 'remote command switch show process cpu'. The SP CPU is responsible for the low level goings-on in the switch. It handles IGMP snooping, spanning-tree BPDU processing, etc. It's also responsible for keeping in touch with the line cards via the EOBC (ethernet out-of-band channel).

Getting back on track, in your case it looks like you identified the overutilized CPU to be on the WiSM module. fw_lcp (firmware line card protocol) is the process responsible for communicating via the EOBC with the supervisor. It looks like the WiSM was having trouble communicating with the sup via LCP. Why this happened in the first place is anybody's guess if you can't reproduce it, but hopefully now you'll have a better grasp of what originally happened!
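
The three places to look for CPU statistics mentioned in this thread (RP, SP, and an individual module) can be summarized in a small helper. This is only a sketch: the function name `cpu_show_command` and the target labels "rp", "sp", and "module" are made up for illustration, and the command strings are the ones quoted in this thread, not an exhaustive list.

```python
def cpu_show_command(target, slot=None):
    """Return the IOS command that shows CPU stats for the given CPU.

    target: "rp" (route processor / MSFC), "sp" (switch processor),
            or "module" (a line card or service module such as a WiSM).
    slot:   chassis slot number, required only for target == "module".
    The labels are hypothetical; the commands are the ones from the thread.
    """
    if target == "rp":
        return "show process cpu"
    if target == "sp":
        return "remote command switch show process cpu"
    if target == "module":
        if slot is None:
            raise ValueError("a slot number is required for a module")
        return f"remote command module {slot} show process cpu"
    raise ValueError(f"unknown target: {target!r}")


if __name__ == "__main__":
    # Slot 8 is the WiSM slot the original poster reports in this thread.
    for target, slot in (("rp", None), ("sp", None), ("module", 8)):
        print(cpu_show_command(target, slot))
```

The point of the mapping is that a plain 'show process cpu' only ever answers for the RP; a busy CPU elsewhere in the chassis needs one of the 'remote command' forms.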

Collin Clark
VIP Alumni

Post the results of the following command.

sh proc cpu | e 0.00% 0.00% 0.00%

This excludes every process whose 5-second, 1-minute, and 5-minute utilization are all 0.00%, so only the active processes are shown.

Hi,

Thanks for the reply. This is the output of the command. Looks to me it's not particularly busy...

mu-6509-1#sh proc cpu | excl 0.00% 0.00% 0.00%

CPU utilization for five seconds: 1%/0%; one minute: 1%; five minutes: 1%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

5 137233748 11129235 12330 0.00% 0.26% 0.20% 0 Check heaps

8 30809012 132392063 232 0.07% 0.03% 0.02% 0 ARP Input

37 22511144 1138680 19769 0.00% 0.03% 0.00% 0 Per-minute Jobs

59 190024 65971217 2 0.07% 0.00% 0.00% 0 Heartbeat Proces

122 86232064 404382109 213 0.15% 0.07% 0.06% 0 IP Input

173 37231752 20394709 1825 0.07% 0.03% 0.00% 0 IPC LC Message H

177 19292156 98107973 196 0.07% 0.02% 0.01% 0 CEF process

178 62432544 201714904 309 0.00% 0.02% 0.02% 0 SNMP ENGINE

275 291438188 2438835553 119 0.00% 0.33% 0.35% 0 Port manager per

316 72 290 248 0.55% 0.09% 0.02% 1 Virtual Exec

Do you see anything strange?

Best,

Vincent
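
For anyone collecting this output from several switches, the summary line can be parsed with a short script. This is a minimal sketch, assuming the header has exactly the format shown in the output above; the function name `parse_cpu_header` and the dictionary keys are illustrative. In the "five seconds" field, the value before the slash is total utilization and the value after it is the share spent at interrupt level.

```python
import re

# Matches the summary line printed at the top of 'show process cpu'.
HEADER_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_header(text):
    """Extract the utilization percentages from 'show process cpu' output."""
    match = HEADER_RE.search(text)
    if match is None:
        raise ValueError("no CPU utilization header found")
    total, interrupt, one_min, five_min = map(int, match.groups())
    return {
        "five_sec_total": total,          # total CPU over the last 5 seconds
        "five_sec_interrupt": interrupt,  # portion spent at interrupt level
        "one_min": one_min,
        "five_min": five_min,
    }


# Header line copied from the output posted above.
header = ("CPU utilization for five seconds: 1%/0%; "
          "one minute: 1%; five minutes: 1%")

if __name__ == "__main__":
    print(parse_cpu_header(header))
```

On the output above this gives 1% total and 0% interrupt, which matches the impression that the RP itself is nearly idle.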

Vincent-

Taking a look at this, the CPU is not running at 97%. What OID are you using to get CPU utilization?

Hi,

I'm using the following command from a Linux box to get the values:

snmpwalk -v3 -a SHA -A xxx -u xxx -l authNoPriv -E 800000090300000BBE574F01 cat6509sw enterprises.9.9.109.1.1.1.1

The OIDs I'm getting from the switch are:

enterprises.9.9.109.1.1.1.1.1

enterprises.9.9.109.1.1.1.1.2

enterprises.9.9.109.1.1.1.1.3

for the three CPUs.

I show enterprises.9.9.109.1.1.1.1.1 as unsupported, enterprises.9.9.109.1.1.1.1.2 holds the physical index values, and enterprises.9.9.109.1.1.1.1.3 is the 5-second interval value (which can cause increased load). Try the following and see what you get:

1 Minute

enterprises.9.9.109.1.1.1.1.4

5 Minute

enterprises.9.9.109.1.1.1.1.5
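
In these OIDs the second-to-last number is the column and the last number is the row, one row per CPU. A short script can pivot the walk output into one record per CPU, which makes it easier to see which row carries the high values. This is a sketch under a few assumptions: the column names are the ones defined for this table in CISCO-PROCESS-MIB, the input is numeric snmpwalk output in the format shown earlier in the thread, and the function name `pivot_cpu_table` is made up for illustration.

```python
import re
from collections import defaultdict

# Columns of cpmCPUTotalEntry (enterprises.9.9.109.1.1.1.1), as named in
# CISCO-PROCESS-MIB. Column 1 is the table index and is not returned.
COLUMNS = {
    2: "cpmCPUTotalPhysicalIndex",
    3: "cpmCPUTotal5sec",
    4: "cpmCPUTotal1min",
    5: "cpmCPUTotal5min",
    6: "cpmCPUTotal5secRev",
    7: "cpmCPUTotal1minRev",
    8: "cpmCPUTotal5minRev",
}

# Captures column number, row number, and the integer value of one line.
LINE_RE = re.compile(
    r"enterprises\.9\.9\.109\.1\.1\.1\.1\.(\d+)\.(\d+) = \w+: (\d+)"
)


def pivot_cpu_table(walk_output):
    """Group snmpwalk lines by row: {row: {column_name: value}}."""
    rows = defaultdict(dict)
    for line in walk_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip blank lines and anything that is not table data
        column, row, value = (int(group) for group in match.groups())
        rows[row][COLUMNS.get(column, f"column{column}")] = value
    return dict(rows)


# A subset of the walk output posted at the top of this thread.
sample = """\
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.1 = INTEGER: 3017
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 3001
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.3 = INTEGER: 0
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.5.1 = Gauge32: 1
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.5.2 = Gauge32: 7
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.5.3 = Gauge32: 97
"""

table = pivot_cpu_table(sample)

if __name__ == "__main__":
    for row, values in sorted(table.items()):
        print(row, values)
```

With the sample data, row 3 is the one with a 5-minute value of 97 and a physical index of 0, which is why it cannot be looked up through entPhysicalDescr.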

Hi all,

I've been doing some extra research on the 6500 chassis last night. I finally found out which CPU was causing the problems.

remote command module 8 show proc cpu

showed me that our WiSM (specifically the CFC) was busy at 97%. The fw_lcp process was quite busy :-)

After a power cycle of the module, the CPU usage returned to normal.

Thanks for your help!

cheers,

Vincent

Hi,

Thanks for the clarification and the explanation of the architecture. I'll be extra alert on monitoring this system for the next couple of months or so.

Cheers,

Vincent

Great info Ryan, thanks.

Wilson Samuel
Level 7

Hi,

Have you enabled SNMPv2 on the switch? I once encountered a Sup 720 whose CPU utilization was pegged after SNMPv2 was enabled.

If it's feasible for you, please turn off SNMP on the switch and see if that helps.

HTH,

Please rate all helpful posts.

Kind Regards,

Wilson Samuel

Hi Wilson,

I'm only using SNMPv3 to monitor this device. Furthermore, I've looked at our TACACS+ logging and haven't found any records indicating that something has changed in the config. The strangest thing is that the device just started running at 97% by itself.

Vince
