100% CPU

Sep 13th, 2008

The CPUs on all four of our MDS 9509 switches are running at 100%. I have tried to get some information from System Health but can't find anything, and we don't have a syslog server. Is there anywhere else I can look? Or does anybody know why we would see 100% all the time?

Firmware is 3.1(2a)

Thanks.

inch Sun, 09/14/2008 - 15:18

G'day,


Check the logs on each switch with "show log last xxx", where xxx is the number of lines you want to see.


Are all of these hooked up together?


Have you got any debugs on?

user_4444 Sun, 09/14/2008 - 23:24

No, they are split across two fabrics.

We have the default debug on for Call-Home, but no debugs set for System or System Health.

Which one should I set?

Cheers.

user_4444 Tue, 09/16/2008 - 01:24

Here is the output from "sh log last 100": "PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 201%$ Interface fc2/20 is down (Link failure)", repeated over and over. None of these messages are from today; they span the last week, which I expect to see. There are no errors about the CPU.
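When a log dump is mostly one message repeated, it can help to tally it per interface before posting it. The following Python sketch is a hypothetical helper, not a Cisco tool: it assumes the pasted "show log" lines contain the mnemonic and the "Interface <name>" text exactly as in the message above. The timestamps and hostname in the sample are made up.

```python
import re
from collections import Counter

# Hypothetical helper: captures the mnemonic (e.g. PORT-5-IF_DOWN_LINK_FAILURE)
# and the interface name after "Interface". Only the message text quoted in
# the thread is relied on; timestamps and hostnames are not parsed.
MSG_RE = re.compile(r"([A-Z0-9_]+-\d-[A-Z0-9_]+):.*?Interface (\S+)")


def tally_interface_messages(log_text: str) -> Counter:
    """Count (mnemonic, interface) pairs in pasted 'show log last N' output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MSG_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines built around the message quoted in this thread.
    sample = (
        "2008 Sep 10 09:12:01 mds-a %PORT-5-IF_DOWN_LINK_FAILURE: "
        "%$VSAN 201%$ Interface fc2/20 is down (Link failure)\n"
        "2008 Sep 11 17:40:33 mds-a %PORT-5-IF_DOWN_LINK_FAILURE: "
        "%$VSAN 201%$ Interface fc2/20 is down (Link failure)\n"
    )
    for (mnemonic, intf), n in tally_interface_messages(sample).most_common():
        print(f"{n:5d}  {mnemonic}  {intf}")
```

A single flapping port like fc2/20 shows up as one line with a high count, which is quicker to spot than a hundred identical log lines.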



Michael Brown Mon, 09/15/2008 - 02:09

Try connecting a console cable, logging in, and then disconnecting the Eth0 cable on one of the MDS switches. In the past I have seen a storm of SNMP queries from a monitoring station hit Eth0 and drive up the CPU.


If the CPU usage drops, you may want to collect a Wireshark trace on the mgmt0 network to identify the source of the SNMP queries.
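Once a trace has been captured on the mgmt0 network, the busiest SNMP pollers can be ranked by source address. The following is an illustrative Python sketch using only the standard library, not a Wireshark feature: it assumes the capture was saved in the classic libpcap format (not pcapng), on Ethernet, with untagged IPv4 frames, and it counts UDP packets sent to port 161, the standard SNMP agent port.

```python
import struct
from collections import Counter

SNMP_PORT = 161  # standard UDP port for SNMP agent queries


def snmp_sources(pcap_bytes: bytes) -> Counter:
    """Count UDP packets to port 161, keyed by source IPv4 address.

    Assumes a classic microsecond libpcap file with Ethernet link type and
    untagged IPv4 frames; VLAN-tagged, IPv6, or truncated frames are skipped.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")

    counts = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len
        _, _, incl_len, _ = struct.unpack(endian + "IIII", pcap_bytes[offset:offset + 16])
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14 + 20 or frame[12:14] != b"\x08\x00":
            continue  # too short, or EtherType is not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ihl < 20 or ip[9] != 17 or len(ip) < ihl + 4:
            continue  # malformed, not UDP, or UDP header truncated
        dst_port = struct.unpack("!H", ip[ihl + 2:ihl + 4])[0]
        if dst_port == SNMP_PORT:
            counts[".".join(str(b) for b in ip[12:16])] += 1
    return counts
```

For example, `snmp_sources(open("mgmt0.pcap", "rb").read()).most_common(5)` would list the top five pollers; the file name is only a placeholder. Wireshark's own Statistics > Endpoints view, with a `udp.dstport == 161` display filter, gives a similar breakdown without any code.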


Hope this helps,

Mike

user_4444 Tue, 09/16/2008 - 01:27

I don't have a console cable. Would I see the same result if I just removed the network cable from Eth0?

Michael Brown Wed, 09/17/2008 - 01:41

The problem is that without a console cable, you would have no way to monitor the CPU usage once the Eth0 management cable is disconnected. Disconnecting the Eth0 cable would stop any SNMP queries to that interface, but you still need a way to check the CPU usage. I thought a console cable shipped with each unit: it is blue, with an RJ45 connector on one end and a DB9 on the other. The settings for HyperTerminal or another terminal emulator are 9600 baud, 8 data bits, no parity, and 1 stop bit, with flow control off (none).

KevinBeaulieu Mon, 09/15/2008 - 11:32

Where exactly are you seeing the switch CPUs running at 100%?


It could possibly be the following bug:


CSCsi49231

Symptom: 100% CPU utilization was seen on an MDS switch. It was caused by repeated fabric logins (FLOGIs) on a particular port. This situation can occur if a host cannot log in because the allocation of the FC ID fails, and keeps re-trying using a specific pattern of Source FC IDs (S_IDs) for the FLOGI frame.

Workaround: The interface will now be error-disabled for too many FLOGI failures.

To troubleshoot the configuration and find the reason for the FC ID allocation failure, examine the messages in the syslog. Refer to the Cisco MDS 9000 Family CLI Configuration Guide for detailed information about FLOGI, FC IDs, and FC ID allocation for HBAs.

user_4444 Tue, 09/16/2008 - 01:38

I see this in Device Manager for all four switches, which, thinking about it, could point to a bug. We are about to upgrade the firmware to 3.3(1c) this weekend. I don't really want to upgrade if we have a real problem, but if it's a bug, the upgrade might clear the issue.

Thanks.

Michael Brown Wed, 09/17/2008 - 08:12

Open a command-line session to the MDS switches and issue this command to see whether the 100% CPU also shows up there. Let's compare the command-line display with the FM display.


show system resources


Also, please issue this command three times, at about 10-second intervals:


show interface mgmt 0


Please post all output back to the forum.
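To compare the command-line CPU figure with the FM display, the CPU line can be pulled out of the pasted "show system resources" output. This Python sketch assumes a "CPU states" line of the form shown in the code comment; the exact layout may differ between releases, so check it against your own output first. It treats 100% minus the idle percentage as the busy figure.

```python
import re

# Assumed line format (verify against real output):
#   "CPU states  :   3.0% user,   2.5% kernel,   94.5% idle"
CPU_RE = re.compile(
    r"CPU states\s*:\s*([\d.]+)% user,\s*([\d.]+)% kernel,\s*([\d.]+)% idle"
)


def cpu_busy_percent(output: str) -> float:
    """Return 100 minus the idle percentage from 'show system resources' text."""
    match = CPU_RE.search(output)
    if match is None:
        raise ValueError("no 'CPU states' line found")
    _user, _kernel, idle = (float(g) for g in match.groups())
    return 100.0 - idle
```

A value near 100 here would confirm what FM shows; a low value would suggest the FM display itself is the problem.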

Michael Brown Fri, 09/19/2008 - 04:30

Thanks for the snapshot, but I think I confused you. I wanted the output of 'show int mgmt 0' three times, not 'show system resources'.


Here is my next suggestion: determine how many packets per second are arriving on the mgmt 0 interface.


Issue 'show int mgmt 0', wait 10 seconds, and issue it again. Subtract the 'packets input' count (6th line from the top) in the first output from the one in the second output. This gives a rough idea of how many packets came in on that interface in 10 seconds; divide by 10 to get packets per second. Let us know how that number looks. Please check several MDS switches this way at different times to get a good average.
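The subtraction above is easy to script when several snapshots are taken, for example the three 10-second samples requested earlier. This Python sketch assumes each snapshot contains a line of the form "<count> packets input, ..."; the thread only says the counter is on the 6th line, so the regex is an assumption. It does not handle counter wraparound.

```python
import re

# Assumed counter line (verify against real output):
#   "    12345 packets input, 6789012 bytes"
PKTS_IN_RE = re.compile(r"(\d+)\s+packets input")


def packets_input(snapshot: str) -> int:
    """Extract the 'packets input' counter from one 'show int mgmt 0' output."""
    match = PKTS_IN_RE.search(snapshot)
    if match is None:
        raise ValueError("no 'packets input' counter found")
    return int(match.group(1))


def input_rates(snapshots, interval_s=10.0):
    """Packets per second between consecutive snapshots taken interval_s apart."""
    counts = [packets_input(s) for s in snapshots]
    return [(later - earlier) / interval_s for earlier, later in zip(counts, counts[1:])]
```

With three snapshots reading 1000, 1500 and 2300 packets, `input_rates` returns `[50.0, 80.0]`, i.e. two per-second rates, one for each 10-second gap.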


Thanks,

Mike

user_4444 Mon, 09/22/2008 - 03:17

The firmware upgrade to 3.3(1c) fixed the problem. After we upgraded both switches in the same fabric, the CPU dropped to about 8-10% and has stayed there since. It must have been the firmware bug someone mentioned.

Thanks for all your replies.
