The CPUs on all four of our MDS 9509 switches are running at 100%. I have tried to get some info from System Health but can't find anything. We don't have a syslog server. Is there anywhere else I can look? Or does anybody know why we would be seeing 100% all the time?
Firmware is 3.1(2a)
Check the logs on each switch with "show log last xxx", where xxx is how many lines you want to see.
Are all of these hooked up together?
Have you got any debugs on?
No, they are split across two fabrics.
We have the default debug on for Call-Home, but no debug set for System or Systemhealth.
Which one should I set?
Here is the output from "sh log last 100": "PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 201%$ Interface fc2/20 is down (Link failure)". That message is just repeated over and over, but the entries are not from today. They are spread over the last week, which I expect to see. There are no errors about CPU.
Try connecting a console cable, logging in, and then disconnecting the Eth0 cable on one of the MDS. What I have seen in the past is a storm of SNMP packets hitting Eth0 with some SNMP query from a monitoring station driving up the CPU.
If the CPU goes down, then you may want to collect a Wireshark trace on the mgmt0 network to identify the source of the SNMP queries.
Hope this helps,
The problem is, without a console cable, how would you monitor the CPU usage once you disconnect the Eth0 management cable? Disconnecting the Eth0 cable stops any SNMP queries to that interface, but you still need a way to check the CPU usage. I thought a console cable shipped with each unit. They are blue, with an RJ45 on one end and a DB9 on the other. The settings for HyperTerminal, or another terminal emulator, are 9600 baud, 8 data bits, no parity, and 1 stop bit. Flow control is off (none).
Where exactly are you seeing that the switch CPUs are running at 100%?
It could possibly be the following bug:
Symptom: 100% CPU utilization was seen on an MDS switch. It was caused by repeated fabric
logins (FLOGIs) on a particular port. This situation can occur if a host cannot log in because the
allocation of the FC ID fails, and keeps re-trying using a specific pattern of Source FC IDs (S_IDs)
for the FLOGI frame.
Workaround: The interface will now be error-disabled for too many FLOGI failures.
To troubleshoot the configuration and find the reason for the FC ID allocation failure, examine the
messages in the syslog. Refer to the Cisco MDS 9000 Family CLI Configuration Guide for detailed
information about FLOGI, FC IDs, and FC ID allocation for HBAs.
I see this in Device Manager for all four switches, which, thinking about it, could point to a bug. We are about to upgrade the firmware to 3.3(1c) this weekend. I don't really want to upgrade if we really do have a problem, but if it's a bug, the upgrade might clear the issue.
Open a command line session to each MDS and issue this command to see whether the 100% CPU shows up that way as well. Let's compare the command line display with the FM display.
show system resources
Also, please issue this command 3 times at about 10 second intervals.
show interface mgmt 0
Please post all output back to the forum.
Thanks for the snap, but I think I confused you. I was looking for the output of 'show int mgmt 0' three times, not 'show system resources'.
Here is my next suggestion. Determine the number of packets input on the mgmt 0 interface.
Issue 'show int mgmt 0', wait 10 seconds, and issue it again. Now subtract the 'packets input' value (6th line from the top) in the first display from the same value in the second display. This gives you a rough idea of how many packets came in on that interface in 10 seconds. Divide by 10 to get a packets-per-second number. Let us know how that number looks. Please check multiple MDS switches this way at different times to get a good average.
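If you would rather not do the subtraction by hand, the arithmetic can be sketched in a few lines of Python. This is only a rough helper, not anything from Cisco: the function names are made up for illustration, and the parser assumes the counter line looks like "<number> packets input ...", which I have not checked against every NX-OS/SAN-OS release. If your output looks different, skip the parser and type the two counter values in directly.

```python
import re

# Assumption: the counter line contains "<number> packets input".
# Adjust the pattern if your 'show int mgmt 0' output is laid out differently.
_PACKETS_INPUT = re.compile(r"(\d+)\s+packets input")


def packets_input(show_output: str) -> int:
    """Pull the 'packets input' counter out of pasted 'show int mgmt 0' text."""
    match = _PACKETS_INPUT.search(show_output)
    if match is None:
        raise ValueError("no 'packets input' line found in the pasted output")
    return int(match.group(1))


def packets_per_second(first: int, second: int, interval_s: float = 10.0) -> float:
    """Rough input rate: (second sample - first sample) / seconds between them."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if second < first:
        # Counters were cleared or wrapped between samples; the result would be meaningless.
        raise ValueError("second counter is lower than the first")
    return (second - first) / interval_s
```

For instance, with made-up counter values of 150340 and 152340 taken 10 seconds apart, packets_per_second(150340, 152340) gives 200.0 packets per second.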
The firmware upgrade to 3.3(1c) fixed the problem. After upgrading both switches in the same fabric, the CPU went down to about 8-10% and has stayed there since. It must have been the firmware bug that somebody mentioned.
Thanks for all your replies.