Julie Burruss
Level 4

Cisco TAC has put together a list of router issues that are commonly perceived to be due to a hardware failure, but can normally be fixed more quickly without replacing anything. While some of these may in fact be due to hardware failures, it doesn't usually make sense to start with that assumption. Save yourself some time by checking more common causes first.

#1 The system reports error messages that mention memory or an interface chipset. Most of these messages are caused by software issues or are merely informational. Run the output through the system error message decoder: http://www.cisco.com/cgi-bin/Support/Errordecoder/index.cgi
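If you have a lot of log output to go through before using the decoder, a quick first pass is to sort messages by severity. IOS system messages usually start with %FACILITY-SEVERITY-MNEMONIC, where severity runs from 0 (emergencies) to 7 (debugging). The sketch below assumes that plain format (some platforms insert extra fields, which this regex will not match), and the cutoff of 3 in needs_attention() is an arbitrary assumption, not a Cisco rule:

```python
import re

# IOS message severity levels (0 = most severe).
SEVERITY_NAMES = {
    0: "emergencies", 1: "alerts", 2: "critical", 3: "errors",
    4: "warnings", 5: "notifications", 6: "informational", 7: "debugging",
}

# Matches the %FACILITY-SEVERITY-MNEMONIC prefix (plain form only).
MSG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)"
)


def classify(line):
    """Return (facility, severity name, mnemonic), or None if no prefix is found."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    return m.group("facility"), SEVERITY_NAMES[int(m.group("severity"))], m.group("mnemonic")


def needs_attention(line, threshold=3):
    """True for severities 0..threshold (the threshold is an assumed cutoff)."""
    m = MSG_RE.search(line)
    return m is not None and int(m.group("severity")) <= threshold
```

For example, %SYS-5-CONFIG_I is a notification that someone changed the configuration, while %SYS-2-MALLOCFAIL is a critical memory allocation failure worth decoding first.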

#2 No response on the console, but the device appears to be working otherwise. Most of these problems are due to an incorrect cable, or incompatible terminal settings - especially flow control.
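Before suspecting the console port, compare your terminal program against the usual Cisco console defaults: 9600 baud, 8 data bits, no parity, 1 stop bit, and no flow control. This sketch assumes those defaults have not been changed on the router (the console speed can be changed with the config-register or under line con 0), and the dictionary keys are just illustrative names:

```python
# Usual Cisco console defaults (assumes nobody changed the console speed).
CISCO_CONSOLE_DEFAULTS = {
    "baud": 9600,
    "data_bits": 8,
    "parity": "none",
    "stop_bits": 1,
    "flow_control": "none",
}


def console_mismatches(settings):
    """Return {setting: (expected, actual)} for every setting that differs."""
    return {
        key: (expected, settings.get(key))
        for key, expected in CISCO_CONSOLE_DEFAULTS.items()
        if settings.get(key) != expected
    }
```

A terminal left at hardware (RTS/CTS) flow control would show up as {"flow_control": ("none", "rts/cts")}, which matches the most common cause mentioned above.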

#3 The system restarts by itself (crashes). Some crashes are due to hardware, but the majority are software issues. Use the Output Interpreter https://www.cisco.com/cgi-bin/Support/OutputInterpreter/home.pl with the output of "show tech" as a first pass, or open a service request with TAC.
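A quick sanity check before digging deeper is the "System returned to ROM by ..." line in show version, which distinguishes a power cycle or manual reload from something like a bus error. The sketch below assumes that line is present in your IOS release, and treating only "power-on" and "reload" as ordinary restarts is a simplification; use the Output Interpreter or TAC for the real analysis:

```python
import re

# Reasons treated here as ordinary restarts rather than crashes (assumption).
ORDINARY_REASONS = ("power-on", "reload")

RETURN_RE = re.compile(r"^System returned to ROM by (?P<reason>.+?)\s*$", re.MULTILINE)


def reload_reason(show_version):
    """Extract the text after 'System returned to ROM by', or None."""
    m = RETURN_RE.search(show_version)
    return m.group("reason") if m else None


def looks_like_crash(show_version):
    """True/False if the reason line was found, None if it was not."""
    reason = reload_reason(show_version)
    if reason is None:
        return None
    return not reason.startswith(ORDINARY_REASONS)
```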

#4 A module or interface card is unrecognized. If the IOS version is too new, or the hardware is otherwise unsupported by the software, modules may not be seen. This is almost always a mismatch between the software required and the hardware installed.

#5 The configuration is lost upon a power cycle. This can happen if the config-register is not reset after doing a password recovery, as explained in this tech tip: http://www.cisco.com/en/US/products/hw/routers/ps233/products_tech_note09186a00800a65a5.shtml

Failing to save the configuration after a change is also a fairly common cause.
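The password recovery case comes down to one bit: 0x0040 in the config-register tells the router to ignore the saved configuration in NVRAM. Recovery typically sets the register to 0x2142, and the usual value is 0x2102 (set with config-register 0x2102 in global configuration mode, then saved and reloaded). A minimal check, assuming you pass the value exactly as show version prints it:

```python
IGNORE_NVRAM_BIT = 0x0040  # set: IOS ignores the startup-config at boot


def ignores_startup_config(config_register):
    """True if this register value makes the router skip the saved configuration."""
    return bool(int(config_register, 16) & IGNORE_NVRAM_BIT)
```

With 0x2142 this returns True, so every reload comes up with no configuration; with 0x2102 it returns False.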

#6 Errors reported in show interface output. Most errors either reflect normal but unusual conditions (drops, for example) or have an external cause such as incorrect clocking, noise, or even a marginal cable. Rule those out first.
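To see whether errors are actually growing relative to traffic, it helps to pull the counters out of the input errors and output errors lines and compare CRC errors to packets input. Counter names and layout vary by platform and IOS release, so treat this as a sketch for the common serial/Ethernet format; it does not define what ratio is "too high":

```python
import re

# "<number> <counter name>" pairs, e.g. "4 CRC" or "0 input errors".
COUNTER_RE = re.compile(r"(\d+)\s+([A-Za-z][A-Za-z ]*)")
PACKETS_IN_RE = re.compile(r"(\d+) packets input")


def error_counters(show_interface):
    """Collect counters from the 'input errors' and 'output errors' lines."""
    counters = {}
    for line in show_interface.splitlines():
        if "input errors" in line or "output errors" in line:
            for value, name in COUNTER_RE.findall(line):
                counters[name.strip()] = int(value)
    return counters


def crc_ratio(show_interface):
    """CRC errors per input packet, or None if there is no usable packet count."""
    m = PACKETS_IN_RE.search(show_interface)
    if m is None or int(m.group(1)) == 0:
        return None
    return error_counters(show_interface).get("CRC", 0) / int(m.group(1))
```

Sampling this twice a few minutes apart (or after clear counters) tells you whether the errors are current or just old history.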

#7 Wrong version loaded, or failure to completely boot. Again, the config-register can easily cause unexpected behavior. Check the config-register and boot system commands, and beware of relying on the system to boot the first image in flash.
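The boot field is the lowest four bits of the config-register: 0x0 stops in ROM monitor, 0x1 boots a default image (which one is platform-dependent), and 0x2 through 0xF use the boot system commands from the startup configuration, falling back to the first image found in flash if there are none. A rough check, assuming the register value as shown by show version and the startup-config as plain text (the image name in the usage note is hypothetical):

```python
def boot_system_lines(config_text):
    """Return the 'boot system' commands found in a configuration."""
    return [line.strip() for line in config_text.splitlines()
            if line.strip().startswith("boot system")]


def boot_warnings(config_register, config_text):
    """List likely surprises at the next boot, based on the boot field."""
    field = int(config_register, 16) & 0x000F
    warnings = []
    if field == 0x0:
        warnings.append("boot field 0x0: router will stop in ROM monitor")
    elif field == 0x1:
        warnings.append("boot field 0x1: boots a default image, ignoring boot system commands")
    elif not boot_system_lines(config_text):
        warnings.append("no boot system commands: router will fall back to the first image it finds in flash")
    return warnings
```

With 0x2102 and a line such as boot system flash:example-image.bin (hypothetical name), the list comes back empty.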

Comments
Edward Swenson
Cisco Employee

What other situations have you seen that you thought were hardware issues and turned out not to be?

mtimm
Cisco Employee
#3 The system restarts by itself (crashes). Some crashes are due to hardware, but the majority are software issues. Use the Output Interpreter https://www.cisco.com/cgi-bin/Support/OutputInterpreter/home.pl with the output of "show tech" as a first pass, or open a service request with TAC.

If the service request is opened with the Cisco TAC, I would recommend including the 'show tech' in the service request so the TAC engineer has it to begin troubleshooting immediately.

There are a fair number of service requests opened for "unable to access router, need replacement" where there is no out-of-band management capability. Ten years ago this was almost unheard of, but out-of-band management seems to have become less of a requirement for some network administrators who may be looking for ways to save money. Without someone onsite who is technical enough to console in to the device and check, it is impossible to tell whether there is a real hardware problem or whether, for example, the configuration register is simply set improperly and the device reset due to a software issue, the flash holds a corrupt image, or the configuration is gone.

Palani Mohan
Cisco Employee

Use the Output Interpreter https://www.cisco.com/cgi-bin/Support/OutputInterpreter/home.pl with the output of "show tech" as a first pass, or open a service request with TAC.

This concerns me very much. Today's IOS, and the networks it is deployed in, are very complex environments. Many times we overlook the need to distinguish between problems that can actually impact operations and "imaginary" problems.

There is nothing wrong in itself with using the Output Interpreter. But don't lose sight of the "service interrupting" problem that led you to collect the show commands you ran through it. If the Output Interpreter recommends "Buffer Tuning", that action may or may not (more often not) help you address the service interruption.

In any network there is always more than one device. You don't proactively collect show tech from all of these devices and run the contents through the Output Interpreter; you decided to use the tool to help you investigate a specific service interruption. When engaging TAC engineers, please describe the service interruption, what was done by way of troubleshooting, and your interpretation of the data you collected and analyzed.
