Xavier Hick

     

    Introduction

    This document explains, in simple terms, the causes of the most common types of router crashes and the typical actions to take to resolve them.

    False crashes

    System returned to ROM by reload

    Cause: the router was reloaded using the ‘reload’ command.

    Solution: ensure that only authorized users can execute this command.

    System returned to ROM by power-on

    Cause: the router lost power, was manually power cycled, or has a failing power supply.

    Solution: check the status of the power supply with the 'show environment' command and in the syslogs (a failing power supply is typically reported there). If it is failing, replace it. If the power supply works properly, the cause is one of the first two possibilities.

    System returned to ROM by abort

    Cause: the router received a BREAK signal through the console.

    Solution: check the configuration register in the 'show version' output. If it is set to a value of the form 0xA0BC (A, B and C being any value; example: 0x2002), change it so that the router accepts the BREAK signal only at bootup time. To do so, enter configuration mode and set the register to 0xA1BC (the typical value is 0x2102), then reload the router to apply the new value.

     

    Router#conf t

    Router(config)#config-register ?

    <0x0-0xFFFF> Config register number
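
    The register check described above can be sketched in Python. This is only an illustration: it assumes, as the change from 0xA0BC to 0xA1BC implies, that the relevant flag is bit 8 (0x0100) of the configuration register. The function names are hypothetical and not part of any Cisco tool.

```python
# Assumption: bit 8 (0x0100) is the flag that makes the router ignore
# BREAK after bootup, as implied by the 0x2002 -> 0x2102 example.
BREAK_DISABLE_BIT = 0x0100


def break_ignored(config_register: int) -> bool:
    """Return True if bit 8 is set, i.e. BREAK is ignored after bootup."""
    return bool(config_register & BREAK_DISABLE_BIT)


def with_break_ignored(config_register: int) -> int:
    """Return the register value with bit 8 set and all other bits unchanged."""
    return config_register | BREAK_DISABLE_BIT


if __name__ == "__main__":
    current = 0x2002  # value read from 'show version'
    if not break_ignored(current):
        print("suggested value:", hex(with_break_ignored(current)))  # 0x2102
```

    Setting only that one bit keeps the other settings encoded in the register unchanged.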

    Parity crashes: System returned to ROM by processor memory parity error, or crash due to "Cause = 0x20"

    Cause:

    1) If no parity error was seen within the previous month, the crash is most likely a transient failure: a bit flip in router memory, for example caused by cosmic radiation.

    2) If there was more than one parity error within the last month, the crash is due to a hardware failure of the route processor.

    Solution:

    1) Monitor the router for the next month. If no further parity error is seen, no further action is needed. If another parity error is seen, replace the route processor (remember to also replace any memory upgrades installed on it).

    2) Replace the route processor directly (remember to also replace any memory upgrades installed on it).
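
    The decision rule above can be written as a short Python sketch. It assumes the dates of the parity errors have already been collected by hand, and it approximates "one month" as 30 days. The function name and the returned strings are hypothetical.

```python
from datetime import date, timedelta


def parity_action(error_dates, today, window_days=30):
    """Suggest an action from the dates of parity errors seen on one router.

    Assumption: "within the last month" is approximated as the last
    `window_days` days (30 by default).
    """
    cutoff = today - timedelta(days=window_days)
    recent = [d for d in error_dates if cutoff <= d <= today]
    if len(recent) > 1:
        # More than one parity error in the window: suspect hardware.
        return "replace route processor"
    if len(recent) == 1:
        # A single error: treat as transient and keep watching.
        return "monitor for another month"
    return "no action"


if __name__ == "__main__":
    today = date(2024, 3, 31)
    print(parity_action([date(2024, 3, 30)], today))
    print(parity_action([date(2024, 3, 10), date(2024, 3, 30)], today))
```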

    Bus Error

    Cause: the processor tried to access a memory location that either does not exist (a software error) or did not respond properly (a hardware problem).

     

    You can find the memory location by looking at the address in the output of the 'show version' command, as shown in this example:

     

     

    Router#show version
    Router uptime is 2 days, 21 hours, 30 minutes
    System restarted by bus error at PC 0x30EE546, address 0xBB4C4

     

    Take the address the router was accessing when the bus error occurred (0xBB4C4 in this example) and use the 'show region' command to determine which memory region it belongs to.

     

    Solution:

    1) If the address reported by the bus error does not fall within the ranges displayed in the output of the 'show region' command, the router was trying to access an address that is not valid. This indicates a software problem. The first step is to upgrade the IOS software to the latest version. If the crash still occurs afterwards, the second step is to open a TAC case.

     

    2) If the address falls within one of the ranges in the 'show region' command output, the router was accessing a valid memory address, but the hardware corresponding to that address did not respond properly. The next action is to replace the faulty hardware, which is most often the route processor.
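
    The range check behind the two cases above can be sketched in Python. The region list below is made up for illustration and does not come from a real 'show region' output; on a real router the start and end addresses would be copied from that command. The function names are hypothetical.

```python
import re

# Matches the restart line from 'show version', for example:
# "System restarted by bus error at PC 0x30EE546, address 0xBB4C4"
BUS_ERROR_RE = re.compile(
    r"bus error at PC (0x[0-9A-Fa-f]+), address (0x[0-9A-Fa-f]+)"
)


def parse_bus_error(line):
    """Return (pc, address) as integers, or None if the line does not match."""
    match = BUS_ERROR_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def suspect(address, regions):
    """Return 'hardware' if the address is inside a known region, else 'software'."""
    for start, end in regions:
        if start <= address <= end:
            return "hardware"
    return "software"


if __name__ == "__main__":
    # Hypothetical (start, end) pairs; replace with values from 'show region'.
    regions = [(0x00000000, 0x007FFFFF), (0x60000000, 0x60FFFFFF)]
    line = "System restarted by bus error at PC 0x30EE546, address 0xBB4C4"
    pc, address = parse_bus_error(line)
    print(hex(address), "->", suspect(address, regions))
```

    The result is only a first indication. It tells you which of the two solutions above to start with.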

    Illegal Opcode & Sigtrap Exception

    Cause: IOS software failure

    Solution: first upgrade the IOS software to the latest version, then verify whether the crash still happens. If it does, open a TAC case.

    Watchdog timeout

    1) When no crashinfo file is generated and the message *** Watch Dog Timeout *** is seen in the logs, the cause is most probably hardware. The solution is to replace the route processor.

    2) When the message "Process aborted on watchdog timeout" is seen in the logs and the crashinfo mentions a Software forced crash, the cause is an IOS software problem. Handle it as described under Software Forced Crash.

    Software Forced Crash

    Cause: IOS software failure, typically memory corruption

    Solution: first upgrade the IOS software to the latest version, then verify whether the crash still happens. If it does, open a TAC case.
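
    As a recap, the restart reasons whose exact wording is quoted in this document can be mapped to a first action with a small lookup. This is a sketch only: the table covers just those five strings, the category names are hypothetical, and anything else is reported as unknown.

```python
# (marker in the restart line, category, first action), taken from the
# sections above. Only reasons quoted verbatim in this document are listed.
TRIAGE = [
    ("by reload", "reload", "check who is allowed to run the reload command"),
    ("by power-on", "power", "check the power supply with show environment and the syslogs"),
    ("by abort", "break", "check the configuration register in show version"),
    ("by processor memory parity error", "parity", "monitor for a month, replace the route processor if it recurs"),
    ("by bus error", "bus error", "compare the reported address with show region"),
]


def triage(restart_line):
    """Return (category, first action) for a restart reason line."""
    lowered = restart_line.lower()
    for marker, category, action in TRIAGE:
        if marker in lowered:
            return category, action
    return "unknown", "not covered by this table"


if __name__ == "__main__":
    print(triage("System returned to ROM by power-on"))
```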
