Cisco Support Community

New Member

Urgent !!! Catalyst 6506-E crashed

Hi,

Recently our Catalyst 6506-E crashed and reloaded automatically.

Last reload reason: bus error at PC 0x428AE718, address 0x0

The IOS version is 12.2(33)SXJ6, and the crashinfo file is attached. Please kindly help analyze it.

Many Thanks,

Jackson Ku

Accepted Solutions
Cisco Employee

Jackson,

After reviewing the show tech and crashinfo files, it looks like the RP crashed due to a parity error. A parity error occurs when a binary bit flips from 0 to 1 or vice versa. It can be attributed to some kind of environmental issue causing a fluctuation in electrical pulses, such as background radiation (for example, neutrons from cosmic rays), electromagnetic interference (EMI), or electrostatic discharge (ESD). If this is the first occurrence, it is advisable to monitor the SUP for 48 hours to ensure that the error does not recur: most parity errors are transient, one-time occurrences, whereas a parity error stemming from faulty hardware will usually recur within this time frame. If the SUP experiences another parity error, it is advisable to replace it, as that could indicate that the hardware is defective.
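To make the bit-flip idea concrete, here is a minimal Python sketch of a single even-parity bit guarding a data word. It assumes even parity over one 8-bit word purely for illustration; the real word width and parity scheme used by the RP memory and the Mistral ASIC are not stated in this thread, and the helpers `parity_bit` and `check` are hypothetical names. It shows that one flipped bit is detected, while two flips cancel each other out.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit for a non-negative integer word.

    The bit is chosen so that the total number of 1 bits
    (data bits plus the parity bit) is even.
    """
    return bin(word).count("1") % 2


def check(word: int, stored_parity: int) -> bool:
    """Return True if the word still matches the parity bit stored with it."""
    return parity_bit(word) == stored_parity


data = 0b1011_0010             # hypothetical 8-bit value written to memory
stored = parity_bit(data)      # parity bit saved alongside the data

flipped = data ^ (1 << 3)      # simulate a single-event upset flipping bit 3
print(check(data, stored))     # True
print(check(flipped, stored))  # False: the single-bit flip is detected

double = data ^ 0b11           # two bits flipped at once
print(check(double, stored))   # True: parity misses an even number of flips
```

A single parity bit only signals that something changed, not which bit changed, so it can detect this kind of upset but cannot repair the value.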

As you know, the SUP module has two processors, the RP and the SP. The RP (Route Processor) takes care of the routing processes, whereas the SP (Switch Processor) is responsible for the switching part.

RP Crash info:
===========

Oct 17 14:24:00: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR
Oct 17 14:24:00: %SYSTEM_CONTROLLER-3-FATAL: An unrecoverable error has been detected. The system is being reset.

%Software-forced reload

 Early Notification of crash condition..

 14:24:00 TWN Fri Oct 17 2014: Breakpoint exception, CPU signal 23, PC = 0x428AE718


Explanation:

The most common errors from the Mistral ASIC on the Multilayer Switch Feature Card (MSFC) are TM_DATA_PARITY_ERROR, SYSDRAM_PARITY_ERROR, SYSAD_PARITY_ERROR, and TM_NPP_PARITY_ERROR. The possible causes of these parity errors are random static discharge or other external factors.
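If you want to watch for these errors in saved logs during the monitoring period, a minimal Python sketch could pick them out of syslog or crashinfo text. It relies on the general Cisco IOS message layout `%FACILITY-SEVERITY-MNEMONIC: description` and assumes, based only on the lines quoted above, that the error name is the last word of the description. The function name `mistral_parity_errors` is hypothetical; this is not a Cisco tool.

```python
import re

# General IOS system message layout: %FACILITY-SEVERITY-MNEMONIC: description
MSG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)

# The Mistral ASIC parity errors named in the explanation above.
MISTRAL_PARITY_ERRORS = {
    "TM_DATA_PARITY_ERROR",
    "SYSDRAM_PARITY_ERROR",
    "SYSAD_PARITY_ERROR",
    "TM_NPP_PARITY_ERROR",
}


def mistral_parity_errors(lines):
    """Yield (facility, severity, mnemonic, error) for lines naming a Mistral parity error."""
    for line in lines:
        m = MSG_RE.search(line)
        if not m:
            continue  # not a %FACILITY-SEVERITY-MNEMONIC message
        # Assumption: the error name is the last word of the description,
        # as in "Error condition detected: TM_NPP_PARITY_ERROR".
        words = m.group("text").split()
        if words and words[-1] in MISTRAL_PARITY_ERRORS:
            yield m.group("facility"), int(m.group("severity")), m.group("mnemonic"), words[-1]


crashinfo = [
    "Oct 17 14:24:00: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR",
    "Oct 17 14:24:00: %SYSTEM_CONTROLLER-3-FATAL: An unrecoverable error has been detected. The system is being reset.",
]
for hit in mistral_parity_errors(crashinfo):
    print(hit)  # ('SYSTEM_CONTROLLER', 3, 'ERROR', 'TM_NPP_PARITY_ERROR')
```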

Parity errors are of two kinds:
- Soft parity errors: these occur when an energy level within the chip (for example, a one or a zero) changes. When referenced by the CPU, they cause the system either to crash or to recover. In the case of a soft parity error, there is no need to swap the board or any of its components, as these errors are generally Single Event Upsets (SEUs).

- Hard parity errors: these occur when a chip or board failure causes data to be corrupted (the data is not necessarily bad all or most of the time). In this case, you need to re-seat or replace the affected component, which usually means a memory chip swap or a board swap. We say that there is a hard parity error when we see multiple parity errors at the same address. There are more complicated cases which are harder to identify, but in general, if we see more than one parity error in a particular memory region within a relatively short period of time, this may be considered a hard parity error.
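The soft/hard rule of thumb above can be sketched as a small Python classifier. This is an illustration only: `REGION_SIZE` and `WINDOW` are assumed values standing in for "a particular memory region" and "a relatively short period of time", which the explanation does not quantify, and `classify` is a hypothetical helper working on (timestamp, address) pairs you would have collected yourself.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical thresholds for illustration; not values from Cisco.
REGION_SIZE = 0x1000          # assumed size of a "memory region", in bytes
WINDOW = timedelta(hours=48)  # assumed "relatively short period of time"


def classify(events):
    """Classify parity events as 'hard' or 'soft'.

    events: list of (timestamp: datetime, address: int) tuples.
    Returns 'hard' if any address repeats, or if two events fall in the
    same region within WINDOW of each other; otherwise 'soft'.
    """
    seen_addresses = set()
    by_region = defaultdict(list)  # region index -> timestamps in time order
    for ts, addr in sorted(events):
        if addr in seen_addresses:
            return "hard"  # repeated parity error at the same address
        seen_addresses.add(addr)
        by_region[addr // REGION_SIZE].append(ts)
    for times in by_region.values():
        for earlier, later in zip(times, times[1:]):
            if later - earlier <= WINDOW:
                return "hard"  # clustered errors in one region
    return "soft"


t0 = datetime(2014, 10, 17, 14, 24)
print(classify([(t0, 0x1000)]))                                # soft
print(classify([(t0, 0x1000), (t0 + timedelta(hours=1), 0x1010)]))  # hard
```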

Action plan:
=======
As this is the first occurrence, it could be a transient issue. I suggest that we monitor for 48 hours to ensure the switch is stable; if there is no recurrence, we can consider this a transient issue.
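The action plan boils down to a simple decision, sketched below in Python as an illustration. The 48-hour window comes from the plan above; the function name `recommend` and its return strings are hypothetical, and the sketch assumes you already have the timestamps of any further parity errors seen on the SUP.

```python
from datetime import datetime, timedelta

MONITOR_WINDOW = timedelta(hours=48)  # monitoring period from the action plan


def recommend(first_error, later_errors, now):
    """Return a recommendation following the 48-hour monitoring plan.

    first_error: datetime of the first parity error (the crash).
    later_errors: datetimes of any parity errors seen on the same SUP.
    now: current time.
    """
    if any(e > first_error for e in later_errors):
        return "replace supervisor"  # a repeat parity error suggests defective hardware
    if now - first_error < MONITOR_WINDOW:
        return "keep monitoring"     # still inside the 48-hour window
    return "treat as transient"      # window passed with no recurrence


crash = datetime(2014, 10, 17, 14, 24)
print(recommend(crash, [], crash + timedelta(hours=12)))  # keep monitoring
```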

HTH

Regards

Inayath

**Please don't forget to rate if this info is helpful.

Replies

Have you got Smartnet?

If so, it may be best to go down this route and open a case with TAC.

Cisco Employee

I don't see any attachments.

Kindly attach them again.

New Member

Hi, I have uploaded them again. Thanks a lot.
