
Parity Error.


Tue, 07/23/2013 - 18:08
Mar 29th, 2013
User Badges:
  • Cisco Employee,

Hi,

You may often see the following messages logged for the modules:

%PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module [dec] is experiencing the following error: Port ASIC ([name]) packet buffer failure detected on ports [dec]

Explanation: This is the result of a parity error occurring in the Port ASIC packet buffer (SRAM) used by the modules.

Recommendation: Monitor the system for recurrence (e.g. for one day or one week). If no further events are observed, it is a soft error. If it occurs frequently, the module should be replaced (RMA).


%LTL-SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x[hex]

Explanation: This is the result of a parity error occurring in the Port ASIC port index table (SRAM) used by the 6100-6500 and 6700 series modules.

Recommendation: Monitor the system for recurrence (e.g. for one day or one week). If no further events are observed, it is a soft error. If it occurs frequently, the module should be replaced (RMA).


There are two kinds of parity errors:

Soft Errors

Most parity errors are caused by electro-static or magnetic related environmental conditions.

Research has shown that the majority of single event (or "soft") errors in memory chips occur as a result of background radiation (e.g. neutrons from cosmic rays), electro-magnetic interference (EMI), or electro-static discharge (ESD), which may randomly change the electrical state of one or more memory cells, or interfere with the circuitry used to read and write them.
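To make the term "parity error" concrete, here is a minimal Python sketch of how a parity bit exposes a single flipped bit. It is a generic illustration only: the 16-bit word, the even-parity scheme, and the function names are assumptions made for the example, not a description of how the Port ASIC SRAM actually protects its data.

```python
def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity bit stored with it."""
    return even_parity_bit(word) == stored_parity


# Write: store a word together with its parity bit (value is arbitrary).
stored_word = 0b1011_0010_0110_0001
stored_parity = even_parity_bit(stored_word)

# A soft error flips one bit in the memory cell (bit 5, chosen arbitrarily).
corrupted_word = stored_word ^ (1 << 5)

# Read: recompute parity and compare with the stored bit.
print(parity_ok(stored_word, stored_parity))     # True
print(parity_ok(corrupted_word, stored_parity))  # False
```

Note that a single parity bit only detects an odd number of flipped bits and cannot say which bit changed, so the data has to be rewritten or the component reset.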

These “soft” parity error events are typically transient or random in nature, and usually occur only a single time. Depending on the severity of the data corruption, there are two types of soft errors. Minor soft errors that can be corrected without a component reset are referred to as “Single Event Upsets” or SEUs. Severe soft errors that require a component or system to be reset are referred to as “Single Event Latch-ups” or SELs.

The important thing to remember is that soft errors are not caused by hardware malfunction. Soft errors (SEU’s) are the result of an environmental disruption of the memory data, and only appear infrequently.


Remember that “soft” parity errors are transient and infrequent, most likely a single event upset (SEU) caused by an environmental disruption.

So, in addition to the information above, you need to analyze any recent environmental changes at the location where the affected system is installed.


Hard Errors

Other parity errors are caused by a physical malfunction of the memory hardware and/or the circuitry used to read and write it.

Hardware manufacturers take extensive measures to prevent and test for hardware defects. However, a small percentage of defects is still statistically possible. For example, if any of the memory cells used to store data bits are malformed, they may be unable to hold a charge or may be more vulnerable to environmental conditions.

Similarly, while the memory itself may be operating normally, any physical or electrical damage to the circuitry used to read & write them may also cause data bits to be changed from their stored state (during transfer), resulting in a parity error.

These “hard” parity errors are typically very frequent and repeated, and will occur whenever that piece of memory or circuitry is used. The exact frequency depends on the extent of the malfunction and how frequently the damaged equipment is used.


Work around:


Repeated errors (often referred to as hard errors) are caused by a failed component or a board-level problem, such as an improperly manufactured printed circuit board, that results in repeated occurrences of the same error.


If you see the error message only once or rarely, monitor the switch syslog in order to confirm that the error message is an isolated incident. If these error messages reoccur, reseat the module and check the performance.



Recommendation:


Monitor the device for the next 48 hours. If the error occurs frequently, then the module should be replaced (RMA).
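If you export the syslog, a small script can do the counting for you during the monitoring period. The following Python sketch is only an illustration: the function names and the `repeat_threshold` value are assumptions (the post does not define how many repeats count as "frequent"), and the sample lines are the ones quoted in the comment further down this page.

```python
import re
from collections import Counter

# Matches the two mnemonics discussed in this post. The facility part may
# carry a switch prefix, e.g. "%LTL-SW1_SP-2-LTL_PARITY_CHECK" on VSS.
PARITY_RE = re.compile(
    r"%(?:LTL|PM_SCP)-\w+-2-(LTL_PARITY_CHECK|LCP_FW_ERR_INFORM):\s*(.*)"
)


def count_parity_events(lines):
    """Count parity messages, keyed by (mnemonic, message text)."""
    counts = Counter()
    for line in lines:
        match = PARITY_RE.search(line)
        if match:
            text = match.group(2).strip().rstrip(".")
            counts[(match.group(1), text)] += 1
    return counts


def verdict(count, repeat_threshold=3):
    """Label an event; repeat_threshold is an arbitrary assumption."""
    return "recurring" if count >= repeat_threshold else "isolated"


sample_log = [
    "187073: Jul 1 14:59:59.215 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.",
    "187074: Jul 1 15:00:30.776 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.",
    "187075: Jul 1 15:01:01.331 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.",
    "187076: Jul 1 15:01:31.860 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.",
]

for (mnemonic, text), count in count_parity_events(sample_log).items():
    print(f"{mnemonic}: '{text}' seen {count} time(s): {verdict(count)}")
```

The same message repeating roughly every 30 seconds, as in the sample, is the pattern that points toward a hard error rather than an isolated soft error.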


Please feel free to reply if you require any further information on this.


HTH

Regards

Inayath

vciric Tue, 07/23/2013 - 07:44

Is it possible to figure out which module has the issue if the log contains only the following messages:


187073: Jul 1 14:59:59.215 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.

187074: Jul 1 15:00:30.776 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.

187075: Jul 1 15:01:01.331 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.

187076: Jul 1 15:01:31.860 CEST: %LTL-SW1_SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x47D.

InayathUlla Sharieff Tue, 07/23/2013 - 18:08
User Badges:
  • Cisco Employee,

Hi,

Yes, there is a way to check this.

Please run the following command on the device:


sh diagnostic result module all detail | i ^Switch|Ltl


If the box is VSS then run this:

sh diagnostic result switch 1 module all detail | i ^Switch|Ltl
sh diagnostic result switch 2 module all detail | i ^Switch|Ltl


Example:
The output of the command would be something like this:
Switch 1 Module 8: CEF720 48 port 10/100/1000mb Ethernet  SerialNo : AZL1234565
   15) TestLtlFpoeMemoryConsistency ----> .
          Ltl index -------------------> 0
Switch 1 Module 9: CEF720 8 port 10GE with DFC  SerialNo : SSABLADIE
   35) TestLtlFpoeMemoryConsistency ----> .
          Ltl index -------------------> 58984
The conclusion from this output is that the Ltl index for Switch 1 Module 9 is a non-zero value, and that is what is causing the module to reset.
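If you have many modules, you can let a script pick out the non-zero entries. This Python sketch parses output in the VSS format shown in the example above; the function name is hypothetical, and it assumes every module header starts with "Switch N Module M:" as in that example (header lines in a non-VSS chassis may look different).

```python
import re

MODULE_RE = re.compile(r"^Switch\s+(\d+)\s+Module\s+(\d+):")
LTL_RE = re.compile(r"Ltl index\s*-+>\s*(\d+)")


def modules_with_nonzero_ltl(output: str):
    """Return (switch, module, ltl_index) for each non-zero Ltl index."""
    flagged = []
    current = None  # (switch, module) of the most recent header line
    for line in output.splitlines():
        header = MODULE_RE.match(line)
        if header:
            current = (int(header.group(1)), int(header.group(2)))
            continue
        ltl = LTL_RE.search(line)
        if ltl and current is not None and int(ltl.group(1)) != 0:
            flagged.append((current[0], current[1], int(ltl.group(1))))
    return flagged


# Sample taken from the example output in this post.
SAMPLE = """\
Switch 1 Module 8: CEF720 48 port 10/100/1000mb Ethernet  SerialNo : AZL1234565
   15) TestLtlFpoeMemoryConsistency ----> .
          Ltl index -------------------> 0
Switch 1 Module 9: CEF720 8 port 10GE with DFC  SerialNo : SSABLADIE
   35) TestLtlFpoeMemoryConsistency ----> .
          Ltl index -------------------> 58984
"""

for switch, module, index in modules_with_nonzero_ltl(SAMPLE):
    print(f"Switch {switch} Module {module}: Ltl index {index}")
```

For the sample above this reports only Switch 1 Module 9.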


Starting in SXI, we have put in an LTL/FPOE consistency checker. Periodically,
the hardware LTL/FPOE states are compared to what the software thinks they
should be. If an inconsistency is found, it is corrected.


These messages indicate that an error was found and that it was corrected.
If it has happened only once, it is normally a transient issue and there
is nothing to worry about, but if it is happening on a regular basis then it
could be a hardware problem.


HTH

Regards

Inayath.

*Please rate if this info is helpful.
