- Cisco Employee,
You may often see the following messages on the modules:
%PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module [dec] is experiencing the following error: Port ASIC ([name]) packet buffer failure detected on ports [dec]
Explanation: This is the result of a parity error occurring in the Port ASIC packet buffer (SRAM) used by the modules.
Recommendation: Monitor the system for recurrence (e.g. for one day or one week). If no further events are observed, it is a soft error. If it occurs frequently, then the module should be replaced (RMA).
%LTL-SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x[hex]
Explanation: This is the result of a parity error occurring in the Port ASIC port index table (SRAM) used by the 6100-6500 and 6700 series modules.
Recommendation: Monitor the system for recurrence (e.g. for one day or one week). If no further events are observed, it is a soft error. If it occurs frequently, then the module should be replaced (RMA).
There are two kinds of parity errors:
Most parity errors are caused by electrostatic or magnetic environmental conditions.
Research has shown that the majority of single-event (or "soft") errors in memory chips occur as a result of background radiation (e.g. neutrons from cosmic rays), electromagnetic interference (EMI), or electrostatic discharge (ESD), which may randomly change the electrical state of one or more memory cells, or interfere with the circuitry used to read and write them.
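To make the mechanism concrete, here is a minimal Python sketch of how a parity bit exposes a single flipped bit. It is only an illustration of the general even-parity technique; the function names and the 8-bit example word are assumptions made for this sketch and do not describe how the Port ASIC actually implements its check.

```python
def even_parity_bit(word: int) -> int:
    """Return the parity bit that makes the total count of 1 bits even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the stored parity bit."""
    return even_parity_bit(word) == stored_parity


# Hypothetical 8-bit word written to memory together with its parity bit.
stored_word = 0b10110010
stored_parity = even_parity_bit(stored_word)

# Simulate a single event upset: one memory cell changes state.
upset_word = stored_word ^ (1 << 5)

# Two flipped bits cancel out, so plain parity does not notice them.
double_upset_word = stored_word ^ 0b11

print(parity_ok(stored_word, stored_parity))        # unmodified word
print(parity_ok(upset_word, stored_parity))         # single flipped bit
print(parity_ok(double_upset_word, stored_parity))  # two flipped bits
```

The first and third checks pass and the second fails: parity detects an odd number of flipped bits but cannot say which bit changed, which is why the hardware can only report the event rather than silently repair it.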
These “soft” parity error events are typically transient or random in nature, and usually only occur a single time. Depending on the severity of the data corruption, there are two types of soft errors. Minor soft errors that can be corrected without a component reset are referred to as “Single Event Upsets” or SEUs. Severe soft errors that require a component or system to be reset are referred to as “Single Event Latch-ups” or SELs.
The important thing to remember is that soft errors are not caused by hardware malfunction. Soft errors (SEUs) are the result of an environmental disruption of the memory data, and only appear infrequently.
Remember that “soft” parity errors are transient and infrequent, most likely a single event upset (SEU), caused by an environmental disruption.
So (in addition to the information above), you need to analyze any recent environmental changes that have occurred in the location where the affected system is installed.
Other parity errors are caused by a physical malfunction of the memory hardware and/or the circuitry used to read and write it.
Hardware manufacturers take extensive measures to prevent and test for hardware defects. However, a small percentage of defects is still statistically possible. For example, if any of the memory cells used to store data bits are malformed, they may be unable to hold a charge or may be more vulnerable to environmental conditions.
Similarly, while the memory itself may be operating normally, any physical or electrical damage to the circuitry used to read and write it may also cause data bits to be changed from their stored state (during transfer), resulting in a parity error.
These “hard” parity errors are typically very frequent and repeated, and will occur whenever that piece of memory or circuitry is used. The exact frequency depends on the extent of the malfunction and how frequently the damaged equipment is used.
Repeated errors (often referred to as hard errors) are caused by a failed component or a board-level problem, such as an improperly manufactured printed circuit board, and result in repeated occurrences of the same error.
If you see the error message only once or rarely, monitor the switch syslog in order to confirm that the error message is an isolated incident. If these error messages reoccur, reseat the module and check the performance.
Monitor the device for the next 48 hours. If the error occurs frequently, then the module should be replaced (RMA).
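If you have the syslog output for the monitoring window saved as text, a small script can do the counting for you. The following Python sketch is only a starting point: the regular expressions are built from the two message formats quoted above, while the function names, the threshold of two events, and the sample lines (including the ASIC name and port number) are assumptions made for illustration, not real switch output.

```python
import re
from collections import Counter

# Patterns derived from the two message formats quoted in this post.
PATTERNS = {
    "LCP_FW_ERR_INFORM": re.compile(
        r"%PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module (\d+)"
    ),
    "LTL_PARITY_CHECK": re.compile(
        r"%LTL-SP-2-LTL_PARITY_CHECK: LTL parity check request for (0x[0-9A-Fa-f]+)"
    ),
}


def count_parity_events(lines):
    """Count parity messages, keyed by (mnemonic, module number or LTL index)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(name, match.group(1))] += 1
    return counts


def classify(count, threshold=2):
    """Assumed rule of thumb: fewer events than the threshold counts as isolated."""
    return "isolated" if count < threshold else "recurring"


if __name__ == "__main__":
    # Made-up sample lines that follow the message templates; not real logs.
    sample = [
        "%PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module 3 is experiencing the following "
        "error: Port ASIC (EXAMPLE) packet buffer failure detected on ports 1",
        "%PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module 3 is experiencing the following "
        "error: Port ASIC (EXAMPLE) packet buffer failure detected on ports 1",
        "%LTL-SP-2-LTL_PARITY_CHECK: LTL parity check request for 0x1A2B",
    ]
    for key, count in count_parity_events(sample).items():
        print(key, count, classify(count))
```

Feed it only the lines from the period you are monitoring (for example the 48 hours after the reseat), and adjust the threshold to whatever you consider "frequent" for your environment.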
Please feel free to update the thread if you require any further information.