ACE Unexpected Reload//ACE20-MOD-K9

Juan Ibañez
Level 1

I have an ACE20-MOD-K9 running version A2(3.5), and it reloaded unexpectedly.

The last boot reason is: NP 2 Failed : SRAM Parity Error Chan 3

I need a workaround for this problem, please.

Thanks

Accepted Solutions

Kanwaljeet Singh
Cisco Employee

Hi Juan,

Looking at the last reload reason, the ACE seems to have reloaded due to an SRAM parity error. If the ACE reloads with the same issue again within a year, it should be RMA'd. You should find some crash/core files in dir core:. If you send me those, I can verify whether this is indeed the crash that occurred, but I am 99.99% sure it is the SRAM parity crash. Here's a brief summary:

The SRAM parity error presented in the core file is not due to a software issue.
The issue is the result of a "bit-flip" within the SRAM itself which can occur as a
result of environmental conditions.  This "bit-flip" is rectified by a simple reboot of
the system, which would occur with the generation of the core file. Cisco internal
testing and customer experience have shown that these types of issues can occur
with very low frequency, but do not require an RMA of the device.

ACE is susceptible to this because of the way it uses SRAM to store control information
and packet data as opposed to scratch-pad storage. Almost any 1-bit flip will be detected
as a parity error.
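
To illustrate why a single flipped bit is caught, here is a minimal sketch of even parity over one word. It is a generic illustration of parity checking, not the ACE network processor's actual SRAM scheme; the function names and the 32-bit example word are assumptions made for the example.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") % 2


def has_parity_error(word: int, stored_parity: int) -> bool:
    """True when the recomputed parity no longer matches the stored parity bit."""
    return parity_bit(word) != stored_parity


word = 0x1234ABCD            # hypothetical 32-bit SRAM word
stored = parity_bit(word)    # parity recorded when the word was written

flipped_one = word ^ (1 << 7)                # one bit flipped
flipped_two = word ^ (1 << 7) ^ (1 << 20)    # two bits flipped

print(has_parity_error(word, stored))         # False: word is intact
print(has_parity_error(flipped_one, stored))  # True: single flip is detected
print(has_parity_error(flipped_two, stored))  # False: an even number of flips cancels out
```

A single parity bit detects any odd number of flipped bits but cannot say which bit changed, so the error can only be reported, not corrected in place.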

CSCtc53046 is a partial software workaround that mitigates hardware-generated SRAM
parity errors by reducing the number of SRAM accesses made while collecting interface
statistics. It is highly recommended that customers upgrade to A2(3.3) or later to both
lower the overall rate of SRAM parity errors and ensure failover occurs appropriately.

SRAM errors are expected to occur at a frequency of approximately one per year per ACE module.
If a particular module experiences a significantly higher failure rate and is running A2(3.3)
or later, then a proactive RMA would be in order.
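
The rule of thumb above can be sketched as a small check. This is only an illustration: the function names are hypothetical, the version parser assumes strings shaped like A2(3.5), and treating "significantly higher" as more than twice the expected rate is an assumption, not a Cisco threshold.

```python
import re
from datetime import date, timedelta

MIN_VERSION = (2, 3, 3)    # A2(3.3), the release recommended above
EXPECTED_PER_YEAR = 1.0    # approximate expected SRAM errors per module per year
RATE_FACTOR = 2.0          # assumption: "significantly higher" means more than 2x expected


def parse_version(text: str) -> tuple:
    """Turn a string such as 'A2(3.5)' into a comparable tuple (2, 3, 5)."""
    match = re.fullmatch(r"A(\d+)\((\d+)\.(\d+)[a-z]?\)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(group) for group in match.groups())


def suggest_action(version: str, reload_dates: list, today: date) -> str:
    """Apply the rule of thumb: upgrade first, then judge the yearly error rate."""
    if parse_version(version) < MIN_VERSION:
        return "upgrade"
    window_start = today - timedelta(days=365)
    recent = [d for d in reload_dates if window_start <= d <= today]
    if len(recent) > EXPECTED_PER_YEAR * RATE_FACTOR:
        return "consider RMA"
    return "monitor"


# Hypothetical dates, for illustration only.
print(suggest_action("A2(3.5)", [date(2014, 3, 1)], date(2014, 6, 1)))
```

With one SRAM parity reload in the past year on A2(3.5), the sketch returns "monitor"; the cutoff for "consider RMA" should be agreed with TAC rather than taken from this example.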

Regards,

Kanwal

Note: Please mark answers if they are helpful.
