ACE module crashed abruptly with no specific reload reason
Cisco ACE modules sit inside Cisco Catalyst 6500 Series Switches and Cisco 7600 Series Routers to provide a high level of load balancing and application delivery. ACE modules have robust software and hardware that make it possible to handle high volumes of traffic in real time.
The ACE module crashed unexpectedly. No error message or problem log was found, and there is no specific reload reason. The module had been working fine for a long time, then abruptly crashed and reloaded. It has been working fine since then and shows no problems.
The following log is seen on the backup module:
Aug 27 11:29:24 PDT: %FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 5 detected excessive flow-control on channel 8 (Module 9, fabric connection 0)
Aug 27 11:34:26 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Reset - Module Reloaded During Download)
Aug 27 11:34:27 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Reset - Module Reloaded During Download)
Aug 27 11:34:39 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Module not responding to Keep Alive polling)
Aug 27 11:34:39 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Module not responding to Keep Alive polling)
Aug 27 11:39:34 PDT: %DIAG-SP-6-RUN_MINIMUM: Module 9: Running Minimal Diagnostics...
Aug 27 11:39:34 PDT: %DIAG-SP-6-DIAG_OK: Module 9: Passed Online Diagnostics
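When reviewing a longer syslog capture, it can help to pull out only the power-cycle events for the affected slot. The following is a minimal Python sketch, not a Cisco tool: the regular expressions are inferred only from the sample lines above, and the names parse_line and power_cycle_reasons are hypothetical helpers introduced for illustration.

```python
import re

# Pattern inferred from the sample lines above (an assumption, not an
# official grammar): "<timestamp>: %FACILITY-SUB-SEVERITY-MNEMONIC: <text>"
LOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+\s+\w+):\s+"
    r"%(?P<facility>[A-Z0-9]+)-(?:(?P<sub>[A-Z]+)-)?(?P<sev>\d)-"
    r"(?P<mnemonic>[A-Z0-9_]+):\s+(?P<text>.*)$"
)
MODULE_RE = re.compile(r"[Mm]odule (\d+)")
SLOT_RE = re.compile(r"slot (\d+)")
REASON_RE = re.compile(r"\(([^()]*)\)\s*$")  # trailing "(...)" text


def parse_line(line):
    """Return a dict describing one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    event = m.groupdict()
    text = event["text"]
    # Prefer an explicit "module N"; fall back to "slot N".
    where = MODULE_RE.search(text) or SLOT_RE.search(text)
    event["module"] = int(where.group(1)) if where else None
    reason = REASON_RE.search(text)
    event["reason"] = reason.group(1) if reason else None
    return event


def power_cycle_reasons(lines, module):
    """Collect the parenthesised reasons of PWRCYCLE events for one module."""
    reasons = []
    for line in lines:
        event = parse_line(line)
        if event and event["module"] == module and event["mnemonic"] == "PWRCYCLE":
            reasons.append(event["reason"])
    return reasons


SAMPLE_LOG = """
Aug 27 11:29:24 PDT: %FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 5 detected excessive flow-control on channel 8 (Module 9, fabric connection 0)
Aug 27 11:34:26 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Reset - Module Reloaded During Download)
Aug 27 11:34:27 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Reset - Module Reloaded During Download)
Aug 27 11:34:39 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Module not responding to Keep Alive polling)
Aug 27 11:34:39 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Module not responding to Keep Alive polling)
Aug 27 11:39:34 PDT: %DIAG-SP-6-RUN_MINIMUM: Module 9: Running Minimal Diagnostics...
Aug 27 11:39:34 PDT: %DIAG-SP-6-DIAG_OK: Module 9: Passed Online Diagnostics
"""

if __name__ == "__main__":
    for reason in power_cycle_reasons(SAMPLE_LOG.splitlines(), module=9):
        print(reason)
```

Note that neither reason reported for module 9 names a root cause. They only describe how the supervisor saw the reload, which matches the symptom described in this document.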
Any event that causes the module to reload is normally logged and reported as the reload reason. However, certain situations can make even the most basic operation impossible, causing the module to reload without showing any reason. Almost always this is related to an SRAM parity error. The SRAM parity error, which can be seen in the core file, is not due to a software issue, although there were software-related issues in earlier code versions.
SRAMs are very sensitive to light, dust, radiation, shock, and temperature, so it is possible to get an SRAM parity error on a healthy ACE. The issue is the result of a "bit-flip" within the SRAM itself, which can occur as a result of environmental conditions. This "bit-flip" is cleared by a simple reboot of the system, which occurs along with the generation of the core file. ACE is susceptible to this because it uses SRAM to store control information and packet data rather than as scratch-pad storage. Almost any 1-bit flip will be detected as a parity error. This is a general problem with SRAM: all equipment makers face the same issue with this type of memory, and an occasional parity error is to be expected.
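To illustrate why a single flipped bit shows up as a parity error, here is a minimal Python sketch. It assumes a simple even-parity bit stored alongside each word. The actual parity scheme used by the ACE hardware is not described in this document, and the function names are hypothetical.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") % 2


def store(word: int) -> tuple:
    """Model a memory cell holding the data word plus its parity bit."""
    return (word, parity_bit(word))


def check(cell: tuple) -> bool:
    """Return True if the stored parity still matches the data word."""
    word, parity = cell
    return parity_bit(word) == parity


def flip_bit(word: int, position: int) -> int:
    """Simulate an environmental bit-flip at the given bit position."""
    return word ^ (1 << position)


if __name__ == "__main__":
    cell = store(0b10110010)
    print("intact cell passes:", check(cell))

    # One flipped data bit changes the number of set bits by one,
    # so the stored parity no longer matches.
    corrupted = (flip_bit(cell[0], 3), cell[1])
    print("single flip passes:", check(corrupted))
```

A single parity bit can only report that a word is corrupt. It cannot say which bit flipped, so the error cannot be corrected in place. In this simple model, two flips in the same word would cancel out and go unnoticed.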
The recommended action is to upgrade to A2(3.3) or later in order to fix all software-related SRAM bugs. Refer to bug CSCtc53046 for a partial software workaround that mitigates hardware-generated SRAM parity errors by reducing the number of SRAM accesses caused by the collection of interface statistics. SRAM errors are expected to occur at a frequency of approximately one per year per ACE module. A single SRAM parity error does not justify an RMA. If a particular module experiences a significantly higher failure rate and is running A2(3.3) or later, then a proactive RMA would be required.
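As a rough way to judge what "significantly higher" might look like, the sketch below estimates how likely it is to see a given number of errors in one year at the expected rate. It assumes that errors arrive independently and follow a Poisson distribution with a rate of one per year per module. This model is an illustrative assumption and not a Cisco RMA criterion, and prob_at_least is a hypothetical helper.

```python
import math


def prob_at_least(k: int, rate_per_year: float = 1.0, years: float = 1.0) -> float:
    """Probability of k or more events under an assumed Poisson model."""
    lam = rate_per_year * years  # expected number of events in the window
    # P(X >= k) = 1 - sum of P(X = i) for i in 0..k-1
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"P(at least {k} errors in one year) = {prob_at_least(k):.3f}")
```

Under these assumptions, one error in a year is unremarkable (about a 63% chance), while three or more in a year on the same module is much less likely (about 8%) and may merit a closer look.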