Cisco Support Community

Sup720 crashes due to a watchdog timer interrupt on a Cisco Catalyst 6500 switch that runs CatOS

Core issue

A watchdog timer is a hardware countdown timer that the software periodically resets. If the software gets stuck and fails to reset the timer for two to six seconds, the timer interrupts the CPU and Catalyst OS (CatOS) crashes. This is called a watchdog crash, and it is usually caused by a software bug.
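To illustrate the reset-or-fire behavior described above, the following minimal Python sketch uses a simulated clock rather than real time. The `Watchdog` class, the `run` helper, the 2-second timeout, and the 0.5-second hardware tick are all hypothetical illustrations. They are not CatOS internals.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Simulated hardware countdown timer (hypothetical, not CatOS code)."""
    timeout: float          # seconds before the timer fires if it is not reset
    remaining: float = 0.0
    fired: bool = False

    def __post_init__(self) -> None:
        self.remaining = self.timeout

    def kick(self) -> None:
        """Software resets the timer back to its full timeout."""
        self.remaining = self.timeout

    def tick(self, elapsed: float) -> None:
        """Hardware counts down; firing models the interrupt that crashes the OS."""
        self.remaining -= elapsed
        if self.remaining <= 0:
            self.fired = True


def run(schedule, timeout=2.0, step=0.5):
    """Simulate a scheduler loop; schedule[i] is how long task i runs without yielding."""
    wd = Watchdog(timeout)
    for busy in schedule:
        elapsed = 0.0
        # While a task hogs the CPU, nothing resets the timer; only hardware ticks run.
        while elapsed < busy and not wd.fired:
            wd.tick(step)
            elapsed += step
        if wd.fired:
            return "watchdog crash"
        wd.kick()  # the scheduler regains control and resets the timer
    return "ok"
```

For example, `run([0.5, 1.0, 1.5])` finishes normally because every task yields before the timer expires. `run([0.5, 3.0])` simulates a stuck task and returns `"watchdog crash"`.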

A watchdog crash can occur on a Supervisor 720 (Sup720) due to parity errors in the Layer 2 (L2) MAC forwarding table memory. Error Correction Code (ECC) is designed to correct parity errors automatically wherever possible. However, a software defect can cause these parity errors to be handled incorrectly, and the result is a watchdog crash.
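The following generic Python sketch uses the textbook Hamming(7,4) code to show how an error-correcting code can locate and fix a single flipped bit, which a plain parity bit can only detect. This is an assumed illustration of the general technique. The article does not say which ECC scheme the Sup720 L2 forwarding table memory uses.

```python
def encode(data):
    """Hamming(7,4): 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(word):
    """Return (data bits, position corrected or 0). Fixes any single-bit error."""
    b = [0] + list(word)  # 1-indexed copy so positions match the textbook layout
    s1 = b[1] ^ b[3] ^ b[5] ^ b[7]
    s2 = b[2] ^ b[3] ^ b[6] ^ b[7]
    s3 = b[4] ^ b[5] ^ b[6] ^ b[7]
    pos = s1 + 2 * s2 + 4 * s3  # the syndrome is the position of the flipped bit
    if pos:
        b[pos] ^= 1  # flip the bad bit back
    return [b[3], b[5], b[6], b[7]], pos
```

Flipping any one bit of `encode([1, 0, 1, 1])` and passing the result to `decode` recovers `[1, 0, 1, 1]` and reports which position was corrected.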

Transient parity errors are the most common type of parity error, and they are not caused by bad memory hardware. A crash clears transient parity errors from memory, so another crash occurs only if new parity errors are encountered.

For more information, refer to Cisco bug ID CSCed55259.

Resolution

The fix for this bug prevents the watchdog crash and allows ECC to automatically correct the parity errors.

The fix also sets a threshold for the less likely case in which the L2 MAC forwarding table memory is truly faulty due to bad hardware. As a result, recurring parity errors still cause a crash, which makes bad hardware easier to identify and replace.
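The following minimal Python sketch models the post-fix behavior described above: correct and continue on transient errors, and crash once errors recur past a threshold. This is a hypothetical model, not Cisco's implementation. The `ParityErrorPolicy` name, the per-address counting, and the threshold value are all assumptions, because the article gives no threshold and no counting window.

```python
from dataclasses import dataclass, field


@dataclass
class ParityErrorPolicy:
    """Hypothetical model: let ECC correct errors, but crash if they keep recurring."""
    threshold: int  # hypothetical limit; the real value is not stated in the article
    counts: dict[int, int] = field(default_factory=dict)  # errors seen per address (assumed)

    def on_parity_error(self, address: int) -> str:
        self.counts[address] = self.counts.get(address, 0) + 1
        if self.counts[address] > self.threshold:
            return "crash"      # recurring errors point to bad hardware
        return "corrected"      # transient error: ECC fixes it and operation continues
```

With `threshold=3`, the first three errors at one address return `"corrected"` and the fourth returns `"crash"`. Errors at other addresses are counted separately.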
