Anand G
Cisco Employee

Hello,

You may have seen the error messages below on your switch. They typically appear on the SUP720 modules of 6500/7600 switches.

<snip>

Dec 14 17:08:56.408: %CONST_DIAG-SP-4-ERROR_COUNTER_WARNING: Module 8 Error counter exceeds threshold, system operation continue.

Dec 14 17:08:56.408: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:42 IN:0 PO:255 RE:1252 RM:255 DV:2 EG:2 CF:10 TF:488

Dec 15 07:08:35.387: %CONST_DIAG-SP-4-ERROR_COUNTER_WARNING: Module 8 Error counter exceeds threshold, system operation continue.

Dec 15 07:08:35.391: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:42 IN:0 PO:255 RE:1252 RM:255 DV:12 EG:2 CF:10 TF:593

Dec 22 00:17:08.902: %CONST_DIAG-SP-4-ERROR_COUNTER_WARNING: Module 8 Error counter exceeds threshold, system operation continue.

Dec 22 00:17:08.902: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:42 IN:0 PO:255 RE:1252 RM:255 DV:30 EG:2 CF:10 TF:1080

What is TestErrorCounterMonitor?

TestErrorCounterMonitor is a non-disruptive health-monitoring background process that periodically polls the error counters and interrupt counters of each line card and supervisor module in the system. The ERROR_COUNTER_WARNING message means that TestErrorCounterMonitor has detected an error counter on the specified module that has exceeded a threshold. The specific data about that counter, including the ASIC and register of the counter and the error count, is sent in the separate ERROR_COUNTER_DATA message.
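When several of these message pairs show up over days or weeks, it can help to see which values in the ERROR_COUNTER_DATA line are actually moving. The following is a minimal Python sketch, not a Cisco tool: it only splits the line into its two-letter fields and reports which ones differ between samples. This post does not define what each field means, so the sketch does not interpret them; the function names are made up for illustration.

```python
import re

# Matches the payload of an ERROR_COUNTER_DATA message (colon optional).
DATA_RE = re.compile(r"ERROR_COUNTER_DATA:?\s+(.*)$")
# Matches two-letter field names followed by a decimal value, e.g. "TF:488".
FIELD_RE = re.compile(r"\b([A-Z]{2}):(\d+)")


def parse_error_counter_data(line):
    """Return the fields of an ERROR_COUNTER_DATA line as a dict, else None."""
    match = DATA_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in FIELD_RE.findall(match.group(1))}


def changed_fields(samples):
    """Return the sorted field names whose value differs across the samples."""
    if not samples:
        return []
    return sorted(
        name for name in samples[0]
        if len({sample.get(name) for sample in samples}) > 1
    )


# Log lines taken from the messages quoted in this post.
LOG_LINES = [
    "Dec 14 17:08:56.408: %CONST_DIAG-SP-4-ERROR_COUNTER_WARNING: Module 8 Error counter exceeds threshold, system operation continue.",
    "Dec 14 17:08:56.408: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:42 IN:0 PO:255 RE:1252 RM:255 DV:2 EG:2 CF:10 TF:488",
    "Dec 15 07:08:35.391: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:42 IN:0 PO:255 RE:1252 RM:255 DV:12 EG:2 CF:10 TF:593",
    "Dec 22 00:17:08.902: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:42 IN:0 PO:255 RE:1252 RM:255 DV:30 EG:2 CF:10 TF:1080",
]

samples = [p for p in map(parse_error_counter_data, LOG_LINES) if p is not None]
print(changed_fields(samples))  # ['DV', 'TF']
```

For the three samples above, only DV and TF change between messages; everything else (including ID:42 and RE:1252) stays the same.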

These error messages may be generated either by the supervisor itself or on behalf of any classic line card present in the chassis.

How to proceed from here?

In the 'show module' output, check whether any classic line cards are installed. If so, check the interfaces on those classic line cards for CRC errors. If a classic line card is found to be faulty, it has to be replaced. If no CRC errors are found on the interfaces of any classic line card, then the problem is likely with the supervisor itself.
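If you have saved 'show interfaces' output to a file, the CRC check described above can be scripted. The sketch below is only an illustration under assumptions: it relies on the usual "N input errors, N CRC" counter line and the "<interface> is ..., line protocol is ..." header line, the sample text and its counter values are invented, and the function names are hypothetical.

```python
import re

# Header line of an interface block, e.g. "GigabitEthernet8/1 is up, line protocol is up".
HEADER_RE = re.compile(r"^(\S+) is .*line protocol is ")
# Counter line, e.g. "0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored".
CRC_RE = re.compile(r"(\d+) input errors, (\d+) CRC")


def crc_errors_by_interface(show_interfaces_text):
    """Map each interface name to its CRC error count."""
    result = {}
    current = None
    for line in show_interfaces_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        counters = CRC_RE.search(line)
        if counters and current is not None:
            result[current] = int(counters.group(2))
    return result


def interfaces_with_crc(show_interfaces_text, slot):
    """Return interfaces in the given slot that report a non-zero CRC count."""
    in_slot = re.compile(r"^[A-Za-z-]+%d/" % slot)
    return {
        name: crc
        for name, crc in crc_errors_by_interface(show_interfaces_text).items()
        if crc > 0 and in_slot.match(name)
    }


# Invented sample output, for illustration only.
SAMPLE = """\
GigabitEthernet8/1 is up, line protocol is up (connected)
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
GigabitEthernet8/2 is up, line protocol is up (connected)
     1252 input errors, 1250 CRC, 2 frame, 0 overrun, 0 ignored
GigabitEthernet1/1 is up, line protocol is up (connected)
     7 input errors, 7 CRC, 0 frame, 0 overrun, 0 ignored
"""

print(interfaces_with_crc(SAMPLE, 8))  # {'GigabitEthernet8/2': 1250}
```

Run it once per slot that 'show module' lists as a classic line card; which slots those are still has to be read from 'show module' by hand in this sketch.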

When these error messages are decoded, the result may point to packets with a bad CRC.

Example:

"HY_FD_PG_PG_BAD_PKT_CRC The number of packets with bad_pkt_crc found."

Sometimes, simply reseating the supervisor fixes the issue if no classic line cards are present.


Why do we suspect the classic line cards rather than the supervisor?

Hyperion on the supervisor is most likely just detecting the errors through the counter HY_FD_PG_PG_BAD_PKT_CRC ("The number of packets with bad_pkt_crc found"). The register name starts with FD, which refers to the old Medusa ASIC, so the errors are related to the old bus. This is why we need to look at the classic line cards.

A few Cisco defects:

These errors can be false positives on certain 67xx cards. Please verify whether you are hitting any of these bugs.

CSCua09073  TestErrorCounterMonitor is not available in 7600 boxes for 6708 LC.

CSCtq73026  6708 port ASIC generates txCRC

CSCtl77057  TestErrorCounterMonitor can generate false positive on 67XX cards

As a last resort, after verifying that there are no issues with the classic line cards or with the supervisor, chassis replacement is the final solution.
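The troubleshooting order described in this post can be summarized as a small decision helper. This is only a sketch of the steps above, not an official procedure; the function and parameter names are hypothetical, and the inputs have to come from your own review of 'show module' and the interface counters.

```python
def next_steps(classic_slots, classic_slots_with_crc, has_67xx_cards):
    """Return the suggested actions, in order, based on the steps in this post.

    classic_slots          -- slots holding classic line cards (from 'show module')
    classic_slots_with_crc -- subset of those slots whose interfaces show CRC errors
    has_67xx_cards         -- True if any 67xx line card is installed
    """
    steps = []
    if has_67xx_cards:
        # The errors can be false positives on certain 67xx cards.
        steps.append("Check defects CSCua09073, CSCtq73026 and CSCtl77057")
    faulty = sorted(set(classic_slots) & set(classic_slots_with_crc))
    if faulty:
        slots = ", ".join(str(slot) for slot in faulty)
        steps.append("Replace the faulty classic line card(s) in slot(s): " + slots)
    else:
        # No classic cards, or no CRC errors on them: suspect the supervisor.
        steps.append("Suspect the supervisor; try reseating it")
    steps.append("Last resort: chassis replacement")
    return steps


for step in next_steps(classic_slots=[3, 8], classic_slots_with_crc=[8], has_67xx_cards=False):
    print(step)
```

With the example inputs, the helper suggests replacing the classic card in slot 8 first and keeps chassis replacement as the final fallback.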

Regards,

Anand G
