
%SYS-4-P2_WARN:

mitch.erickson
Level 1

Hi:

I have a Catalyst 4006, and am getting the following message.

I understand that it is only a warning message; however, I would like to determine what is generating these errors.

%SYS-4-P2_WARN: 1/Invalid crc, dropped packet, count = 19399

I cannot find this message in the documentation.

Is there any way to tell from this message where the errors are coming from?

Thanks in advance.

4 Replies

donewald
Level 6

Try a "show port counters" and see if you can identify which port is errored. This should be able to help you narrow your search.

Hope this helps,

Don

rfroom
Cisco Employee

Actually, this is an internal CRC error, not one caused by a front panel port. Internal CRCs are always caused by a faulty component on the supervisor. I would suggest having the supervisor replaced, especially since the problem has occurred 19,399 times.

mahe
Level 1

Hi,

Please read the appendix below. I hope it helps.

NVRAM Log and System Error Messages in SysLog

When determining the "health" of the Catalyst 4000 Series Product, the show

env 1 for CatOS 4.x and 5.x and show nvramenv 1 for CatOS 6.x command is

useful. The first example below shows a switch that is not experiencing any

anomalous behavior according to the NVRAM Log.

Switch (enable) sh env 1
ps1="rommon ! > "
?="0"
DiagBootMode="post"
MemorySize="64"
ResetCause="20"
AutobootStatus="success"

InvalidPacketBufferCRCs - %SYS-4-P2_WARN: 1/Invalid crc, dropped packet, count = X

The show nvramenv 1 output below shows 73 Invalid Packet Buffer CRCs. The InvalidPacketBufferCRC counter refers to the number of frames copied to and from the K1 SRAM that have a CRC error. All packets coming into the K1 are checked for a valid Layer 2 CRC, and packets with an invalid CRC are dropped by the K1. Good packets are then stored in packet buffer memory, the K1's SRAM. When the CPU gets a packet, it knows only the memory address where the packet is stored, and software reads it from there. The software checks the CRC again after reading a packet from memory; if the CRC is bad, the software logs an "Invalid CRC" message and drops the packet. The InvalidPacketBufferCRC count is incremented in NVRAM and a syslog message is logged as well. The log message is rate limited, so you will see only one message within a short interval of 5-10 minutes (the interval can vary depending on the software release).

It is normal in any environment to get a few InvalidPacketBufferCRCs every few days, for the same reason there are PMPEs. However, if the InvalidPacketBufferCRCs counter is incrementing steadily, there may be faulty SRAM on the supervisor. The Power-On Self Test (POST) will check for possible faulty SRAM, but the POST test is not 100% conclusive for intermittent SRAM failures. Another possible cause of several InvalidPacketBufferCRCs is CSCdt80707 or CSCdu48749.

Note: The count in NVRAM is the total number of InvalidPacketBufferCRCs packets received by the CPU since bootup. Since a few InvalidPacketBufferCRCs are normal, best judgment is needed to determine whether a hardware replacement is required after rebooting the system and noting the POST test results.
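
If the switch logs to an external syslog server, one way to apply the "a few per day is normal, steady growth is not" judgment above is to pull the count= value out of each archived message and see how fast it grows between timestamps. Below is a minimal Python sketch under some assumptions: the log file name is hypothetical, and each line is assumed to begin with an ISO-style timestamp (adjust the parsing to whatever your syslog server writes).

import re
from datetime import datetime

# Matches e.g. "%SYS-4-P2_WARN: 1/Invalid crc, dropped packet, count = 19399"
CRC_MSG = re.compile(r"%SYS-4-P2_WARN: 1/Invalid crc, dropped packet, count = (\d+)")

def crc_counts(path):
    """Yield (timestamp, count) pairs from an archived syslog file.

    Assumes each line starts with "YYYY-MM-DD HH:MM:SS"; this is an
    assumption about the syslog server format, not about the switch.
    """
    with open(path) as f:
        for line in f:
            m = CRC_MSG.search(line)
            if m:
                ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
                yield ts, int(m.group(1))

def growth_per_day(samples):
    """Rough counts-per-day between the first and last sample."""
    samples = list(samples)
    if len(samples) < 2:
        return 0.0
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    days = max((t1 - t0).total_seconds() / 86400.0, 1e-9)
    return (c1 - c0) / days

if __name__ == "__main__":
    rate = growth_per_day(crc_counts("syslog-switch-a.log"))  # hypothetical file
    print(f"InvalidPacketBufferCRCs growing at ~{rate:.1f} per day")

A handful per day matches the "normal" case described above; a rate that keeps climbing points toward the faulty-SRAM or CSCdt80707/CSCdu48749 scenarios.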

Switch_6 (enable) sh nvramenv 1
PS1="rommon ! > "
?="0"
DiagBootMode="post"
InvalidPacketBufferCrcs="73"
MemorySize="64"
ResetCause="20"
AutobootStatus="success"

BlockedGigaportCount and BlockedTXQCount - %SYS-4-P2_WARN: 1/Blocked queue on gigaport 5, (BlockedTXQCount : BlockedGigaportCount)

The show nvramenv 1 output below shows an exceptionally high BlockedGigaportCount and BlockedTXQCount. These counters indicate a hardware problem or one of the following: a duplex mismatch, a faulty cable, Type I cabling, faulty ports, or a hardware problem on an externally connected device. Details on troubleshooting these scenarios are below.

Note: The most common situation that causes these errors is a physical layer problem causing a considerable amount of traffic to back up on the internal K1 gigaports. Generally, the blockedTXQcount increments due to a configuration issue or faulty cabling.

Generally, a front panel port issue such as collisions or a speed aggregation problem will cause the txQueueNotAvail counter to increment in the show counters output for that port. The txQueueNotAvail counter increments when the front panel issue causes the port TX queue to fill up. In a normal environment, the TX queue can only be blocked for about 20 seconds; anything longer indicates a significant problem. As a result, the blockedTXQcount will increment if the TX queue for the gigaport has not drained in 35 seconds. Conditions that can cause the blockedTXQcount to increment include front panel port misconfigurations, a duplex mismatch, Type I cabling, faulty ports, faulty cabling, a bad line card, or a hardware problem on an externally connected device. These situations must be ruled out before considering hardware replacement.

Occasionally, when there is a blocked TX queue situation, the RxMac of a gigaport may also stop receiving packets. On gigaports connected to stub ports, the system always expects some ESMP traffic. If the RX counters of a gigaport connected to a stub do not increment for 30 seconds, the system will increment the BlockedGigaportCount in NVRAM and reset the gigaport in an attempt to clear the stuck state. Blocked TX queues are detected on Gigabit uplinks as well. The same situations that cause blockedTXQcount can cause BlockedGigaportCount. It is important to rule out front panel port misconfigurations, duplex mismatch, Type I cabling, faulty cabling, or a hardware problem on an externally connected device before considering hardware replacement.

When troubleshooting the BlockedGigaport counters, the first step is to issue the show port counters command on the front panel ports associated with the gigaport (see Appendix A). If no front panel ports are experiencing errors, the supervisor may have a hardware problem with the K1 gigaports.

Switch_6 (enable) sh nvramenv 1
PS1="rommon ! >"
?="0"
DiagBootMode="post"
BlockedGigaportCount="79838"
BlockedTxQCount="5015"
MemorySize="64"
ResetCause="20"
AutobootStatus="success"

The syslog message displays the respective gigaport along with that individual gigaport's blockedTxQCount and BlockedGigaportCount.

For example: %SYS-4-P2_WARN: 1/Blocked queue on gigaport 1, ( 4 : 5 )

4 is the blockedTxQCount since bootup.

5 is the blockedGigaportCount since bootup.
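
Because both numbers in the parentheses are cumulative since bootup, splitting them out per gigaport when scanning an exported log makes it easier to see whether one gigaport keeps reappearing. A small Python sketch, assuming only the message format shown in the example above:

import re

# Matches e.g. "%SYS-4-P2_WARN: 1/Blocked queue on gigaport 1, ( 4 : 5 )"
BLOCKED = re.compile(
    r"%SYS-4-P2_WARN: 1/Blocked queue on gigaport (\d+), \(\s*(\d+)\s*:\s*(\d+)\s*\)"
)

def parse_blocked_queue(line):
    """Return (gigaport, blockedTxQCount, blockedGigaportCount) or None."""
    m = BLOCKED.search(line)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3))

print(parse_blocked_queue(
    "%SYS-4-P2_WARN: 1/Blocked queue on gigaport 1, ( 4 : 5 )"))
# -> (1, 4, 5)

If the same gigaport keeps showing up, start with show port counters on the front panel ports that map to it, as described above.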

InternalPortReconfigs - %SYS-4-P2_WARN: 1/Reconfiguring internal gigaport subport count=x

The InternalPortReconfigs counter refers to the workaround associated with CSCdt80707. Upon receiving this message, the show logging buffer 1023 output needs to be checked for the related system logging message in order to determine which internal gigaport was reconfigured and the associated count.

Switch_6 (enable) sh nvramenv 1
PS1="rommon ! >"
?="0"
DiagBootMode="post"
InternalPortReconfigs="1"
MemorySize="64"
ResetCause="20"
AutobootStatus="success"

Example SysLog Message:

%SYS-4-P2_WARN: 1/Reconfiguring internal gigaport 23 subport 184 count=2

The message above indicates that the switch software detected that the internal gigaport on K1-C that connects to K1-B (gigaport 23) lost its VLAN configuration, and the software corrected this by reconfiguring the internal port.
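
When searching the show logging buffer 1023 output (or an exported log) for this event, the gigaport, subport, and count can be pulled out the same way as the other messages. A Python sketch, assuming only the message format in the example above:

import re

RECONFIG = re.compile(
    r"%SYS-4-P2_WARN: 1/Reconfiguring internal gigaport (\d+) subport (\d+) count=(\d+)"
)

def parse_reconfig(line):
    """Return (gigaport, subport, count) for a reconfig message, or None."""
    m = RECONFIG.search(line)
    return tuple(int(g) for g in m.groups()) if m else None

print(parse_reconfig(
    "%SYS-4-P2_WARN: 1/Reconfiguring internal gigaport 23 subport 184 count=2"))
# -> (23, 184, 2)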

%SYS-4-P2_WARN: 1/Astro(module_number/Astro_reference) - timeoutoccurred

This error message is not logged to NVRAM. It is useful in finding a faulty SERDES, oscillator, or Astro. The 3/4 stands for module 3, Astro 4. Issuing the dump 1 command will provide additional information about the Astro. If the slot number and Astro number are consistent across the error messages, the problem is either a line card SERDES or Astro failure, or a supervisor SERDES failure. The best step in troubleshooting is to move the line card to another slot. If the error messages follow the line card, the problem is a line card failure; otherwise, the problem is most likely a SERDES failure on the supervisor. If multiple Astros are having the problem, the root cause could be a bad oscillator. Other circumstances that could cause this behavior are a network broadcast storm or a Layer 2 loop.

Example: %SYS-4-P2_WARN: 1/Astro(3/4) - timeoutoccurred
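
Since the troubleshooting above hinges on whether the timeouts stay on one module/Astro or are spread across several, tallying the messages per (module, Astro) pair makes the pattern obvious. A Python sketch over an exported log file (the file name is hypothetical; the message format is taken from the example above):

import re
from collections import Counter

# Matches e.g. "%SYS-4-P2_WARN: 1/Astro(3/4) - timeoutoccurred"
ASTRO = re.compile(r"%SYS-4-P2_WARN: 1/Astro\((\d+)/(\d+)\) - timeoutoccurred")

def astro_timeouts(path):
    """Count Astro timeout messages per (module, astro) pair."""
    tally = Counter()
    with open(path) as f:
        for line in f:
            m = ASTRO.search(line)
            if m:
                tally[(int(m.group(1)), int(m.group(2)))] += 1
    return tally

if __name__ == "__main__":
    for (module, astro), n in astro_timeouts("syslog-switch-a.log").most_common():
        print(f"module {module} / Astro {astro}: {n} timeouts")

If all of the hits land on one module, move that line card to another slot as described above; timeouts spread across many Astros point more toward the oscillator, a broadcast storm, or a Layer 2 loop.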

Thank you

This is perfect!