
Cisco 7604 System controller errors

amg64

Hello

Can anyone help me with the following error messages?

I have a 7604 running the following IOS version:


Cisco IOS Software, c7600s3223_rp Software (c7600s3223_rp-ADVIPSERVICESK9-M), Version 12.2(33)SRB5, RELEASE SOFTWARE (fc2)

For some days now, I have been receiving the following errors:

Dec 13 09:51:37.737 CET: %SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
Dec 13 15:30:17.392 CET: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR
Dec 13 15:30:17.392 CET: %SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
Dec 13 21:17:24.225 CET: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR
Dec 13 21:17:24.225 CET: %SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
Dec 14 04:42:03.363 CET: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR
Dec 14 04:42:03.363 CET: %SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
Dec 14 06:27:10.935 CET: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR

Any ideas?

Regards Suhale


The SYSTEM_CONTROLLER error messages normally relate to the Mistral ASIC. This ASIC connects the MSFC to the Pinnacle on the supervisor and to the EOBC bus. In other words, it handles the MSFC's communication with the rest of the supervisor, on both the control path and the data path.

TM_NPP_PARITY_ERROR means that a parity error was detected in the next page pointer (NPP) of the internal Table Manager (TM). Recent IOS releases (12.1(19)E and later) detect this error, reset the Mistral ASIC, and continue operating. This is normally harmless. A parity error occurs when a bit flips and changes the parity of the stored binary word, and such a flip is sometimes transient (soft) rather than a sign of failed hardware.
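The following minimal Python sketch shows how a single flipped bit is caught by a parity check. It models only the general even-parity concept. The 32-bit word width and the helper names parity_bit and parity_ok are assumptions made for illustration. They do not describe how the Mistral ASIC actually implements its checks.

```python
WORD_BITS = 32  # assumed word width, for illustration only


def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of 1 bits, else 0.

    Stored next to the word, it makes the total count of 1 bits even.
    """
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """True if the word still agrees with the parity bit saved at write time."""
    return parity_bit(word) == stored_parity


original = 0x12345678
stored = parity_bit(original)  # computed when the word is written

# A single event upset: one bit (bit 7) flips after the write.
corrupted = original ^ (1 << 7)

# Two flipped bits restore the parity, so plain parity cannot see them.
double_flip = original ^ (1 << 7) ^ (1 << 20)

print(parity_ok(original, stored))     # True: no error
print(parity_ok(corrupted, stored))    # False: parity error detected
print(parity_ok(double_flip, stored))  # True: error goes undetected
# Every possible single-bit flip in the word is caught:
print(all(not parity_ok(original ^ (1 << i), stored) for i in range(WORD_BITS)))  # True
```

A single parity bit detects any odd number of flipped bits but cannot correct them. This is why the hardware can only report the error and reset the ASIC.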

Please physically reseat the supervisor and check whether the same message appears again. If it does, move the supervisor to slot 2 and check whether the message still appears. This way, we can also determine whether there is a problem with the chassis itself.

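Additionally, you may set the hardware diagnostic level to complete (global configuration command: diagnostic bootup level complete). The current level is shown at the top of this output: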

> show diagnostic result module all
Current Online Diagnostic Level = Minimal

Let me know how it goes.

Hello Armando,

I faced the same issue today. I know this post is old, but it is still useful, and I don't know whether you will be able to reply.

Could you please help me with the following:

I have a 7606 router that crashed today, and I noticed these logs:

Mar 15 09:58:01 UTC: %SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
Mar 15 09:58:30 UTC: %CPU_MONITOR-2-NOT_RUNNING: CPU_MONITOR messages have not been sent for 30 seconds [*Sched* 1%/1% (00:00:31.228 92%/92%)] [IP SLAs Event Processor 00:13:04.660] [ACCT Periodic Proc 00:01:34.336] [IP SLAs Event Processor 00:06:17.848]
Mar 15 09:58:30 UTC: %CPU_MONITOR-2-NOT_RUNNING_TB: CPU_MONITOR traceback: 428284A8 427C6AB4 4193DA54 4193DC88 41F01354 42AD9CC4 42AE3128 42CC9070
Mar 15 09:59:00 UTC: %CPU_MONITOR-2-NOT_RUNNING: CPU_MONITOR messages have not been sent for 60 seconds [*Sched* 1%/1% (00:01:01.228 96%/96%)] [IP SLAs Event Processor 00:13:34.660] [ACCT Periodic Proc 00:01:34.336] [IP SLAs Event Processor 00:06:17.848]
Mar 15 09:59:00 UTC: %CPU_MONITOR-2-NOT_RUNNING_TB: CPU_MONITOR traceback: 428284A8 427C6AB4 4193DA54 4193DC88 41F01354 42AD9CC4 42AE3128 42CC9070
Mar 15 09:59:30 UTC: %CPU_MONITOR-2-NOT_RUNNING: CPU_MONITOR messages have not been sent for 90 seconds [*Sched* 1%/1% (00:01:31.228 97%/97%)] [IP SLAs Event Processor 00:14:04.660] [ACCT Periodic Proc 00:01:34.336] [IP SLAs Event Processor 00:06:17.848]
Mar 15 09:59:30 UTC: %CPU_MONITOR-2-NOT_RUNNING_TB: CPU_MONITOR traceback: 428284A8 427C6AB4 4193DA54 4193DC88 41F01354 42AD9CC4 42AE3128 42CC9070
Mar 15 10:00:00 UTC: %CPU_MONITOR-2-NOT_RUNNING: CPU_MONITOR messages have not been sent for 120 seconds [*Sched* 1%/1% (00:02:01.228 98%/98%)] [IP SLAs Event Processor 00:14:34.660] [ACCT Periodic Proc 00:01:34.336] [IP SLAs Event Processor 00:06:17.848]
Mar 15 10:00:00 UTC: %CPU_MONITOR-2-NOT_RUNNING_TB: CPU_MONITOR traceback: 428284A8 427C6AB4 4193DA54 4193DC88 41F01354 42AD9CC4 42AE3128 42CC9070
Mar 15 10:00:30 UTC: %CPU_MONITOR-2-NOT_RUNNING: CPU_MONITOR messages have not been sent for 150 seconds [*Sched* 1%/1% (00:02:31.228 98%/98%)] [IP SLAs Event Processor 00:15:04.660] [ACCT Periodic Proc 00:01:34.336] [IP SLAs Event Processor 00:06:17.848]
Mar 15 10:00:30 UTC: %CPU_MONITOR-2-NOT_RUNNING_TB: CPU_MONITOR traceback: 428284AC 427C6AB4 4193DA54 4193DC88 41F01354 42AD9CC4 42AE3128 42CC9070
Mar 15 10:00:33 UTC: %CPU_MONITOR-3-PEER_FAILED: CPU_MONITOR peer process has failed to receive heartbeats, reset by [5/0]

%Software-forced reload

10:00:33 UTC Fri Mar 15 2013: Breakpoint exception, CPU signal 23, PC = 0x4278D310

Could you please share your experience with this issue?

Regards,

Ahmed

Hi Ahmed,

About the error:

%SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
%SYSTEM_CONTROLLER-3-ERROR: Error condition detected: [chars]

The most common errors from the Mistral ASIC on the MSFC are TM_DATA_PARITY_ERROR, SYSDRAM_PARITY_ERROR, SYSAD_PARITY_ERROR, and TM_NPP_PARITY_ERROR. Possible causes of these parity errors are random static discharge or other external factors.

Parity errors are divided into two types:

1) Single Event Upset (soft parity error): All computer and network systems are susceptible to the rare occurrence of Single Event Upsets (SEUs), sometimes described as parity errors. These single-bit errors occur when a bit in a data word changes unexpectedly because of an external event (for example, a zero spontaneously changes to a one). SEUs are a universal phenomenon, irrespective of vendor and technology. They occur very infrequently, but all computer and network systems, even a PC, are subject to them. SEUs are also called soft errors. They are caused by noise, result in a transient, inconsistent error in the data, and are unrelated to any component failure.

2) Repeated errors (hard parity error): A hard error is caused by a failed component or by a board-level problem, such as an improperly manufactured printed circuit board. Either cause results in repeated occurrences of the same error. We call it a hard parity error when we see multiple parity errors at the same address. Some cases are more complicated and harder to identify. In general, though, more than one parity error in a particular memory region within a relatively short period of time may be considered a hard parity error.

Action Plan:

Based on your information, the message appeared only once, and most of the PMPE (processor memory parity error) cases we have seen were single events. I therefore suggest that you monitor the device for a few days. If the problem appears again, we will treat it as a hardware failure.
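The hard/soft distinction above can be expressed as a small Python heuristic. The TM_NPP_PARITY_ERROR messages in this thread carry no address, so every input here is hypothetical. This includes the ParityEvent records (timestamp, address), the classify helper, and its thresholds region_size (4 KiB) and window_s (24 hours). The sketch only shows how the checks for "same address" and "same region within a short period" could be written.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ParityEvent:
    timestamp: float  # seconds since an arbitrary reference (hypothetical field)
    address: int      # memory address reported for the error (hypothetical field)


def classify(events, region_size=0x1000, window_s=24 * 3600):
    """Label parity events 'soft' or 'suspected hard' (illustrative thresholds)."""
    if not events:
        return "no errors"
    for a, b in combinations(events, 2):
        if a.address == b.address:
            return "suspected hard"  # the same address failed more than once
        same_region = a.address // region_size == b.address // region_size
        if same_region and abs(a.timestamp - b.timestamp) <= window_s:
            return "suspected hard"  # clustered in one region, close in time
    return "soft"


print(classify([ParityEvent(0, 0x40000010)]))                                # soft
print(classify([ParityEvent(0, 0x40000010), ParityEvent(7 * 86400, 0x40000010)]))  # suspected hard
print(classify([ParityEvent(0, 0x40000010), ParityEvent(3600, 0x40000FF0)]))  # suspected hard
print(classify([ParityEvent(0, 0x40000010), ParityEvent(3600, 0x70000000)]))  # soft
```

A single isolated event is labelled soft, which matches the advice to keep monitoring. A repeat at the same address counts as hard regardless of the time gap.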

Recommended Action: If the error message appears only once (or rarely), monitor the system log to determine whether the error message was an isolated incident. If the message recurs, check the environmental conditions for problems such as power brownouts, static discharges, or strong EMI fields. If these environmental conditions are within normal ranges and the error continues to appear, the supervisor engine may need to be replaced.
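The "monitor the system log" step can be partly automated by counting the relevant messages in a saved log. The following minimal Python sketch assumes the log lines look like the ones posted in this thread. The function names summarize and assess are hypothetical, and the recurrence threshold of 2 is an illustrative choice, not a Cisco-defined limit.

```python
import re
from collections import Counter

# Matches the detail line, e.g.
# "... %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR"
ERROR_RE = re.compile(r"%SYSTEM_CONTROLLER-3-ERROR: Error condition detected: (\w+)")
RESET_TAG = "%SYSTEM_CONTROLLER-3-MISTRAL_RESET"


def summarize(log_lines):
    """Count Mistral resets and the error conditions reported alongside them."""
    conditions = Counter()
    resets = 0
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            conditions[match.group(1)] += 1
        elif RESET_TAG in line:
            resets += 1
    return resets, conditions


def assess(resets, recurrence_threshold=2):
    """Map a reset count to the recommended action (threshold is illustrative)."""
    if resets == 0:
        return "no Mistral resets logged"
    if resets < recurrence_threshold:
        return "isolated: keep monitoring the log"
    return "recurring: check power, static discharge and EMI, then consider replacing the supervisor"


log = """\
Dec 13 09:51:37.737 CET: %SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
Dec 13 15:30:17.392 CET: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR
Dec 13 15:30:17.392 CET: %SYSTEM_CONTROLLER-3-MISTRAL_RESET: System Controller is reset:Normal Operation continues
""".splitlines()

resets, conditions = summarize(log)
print(resets, dict(conditions))  # 2 {'TM_NPP_PARITY_ERROR': 1}
print(assess(resets))            # recurring: ...
```

Under this heuristic, the single MISTRAL_RESET in the 7606 log would count as isolated. The repeated resets in the original 7604 log would count as recurring.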

HTH

Regards

Inayath


Hello Dear,

Thanks for your input. I have monitored the router since then, and nothing has happened.

Regards,

Ahmed
