ACE-20 Crash: NP Core Reset - Cause Unknown

Hi all,

One of our ACE-20 modules crashed recently, leaving little information as to why. Fortunately it was the FT standby module, so service wasn't impacted, but we're obviously keen to determine the cause of the crash and a potential resolution.

Running A2(3.5).

last boot reason:  NP 1 Failed : NP Core Reset - Cause Unknown

There is nothing obvious from the switch's perspective:

Apr 17 14:52:35.775 bst: SP: The PC in slot 9 is shutting down. Please wait ...
Apr 17 14:52:45.780 bst: SP: PC shutdown completed for module 9
510497: Apr 17 14:52:55.781 bst: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Reset)
510498: Apr 17 14:57:58.277 bst: %DIAG-SP-6-RUN_MINIMUM: Module 9: Running Minimal Diagnostics...
510499: Apr 17 14:57:58.537 bst: %DIAG-SP-6-DIAG_OK: Module 9: Passed Online Diagnostics
510500: Apr 17 14:57:59.213 bst: %OIR-SP-6-INSCARD: Card inserted in slot 9, interfaces are now online
510501: Apr 17 14:58:06.974 bst: %SVCLC-5-FWTRUNK: Firewalled VLANs configured on trunks
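For reference, the switch-side timestamps above can be diffed to size the outage window of the standby module. A minimal Python sketch, assuming all events fall on the same day and in the same (bst) zone, so only the time-of-day part is parsed; the event names are hypothetical labels:

```python
from datetime import datetime

# Time-of-day stamps copied from the SP syslog excerpt (date and "bst" zone
# dropped; all events are on Apr 17 in the same zone, so deltas are unaffected).
# The event names are hypothetical labels.
EVENTS = {
    "shutdown_start": "14:52:35.775",  # "The PC in slot 9 is shutting down"
    "power_off": "14:52:55.781",       # %C6KPWR-SP-4-DISABLED
    "card_inserted": "14:57:59.213",   # %OIR-SP-6-INSCARD
}


def parse_ts(stamp: str) -> datetime:
    # %f accepts 1 to 6 fractional digits, so ".775" is read as 775 ms.
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def elapsed(start: str, end: str) -> float:
    """Seconds between two named events in EVENTS."""
    return (parse_ts(EVENTS[end]) - parse_ts(EVENTS[start])).total_seconds()


if __name__ == "__main__":
    print(f"powered off:  {elapsed('power_off', 'card_inserted'):.3f} s")
    print(f"total outage: {elapsed('shutdown_start', 'card_inserted'):.3f} s")
```

With these timestamps, slot 9 was powered off for about 303.4 seconds (roughly 5 minutes) before the card came back online, and about 323.4 seconds passed from the start of the shutdown.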

Has anyone come across this issue before? Is there any particular way to diagnose the fault further?

Any help is appreciated.

Thanks,

Anthony


I found an ixp1_crash.txt file in the core: filesystem.

Most of it doesn't mean much to me, but I did find references to:

Shutdown[0,0] S/C/F=40/4/0 C/D=fe005fec/fe051ed8

[0]PID-TID=172052-11  P/T FL=00000010/85020000 "proc/boot/loadBalance_g_ns"

armbe context[fee16abc]:
0000: 00000000 3b300000 0005a3b2 0005a3b3 83900000 0017a9e0 00000000 83324200
0020: 00183900 4f8d758f 0017ae60 0179cfbc 849eb190 0179cf40 000003b3 00104c68
0040: 2000001f

instruction[00104c68]:

e5 0b c0 2c e1 a0 e3 20 e2 0e 20 01 e3 52 00 01 e5 0b 20 40 0a 00 00 14 e5 1b

stack[0179cf40]:
0000: 00000000 00000000 849eb190 00000000 00000000 00000000 00000000 00000002
0020: 00000010 b0c00000 00000001 00000004 00010000 00000001 00040000 00000000
0040: 0005a3b3 00200000 b0c05040 82e000a0 849eb190 00000000 00000000 00000000
0060: 00000000 00000000 00000000 00000000 00000000 0179cfc0 0012d2a8 00104b58

System image version: A2(3.5) 3.0(0)A2(3.5) adbuild_16:16:16-2011/08/04_/auto/adbure_nightly4/renumber/rel_a2_3_5_throttle/REL_3_0_0_A2_3_5

IXP CAUSE = NP Core Reset - Cause Unknown

And

<3>% IXP 1 XScale Core reset detected !!
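The "armbe context" block appears to be a saved big-endian ARM register set. Assuming (this is not confirmed by the crash file itself) that its 17 words are r0 to r15 followed by the CPSR, two values line up: word 13 (sp) equals the stack[0179cf40] address and word 15 (pc) equals the instruction[00104c68] address. A hypothetical Python helper to label the words and decode the CPSR under that assumption:

```python
# Hypothetical helper: label the "armbe context" words from ixp1_crash.txt,
# ASSUMING the layout is r0..r15 followed by CPSR (not documented in the file).
CONTEXT_DUMP = """\
0000: 00000000 3b300000 0005a3b2 0005a3b3 83900000 0017a9e0 00000000 83324200
0020: 00183900 4f8d758f 0017ae60 0179cfbc 849eb190 0179cf40 000003b3 00104c68
0040: 2000001f
"""

STACK_ADDR = 0x0179CF40  # from "stack[0179cf40]"
INSTR_ADDR = 0x00104C68  # from "instruction[00104c68]"

# ARM CPSR mode field (bits 4..0).
ARM_MODES = {0x10: "User", 0x11: "FIQ", 0x12: "IRQ", 0x13: "Supervisor",
             0x17: "Abort", 0x1B: "Undefined", 0x1F: "System"}


def parse_words(dump: str) -> list[int]:
    """Turn 'offset: word word ...' lines into a flat list of 32-bit ints."""
    words = []
    for line in dump.strip().splitlines():
        _offset, _, data = line.partition(":")
        words.extend(int(w, 16) for w in data.split())
    return words


def name_registers(words: list[int]) -> dict[str, int]:
    """Pair words with r0..r12, sp, lr, pc, cpsr (assumed order)."""
    names = [f"r{i}" for i in range(13)] + ["sp", "lr", "pc", "cpsr"]
    return dict(zip(names, words))


def decode_cpsr(cpsr: int) -> dict:
    """Split a CPSR value into mode, condition flags and mask bits."""
    mode = cpsr & 0x1F
    return {
        "mode": ARM_MODES.get(mode, f"unknown({mode:#x})"),
        "N": bool((cpsr >> 31) & 1), "Z": bool((cpsr >> 30) & 1),
        "C": bool((cpsr >> 29) & 1), "V": bool((cpsr >> 28) & 1),
        "irq_masked": bool((cpsr >> 7) & 1),
        "fiq_masked": bool((cpsr >> 6) & 1),
        "thumb": bool((cpsr >> 5) & 1),
    }


if __name__ == "__main__":
    regs = name_registers(parse_words(CONTEXT_DUMP))
    for name, value in regs.items():
        print(f"{name:>4} = {value:08x}")
    # Consistency checks for the assumed layout.
    print("sp matches stack[] address:", regs["sp"] == STACK_ADDR)
    print("pc matches instruction[] address:", regs["pc"] == INSTR_ADDR)
    print("cpsr:", decode_cpsr(regs["cpsr"]))
```

If the assumed layout holds, CPSR 0x2000001f means the XScale core was in System mode with only the carry flag set when the context for loadBalance_g_ns was saved. That is useful context for TAC, not a root cause on its own.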

I'll probably raise a ticket and see what comes of it.

Hi

Did you open a TAC case on this? If so, what was the result?

Thanks,

jonagy

Hi Anthony,

This seems to be a known defect in A2(3.5).

Please read the release notes:

http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/ace/vA2_3_x/Release/Note/RACEA2_3_X.html

CSCtr77869

When the configuration manager sends a message to TCP and the message has a proxy ID that is out of bounds, the network processor microengine (ME) becomes unresponsive and the ACE reloads with a last boot reason of "NP 1 Failed : NP ME Hung" or "NP 2 Failed : NP ME Hung". Workaround: None.

The defect is fixed in A2(3.6a)

regards,

Ajay Kumar

Hi

Many thanks for your advice. I have found this bug, but I doubt that it is exactly what we hit, because

I see:

last boot reason:  NP 2 Failed : NP Core Reset - Cause Unknown 

and not:

last boot reason:  NP 2  Failed : NP ME Hung
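The mismatch described here is a plain string comparison, which is easy to script when checking several modules. A hypothetical Python sketch that parses a "last boot reason" line and compares the cause with the "NP ME Hung" signature quoted from the CSCtr77869 release note. A non-match only says the reported signature differs; it does not rule the bug out:

```python
import re

# Cause string quoted in the A2(3.x) release note entry for CSCtr77869.
CSCTR77869_CAUSE = "NP ME Hung"

# Matches e.g. "last boot reason:  NP 2 Failed : NP Core Reset - Cause Unknown".
# \s+ / \s* absorb the single, double and trailing spaces seen in the quotes.
BOOT_REASON_RE = re.compile(
    r"last boot reason:\s*NP\s+(?P<np>\d+)\s+Failed\s*:\s*(?P<cause>.+?)\s*$"
)


def parse_boot_reason(line: str):
    """Return (np_number, cause) or None if the line doesn't match."""
    m = BOOT_REASON_RE.search(line)
    if m is None:
        return None
    return int(m.group("np")), m.group("cause")


def matches_csctr77869(line: str) -> bool:
    """True only if the cause text equals the release-note signature."""
    parsed = parse_boot_reason(line)
    return parsed is not None and parsed[1] == CSCTR77869_CAUSE


if __name__ == "__main__":
    seen = "last boot reason:  NP 2 Failed : NP Core Reset - Cause Unknown "
    print(parse_boot_reason(seen), matches_csctr77869(seen))
```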

There was no core file; instead, we found two new files on core:, created exactly at the time of the reload:

file "ixp2_crash.txt"

IXP CAUSE = NP Core Reset - Cause Unknown

NO Parity Error DETECTED   #regarding SRAM

************************************************************

Kernel Message Ring Buffer Start:

************************************************************

...

<2>Warning:- MTS queue is full for opcode 4062 sap 25137 pid 2455. This warning can be ignored. If you want to recover - close all debug plugin sessions and terminate command execution in all telnet/ssh connections.
<3>% IXP 2 XScale Core reset detected !!
<4>sending signal 17 to SME, pid 954
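When comparing crash files from several reloads (ixp1_crash.txt, ixp2_crash.txt), the few lines quoted in this thread can be pulled out automatically. A minimal Python sketch, assuming those lines look exactly as quoted here; the rest of the file layout is unknown and summarize_crash is a hypothetical name. The <N> prefixes are treated as Linux kernel log levels (2 = critical, 3 = error, 4 = warning), which is an assumption about this ring buffer:

```python
import re

# Lines copied from the ixp2_crash.txt excerpt above.
SAMPLE = """\
IXP CAUSE = NP Core Reset - Cause Unknown
NO Parity Error DETECTED
Kernel Message Ring Buffer Start:
<2>Warning:- MTS queue is full for opcode 4062 sap 25137 pid 2455. This warning can be ignored.
<3>% IXP 2 XScale Core reset detected !!
<4>sending signal 17 to SME, pid 954
"""

CAUSE_RE = re.compile(r"^IXP CAUSE = (.+)$", re.M)
RESET_RE = re.compile(r"IXP (\d+) XScale Core reset detected")
KMSG_RE = re.compile(r"^<(\d)>(.*)$", re.M)  # "<level>message" lines


def summarize_crash(text: str) -> dict:
    """Extract the cause, parity status, reset NPs and kernel messages."""
    cause = CAUSE_RE.search(text)
    return {
        "cause": cause.group(1).strip() if cause else None,
        "no_parity_error": "NO Parity Error DETECTED" in text,
        "xscale_reset_np": [int(n) for n in RESET_RE.findall(text)],
        "kernel_msgs": [(int(lvl), msg.strip())
                        for lvl, msg in KMSG_RE.findall(text)],
    }


if __name__ == "__main__":
    for key, value in summarize_crash(SAMPLE).items():
        print(f"{key}: {value}")
```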

file "outstanding_syslogs"

which contains:

Sun Apr 28 09:30:17 2013
snmpget reqID : 16167790 ctxId 0 -v 2c -c zeus124 -m all xx.xx.yy.zz iso.3.6.1.2.1.1.3.0 
Sun Apr 28 09:31:41 2013
snmpget msgID : 1229056 ctxId 0 -v 3 -u NNMI_USER -l authPriv -m all xx.xx.yy.zz iso.3.6.1.4.1.9.9.109.1.1.1.1.8.1 iso.3.6.1.2.1.1.3.0 iso.3.6.1.4.1.9.9.117.1.2.1.1.2.1 iso.3.6.1.2.1.2.2.1.7.16777224 iso.3.6.1.2.1.2.2.1.8.16777224 iso.3.6.1.2.1.2.2.1.7.16777226 iso.3.6.1.2.1.2.2.1.8.16777226  

There was no configuration change at the time of the crash.

BTW, this system runs:

build 3.0(0)A2(3.5)

Moreover, this software version is not listed on the bug description page.

Anyway, an upgrade is advisable.

BR:

jonagy


Jorge Bejarano
Level 4

Hello Anthony,

You can check the output of "dir core:", as described in this guide:

http://docwiki.cisco.com/wiki/Cisco_Application_Control_Engine_%28ACE%29_Troubleshooting_Guide_--_Overview_of_ACE_Troubleshooting#Copying_Core_Dumps

You may need to copy the core dumps off the box and open a TAC case to determine which software defect was hit.

Usually, we require all the core dumps generated when the issue happens, the show tech-support output from both the switch and the ACE module, and the syslog messages.

Hope this helps!!!

Jorge