New Member

ASR9006 upgrade IOS XR fail

Hi

After upgrading the ASR9006 from 4.2.0 to 4.3.2, the line cards do not boot up properly.

I tried to upgrade the FPD on a line card, but had no success. What can I do now?

RP/0/RSP0/CPU0:LAB-9k-440#sh plat
Tue Sep 24 17:53:20.083 UTC
Node            Type                      State            Config State
-----------------------------------------------------------------------------
0/RSP0/CPU0     A9K-RSP440-TR(Active)     IOS XR RUN       PWR,NSHUT,MON
0/0/CPU0        A9K-40GE-L                BRINGDOWN        PWR,NSHUT,MON
0/2/CPU0        A9K-MOD80-TR              IN-RESET         PWR,NSHUT,MON

RP/0/RSP0/CPU0:LAB-9k-440(admin)#upgrade hw-module fpd rommon location 0/0/CPU0

Tue Sep 24 17:57:56.274 UTC

***** UPGRADE WARNING MESSAGE: *****
  *  This upgrade operation has a maximum timout of 160 minutes.  *
  *  If you are executing the cmd for one specific location and  *
  *  card in that location reloads or goes down for some reason  *
  *  you can press CTRL-C to get back the RP's prompt.           *
  *  If you are executing the cmd for _all_ locations and a node *
  *  reloads or is down please allow other nodes to finish the   *
  *  upgrade process before pressing CTRL-C.                     *

% RELOAD REMINDER:
  - The upgrade operation of the target module will not interrupt its normal
    operation. However, for the changes to take effect, the target module
    will need to be manually reloaded after the upgrade operation. This can
    be accomplished with the use of "hw-module <target> reload" command.
  - If automatic reload operation is desired after the upgrade, please use
    the "reload" option at the end of the upgrade command.
  - The output of "show hw-module fpd location" command will not display
    correct version information after the upgrade if the target module is
    not reloaded.

NOTE: Chassis CLI will not be accessible while upgrade is in progress.

Continue? [confirm]

21 REPLIES
Cisco Employee

ASR9006 upgrade IOS XR fail

Hi,

FPDs cannot be updated when a card is booting.

BRINGDOWN just means the card is reloading.

IN-RESET means the card has failed to boot too many times so the system disables it. You can get out of this state via manual intervention such as the hw-mod reload command.

What is the highest node state the cards reach before resetting? Do the cards hit present, rommon, mbi-boot, mbi-run, or xr-run?

This will help to determine what other commands to look at and why the cards do not boot up all the way.

Can you also send the output of 'show log', filtered for card-related messages? Something like 'show log | i 0/2/CPU0'.
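
As a rough illustration of the state check described above, here is a minimal Python sketch that parses a saved copy of the 'show platform' output and lists the nodes that are not in IOS XR RUN. The sample text is copied from the first post; the function names and the column-splitting rule (two or more spaces between columns) are assumptions made for this example, not a Cisco tool or API.

```python
import re

# Sample copied from the 'show platform' output in the first post.
SHOW_PLATFORM = """\
Node            Type                      State            Config State
-----------------------------------------------------------------------------
0/RSP0/CPU0     A9K-RSP440-TR(Active)     IOS XR RUN       PWR,NSHUT,MON
0/0/CPU0        A9K-40GE-L                BRINGDOWN        PWR,NSHUT,MON
0/2/CPU0        A9K-MOD80-TR              IN-RESET         PWR,NSHUT,MON
"""

# Assumption: columns are separated by two or more spaces, while the state
# itself may contain single spaces (for example "IOS XR RUN").
ROW = re.compile(r"^(\d+/\S+/CPU\d+)\s{2,}(\S+)\s{2,}(.+?)\s{2,}(\S+)\s*$")


def parse_show_platform(text):
    """Return one dict per node row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            node, card_type, state, config = match.groups()
            rows.append({"node": node, "type": card_type,
                         "state": state, "config": config})
    return rows


def nodes_not_running(rows):
    """List (node, state) pairs for every node that is not in IOS XR RUN."""
    return [(row["node"], row["state"]) for row in rows
            if row["state"] != "IOS XR RUN"]


if __name__ == "__main__":
    for node, state in nodes_not_running(parse_show_platform(SHOW_PLATFORM)):
        print(f"{node}: {state}")
```

Run against the sample, this flags 0/0/CPU0 and 0/2/CPU0, the two cards that are then worth filtering the log for.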

Thanks,

Sam

New Member

Re: ASR9006 upgrade IOS XR fail

Hi Sam,

Thank you for your attention.
Node 0/2/CPU0 passed through the following states:
mbi-boot => mbi-run => ios XR PREP => rommon => MBI-boot => in-reset
Node 0/0/CPU0 passed through the following states:
PRESENT => ROMMON => BRINGDOWN

Log:

RP/0/RSP0/CPU0:Sep 26 15:22:08.813 : config[65744]: %MGBL-SYS-5-CONFIG_I : Configured from console by admin

RP/0/RSP0/CPU0:Sep 26 15:23:10.636 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:PRESENT

RP/0/RSP0/CPU0:Sep 26 15:23:10.692 : config[65861]: %MGBL-CONFIG-6-DB_COMMIT_ADMIN : Configuration committed by user 'admin'. Use 'show configuration commit changes 2000000016' to view the changes.

RP/0/RSP0/CPU0:Sep 26 15:23:12.855 : config[65861]: %MGBL-SYS-5-CONFIG_I : Configured from console by admin

RP/0/RSP0/CPU0:Sep 26 15:25:40.912 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 1 Timeout ID: 10

RP/0/RSP0/CPU0:Sep 26 15:25:40.935 : canb-server[150]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/0/CPU0 , Power Cycle (0x05000000) 

RP/0/RSP0/CPU0:Sep 26 15:25:40.935 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:ROMMON

RP/0/RSP0/CPU0:Sep 26 15:28:11.214 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 3 Timeout ID: 10

RP/0/RSP0/CPU0:Sep 26 15:28:11.236 : canb-server[150]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/0/CPU0 , Power Cycle (0x05000000) 

RP/0/RSP0/CPU0:Sep 26 15:28:11.237 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:BRINGDOWN

RP/0/RSP0/CPU0:Sep 26 15:28:11.238 : invmgr[255]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: BRINGDOWN

RP/0/RSP0/CPU0:Sep 26 15:30:41.513 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 7 Timeout ID: 10

RP/0/RSP0/CPU0:Sep 26 15:30:41.537 : canb-server[150]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/0/CPU0 , Power Cycle (0x05000000) 
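
To pull the state history out of a longer log, something like the following Python sketch could be used. It only looks at the two shelfmgr message types that appear in the excerpt above (NODE_STATE_CHANGE and FSMTIMEOUT_RESET); the regular expressions are written against those sample lines and are an assumption about the format, and the function name is made up for this example.

```python
import re

# A few lines copied from the log excerpt above.
LOG = """\
RP/0/RSP0/CPU0:Sep 26 15:23:10.636 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:PRESENT
RP/0/RSP0/CPU0:Sep 26 15:25:40.912 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 1 Timeout ID: 10
RP/0/RSP0/CPU0:Sep 26 15:25:40.935 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:ROMMON
RP/0/RSP0/CPU0:Sep 26 15:28:11.214 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 3 Timeout ID: 10
RP/0/RSP0/CPU0:Sep 26 15:28:11.237 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:BRINGDOWN
"""

# Patterns written against the sample lines only (assumed format).
STATE_CHANGE = re.compile(
    r"%PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : (\S+) \S+ state:(.+?)\s*$")
TIMEOUT_RESET = re.compile(
    r"%PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node (\S+) is reset due to failed bootup")


def summarize(log_text, node):
    """Return (ordered list of states, number of boot-timeout resets) for one node."""
    states = []
    resets = 0
    for line in log_text.splitlines():
        match = STATE_CHANGE.search(line)
        if match and match.group(1) == node:
            states.append(match.group(2))
            continue
        match = TIMEOUT_RESET.search(line)
        if match and match.group(1) == node:
            resets += 1
    return states, resets


if __name__ == "__main__":
    states, resets = summarize(LOG, "0/0/CPU0")
    print(" => ".join(states), f"({resets} timeout resets)")
```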

I attached the log file for node 0/2/CPU0.

Nicolay.

PS: I did my upgrade following this link:

http://www.cisco.com/web/Cisco_IOS_XR_Software/pdf/ASR9000_Upgrade_Procedure_432.pdf

Cisco Employee

ASR9006 upgrade IOS XR fail

Hi Nicolay,

Sorry for the delay, I was on vacation until today.

The following logs are of interest to me, mostly the NP init failure.

lda_server[65]: %L2-SPA-5-STATE_CHANGE : SPA in bay 0 type A9K-MPA-4x10GE Initing  

LC/0/2/CPU0:Sep 25 15:33:37.422 : prm_server_ty[303]: %PLATFORM-NP-0-INIT_ERR : In spite of 3 Cold restarts, NP init unsuccessful...exitting!! 

LC/0/2/CPU0:Sep 25 15:33:38.502 : sysmgr[91]: %OS-SYSMGR-3-ERROR : prm_server_ty(1) (jid 303) exited, will be respawned with a delay (slow-restart)  

LC/0/2/CPU0:Sep 25 15:33:38.501 : sysmgr[91]: prm_server_ty(1) (jid 303) (pid 524413) (fail_count 2) abnormally terminated, restart scheduled

LC/0/2/CPU0:Sep 25 15:33:38.504 : sysmgr[91]: %OS-SYSMGR-3-ERROR : prm_server_ty(303) (fail count 2) will be respawned in 10 seconds 

LC/0/2/CPU0:Sep 25 15:33:38.504 : sysmgr[91]: %OS-SYSMGR-7-DEBUG : prm_server_ty[303] (pid 524413) has not sent proc-ready within 45 seconds 

LC/0/2/CPU0:Sep 25 15:33:48.484 : pifibm_server_lc[292]: %OS-PLATFORM_LPTS_PIFIB-7-ERR_CONN_INIT : Failed to connect to PRM sever: Improper link

LC/0/2/CPU0:Sep 25 15:33:48.655 : sysmgr[91]: %OS-SYSMGR-3-ERROR : inline_service_proc(1) (jid 209) exited, will be respawned with a delay (slow-restart)  

LC/0/2/CPU0:Sep 25 15:33:48.659 : sysmgr[91]: %OS-SYSMGR-3-ERROR : inline_service_proc(209) (fail count 1) will be respawned in 10 seconds 

LC/0/2/CPU0:Sep 25 15:33:48.651 : dumper[56]: %OS-DUMPER-7-DUMP_REQUEST : Dump request for process pkg/bin/pifibm_server_lc

LC/0/2/CPU0:Sep 25 15:33:48.662 : sysmgr[91]: %OS-SYSMGR-7-DEBUG : inline_service_proc(1) (jid 209) did not signal end of initialization  

LC/0/2/CPU0:Sep 25 15:33:48.653 : sysmgr[91]: inline_service_proc(1) (jid 209) (pid 524400) (fail_count 1) abnormally terminated, restart scheduled

LC/0/2/CPU0:Sep 25 15:33:48.727 : pm[294]: %PLATFORM-VKG_PM-3-ERROR_INIT : PM: initialization error encountered, reason=failed to initialize prm stats, pm exits!

LC/0/2/CPU0:Sep 25 15:33:48.941 : sysmgr[91]: pm(1) (jid 294) (pid 524371) (fail_count 1) abnormally terminated, restart scheduled

LC/0/2/CPU0:Sep 25 15:33:48.941 : sysmgr[91]: %OS-SYSMGR-3-ERROR : pm(1) (jid 294) exited, will be respawned with a delay (slow-restart)  

LC/0/2/CPU0:Sep 25 15:33:48.942 : sysmgr[91]: %OS-SYSMGR-3-ERROR : pm(294) (fail count 1) will be respawned in 10 seconds 

LC/0/2/CPU0:Sep 25 15:33:48.998 : fib_mgr[176]: %PLATFORM-PLAT_FIB_HAL-3-ERR_INFO : fib HAL failed to initialize engine hardware :  18  : pkg/bin/fib_mgr : (PID=524398) :  -Traceback= 4db19210 4d8f2b4c 40003f38 40001da4 4ba73a44 4ba71554 400003f0 4000211c 40003078 400000e4 40172470

LC/0/2/CPU0:Sep 25 15:33:49.003 : fib_mgr[176]: %ROUTING-FIB-2-INIT : FIB initialization failed on this node. Reason: Platform init returned hard error. Decoded error reason: Improper link

LC/0/2/CPU0:Sep 25 15:33:49.162 : sysmgr[91]: fib_mgr(1) (jid 176) (pid 524398) (fail_count 1) abnormally terminated, restart scheduled

LC/0/2/CPU0:Sep 25 15:33:49.163 : sysmgr[91]: %OS-SYSMGR-3-ERROR : fib_mgr(1) (jid 176) exited, will be respawned with a delay (slow-restart)  

LC/0/2/CPU0:Sep 25 15:33:49.164 : sysmgr[91]: %OS-SYSMGR-3-ERROR : fib_mgr(176) (fail count 1) will be respawned in 10 seconds 

LC/0/2/CPU0:Sep 25 15:33:49.164 : sysmgr[91]: %OS-SYSMGR-7-DEBUG : fib_mgr(1) (jid 176) did not signal end of initialization  

LC/0/2/CPU0:Sep 25 15:33:49.324 : prm_server_ty[303]: %PLATFORM-NP-0-INIT_ERR : In spite of 3 Cold restarts, NP init unsuccessful...exitting!! 

LC/0/2/CPU0:Sep 25 15:33:49.655 : ipv6_mfwd_partner[245]: %ROUTING-IPV4_MFWD-3-ERR_MLIB_INIT : Failed to initialize Multicast Library Improper link

Can you open a TAC case for this?

This typically indicates a HW failure.

Thanks,

Sam

New Member

Re: ASR9006 upgrade IOS XR fail

Hi Sam,

I have opened a TAC case and initiated the RMA procedure, but I still don't understand why this happened.

Did I need to remove the line card before the upgrade?

Thank you.

Nicolay.

Cisco Employee

ASR9006 upgrade IOS XR fail

Hi Nicolay,

This basically means faulty HW. Based upon the above logs, nothing you did caused it.

Thanks,

Sam

Bronze

Hi,

 

We had a failure on one LC. Is this a SW or HW failure?

We are running an ASR9010 with 4.3.1, and the LC is an A9K-8T-L.

Here are the logs:

LC/0/0/CPU0:Jan 15 03:56:40.775 : prm_server_tr[292]: %PLATFORM-NP-4-FAULT : prm_process_parity_tm_cluster: 1 Unrecoverable error(s) found. Reset NP4 now  
LC/0/0/CPU0:Jan 15 03:56:42.858 : ipv4_mfwd_partner[230]: %ROUTING-IPV4_MFWD-4-FROM_MRIB_UPDATE : MFIB couldn't process update from MRIB : failed to create route 0xe0000000:(10.120.3.77,239.192.4.40/32) - 'asr9k-ipmcast' detected the 'warning' condition 'Platform MFIB: Platform Lib not ready; NP Not running' 
LC/0/0/CPU0:Jan 15 03:56:52.185 : pfm_node_lc[282]: %PLATFORM-NP-0-TMB_CLUSTER_PARITY : Set|prm_server_tr[155731]|Network Processor Unit(0x1008004)|TMb cluster parity interrupt. Indicates an internal SRAM problem in TMb cluster, NP=4 memId=6, mask=0x2000000, PMask=0x2000000 SRAMLine=166 Rec=1 Rewr=1 
LC/0/0/CPU0:Jan 15 03:56:52.187 : pfm_node_lc[282]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 155731 (prm_server_tr), Fault Sev: 0, Target node: 0/0/CPU0, CompId: 0x1f, Device Handle: 0x1008004, CondID: 1008, Fault Reason: TMb cluster parity interrupt. Indicates an internal SRAM problem in TMb cluster, NP=4 memId=6, mask=0x2000000, PMask=0x2000000 SRAMLine=166 Rec=1 Rewr=1 
RP/0/RSP1/CPU0:Jan 15 03:56:52.380 : shelfmgr[394]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/0/CPU0 indicates it is doing a kernel dump. 
RP/0/RSP1/CPU0:Jan 15 03:56:52.381 : shelfmgr[394]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-8T-L state:IOS XR FAILURE 
RP/0/RSP1/CPU0:Jan 15 03:56:52.384 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process 8000, Nbr 10.100.96.204 on TenGigE0/0/0/1 in area 0 from FULL to DOWN, Neighbor Down: BFD session down, vrf default vrfid 0x60000000 
RP/0/RSP1/CPU0:Jan 15 03:56:52.397 : shelfmgr[394]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-8T-L state:BRINGDOWN 

Cisco Employee

To "close" on this and make sure that it is addressed, I am pasting my comments from the other discussion on the same item:

ah this: PLATFORM-NP-4-FAULT : prm_process_parity_tm_cluster: 1 Unrecoverable error(s) found.

It means that NP number 4 on the line card in slot 0 incurred a memory parity error in the traffic manager portion of the NPU (the portion that handles queuing and scheduling). It could not correct that error and therefore decided to reinit and crash.

Generally, with memory parity errors we advise to catch it once, monitor it, and replace the card if it happens again.

If you are uncomfortable "waiting" until the next event, you could decide to replace it now. But parity errors are often transient and caused by what we used to call "cosmic radiation", which is merely a catch-all for uncommon, unlikely events such as a power spike or drop, or other intangible events.
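
The "catch it once, monitor, replace on a repeat" rule above could be tracked with a small script over collected syslog. The Python sketch below counts TMB_CLUSTER_PARITY messages per line card and NP. The pattern is written against the single sample line posted in this thread, and the threshold of two events is just the rule of thumb from this reply; both are assumptions for illustration, not an official procedure.

```python
import re
from collections import Counter

# Sample line copied from the log posted above.
SAMPLE = (
    "LC/0/0/CPU0:Jan 15 03:56:52.185 : pfm_node_lc[282]: "
    "%PLATFORM-NP-0-TMB_CLUSTER_PARITY : Set|prm_server_tr[155731]|"
    "Network Processor Unit(0x1008004)|TMb cluster parity interrupt. "
    "Indicates an internal SRAM problem in TMb cluster, NP=4 memId=6, "
    "mask=0x2000000, PMask=0x2000000 SRAMLine=166 Rec=1 Rewr=1"
)

# Assumed format: the node name follows "LC/" and the NP number follows "NP=".
PARITY = re.compile(
    r"^LC/(\d+/\d+/CPU\d+):.*%PLATFORM-NP-0-TMB_CLUSTER_PARITY .*\bNP=(\d+)\b")


def count_parity_events(lines):
    """Count parity events keyed by (node, NP number)."""
    counts = Counter()
    for line in lines:
        match = PARITY.match(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def advice(count):
    """Rule of thumb from this thread: one event is monitored, a repeat means replace."""
    if count == 0:
        return "no parity events seen"
    if count == 1:
        return "monitor"
    return "replace the card"


if __name__ == "__main__":
    for (node, np), count in count_parity_events([SAMPLE]).items():
        print(f"{node} NP{np}: {count} event(s), {advice(count)}")
```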

cheers

xander

Xander Thuijs CCIE #6775 Principal Engineer ASR9000, CRS, NCS6000 & IOS-XR
New Member

Hello All,

I get the following messages, and the A9K-40GE-B card keeps cycling through IOS XR PREP, MBI-BOOTING, and MBI-RUNNING until the system finally puts it in the IN_RESET state. Any help is truly appreciated.

0/1/CPU0        A9K-40GE-B                MBI-BOOTING      PWR,NSHUT,MON

RP/0/RSP0/CPU0:Router(admin)#LC/0/1/CPU0:Mar 22 17:31:53.057 : prm_server_tr[305]: %PLATFORM-NP-0-INIT_ERR : (0x8000B002) : Setting up NP0 Failed
LC/0/1/CPU0:Mar 22 17:32:50.031 : pfm_node_lc[293]: %PLATFORM-NP-0-NP_INIT_FAILURE : Set|prm_server_tr[151634]|Network Processor Unit(0x1008000)|Persistent Initialization Failure.
LC/0/1/CPU0:Mar 22 17:32:50.036 : pfm_node_lc[293]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 151634 (prm_server_tr), Fault Sev: 0, Target node: 0/1/CPU0, CompId: 0x1f, Device Handle: 0x1008000, CondID: 1027, Fault Reason: Persistent Initialization Failure.
--------------------------------------------------------------------------------
RP/0/RSP0/CPU0:Router(admin)#RP/0/RSP0/CPU0:Mar 22 17:42:43.665 : shelfmgr[410]: %PLATFORM-SHELFMGR-0-MAX_BOOTREQ_BRINGDOWN : Node 0/1/CPU0 A9K-40GE-B has reset itself in multiple (11) unsuccessful boot attempts, putting it IN_RESET state. The probable cause is an unexpected event on the node. Please Refer to the Cisco ASR 9000 System Error Message Reference Guide for further information if needed.

RP/0/RSP0/CPU0:Router(admin)#sh ver
Tue Mar 22 17:44:02.089 UTC

Cisco IOS XR Software, Version 5.1.0[Default]
Copyright (c) 2013 by Cisco Systems, Inc.

ROM: System Bootstrap, Version 1.06(20120210:003513) [ASR9K ROMMON],

Router uptime is 55 minutes
System image file is "bootflash:disk0/asr9k-os-mbi-5.1.0/0x100000/mbiasr9k-rp.vm"

cisco ASR9K Series (MPC8641D) processor with 4194304K bytes of memory.
MPC8641D processor at 1333MHz, Revision 2.2
ASR 9006 AC Chassis with PEM Version 2

2 Management Ethernet
219k bytes of non-volatile configuration memory.
975M bytes of compact flash card.
67988M bytes of hard disk.
1605616k bytes of disk0: (Sector size 512 bytes).
1605616k bytes of disk1: (Sector size 512 bytes).

Configuration register on node 0/RSP0/CPU0 is 0x2102

Cisco Employee

Hi!!

The card has an NP init problem: it was trying to set itself up and test its attached memory, and that failed. After a few tries it gave up and put itself in IN-RESET.

You would want to RMA this board and have it replaced.

xander

Xander Thuijs CCIE #6775 Principal Engineer ASR9000, CRS, NCS6000 & IOS-XR
New Member

Does "NP init problem" mean a "no power" initialization problem?

I apologize for not knowing this; I'm new to the ASR line.

Thanks for your response.

Cisco Employee

Oh, it means an NP initialization error. When the NP boots, it tests its memory for search, stats, TCAM and packet buffers; if these fail, it is called an NP init error.

From the logs you provided I can't tell precisely which memory failed, but regardless, it can't be repaired or salvaged without a HW replacement.

And oh, if you're new and want to see some more, check out Cisco Live ID 2904 from Orlando, San Francisco and San Diego. Possibly also the BRKARC ID 2003 for some good stuff on the A9K. And of course here on the forums! :)

cheers

xander

Xander Thuijs CCIE #6775 Principal Engineer ASR9000, CRS, NCS6000 & IOS-XR
Bronze

Hi,

NP stands for Network Processor, and your Trident-based line card probably has four NPs.

Correct me if I am wrong about the number of NPs.

Cisco Employee

Hi Alexander,

How can I install add a tar file from 0/RSP1/CPU0? Or do I need to reload the RP to install the tar file?

Bronze

Hi,

Do you want to upgrade to a newer IOS-XR version?

Cisco Employee

Yep, I want to upgrade from 5.1.3 to 5.3.3, but my files are on the harddisk of 0/RSP1/CPU0 and I don't want to do a switchover from RSP0 to RSP1. Do you know the answer for this?

Regards

Bronze

The easiest way is to install add from an FTP/TFTP server.

I tried to install from a TAR once and never again. It takes just too long, and you will also get packages that you don't need.

It's better to unpack the tar file and install add the individual packages.

Something like this:

admin ---> install add ftp://10.100.13.133/asr9k-mini-px.pie-4.3.4 ftp://10.100.13.133/asr9k-doc-px.pie-4.3.4 ftp://10.100.13.133/asr9k-fpd-px.pie-4.3.4

Check which packages are active with 'admin show install active summary' and add the same packages. You should also deactivate packages that are not in use,

e.g. asr9k-mcast-px-5.1.3 if you do not use multicast.

You should also check this doc.
http://www.cisco.com/web/Cisco_IOS_XR_Software/pdf/ASR9k_Upgrade_Downgrade_Procedure_IOSXR_Rel_512.pdf
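
For a longer package list, the 'install add' line can be generated instead of typed. The Python sketch below is only an illustration: it assumes active packages are named like asr9k-mcast-px-5.1.3 and PIE files like asr9k-mini-px.pie-4.3.4, as in the examples in this post, and the function names are made up. Check the result against the real file names on your server before using it.

```python
def pie_name(active_package, target_version):
    """Map an active package name to the PIE file name for the target release.

    Assumed naming, inferred from the names quoted in this post:
    'asr9k-mcast-px-5.1.3' -> 'asr9k-mcast-px.pie-5.3.3'
    """
    base, _, _old_version = active_package.rpartition("-")
    return f"{base}.pie-{target_version}"


def build_install_add(server, active_packages, target_version, skip=()):
    """Build one 'install add' command line, leaving out packages listed in skip."""
    urls = [
        f"ftp://{server}/{pie_name(package, target_version)}"
        for package in active_packages
        if package not in skip
    ]
    return "install add " + " ".join(urls)


if __name__ == "__main__":
    active = ["asr9k-mini-px-5.1.3", "asr9k-mcast-px-5.1.3", "asr9k-fpd-px-5.1.3"]
    # Leave out the multicast package if multicast is not in use.
    print(build_install_add("10.100.13.133", active, "5.3.3",
                            skip={"asr9k-mcast-px-5.1.3"}))
```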

Cisco Employee

I know the process from FTP/TFTP, but I want to know if it is possible to install from the harddisk of RSP1 while RSP0 is active. Why? My TAR file is on the harddisk of RSP1, but I don't know how to reference this file in the install add command.

Regards

Cisco Employee

Just sent you the steps on your post. Smail Milak is correct: it's a little slow on 5.1.3. From 5.3.3 onwards, with RSP440 and new-gen RSPs, the install add of a TAR should take no more than a few minutes; we improved the write speed by about 500%. You need to be on 5.3.3 though.

Eddie.

Bronze

Well, it looks like you cannot specify the location (RSP1) from which you want to install add the tar file.

Cisco Employee

Please see my previous post:

https://supportforums.cisco.com/discussion/12961376/upgrade-asr9k-533

Eddie.

New Member

The issue with the speed used to be that the file was transferred twice when doing an install add from a remote filesystem. I don't know if this is still the case with more recent releases.

The way I have always done it is to make my own tar file containing only the PIEs that I want, then sftp it to harddisk:/ (only one transfer), then do the install add from the local filesystem.
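
Building such a tar file can be scripted on the workstation before the sftp step. Here is a minimal Python sketch using the standard tarfile module. The directory layout, file names and function name are hypothetical, and it writes a plain uncompressed tar with the PIEs at the top level, which is an assumption about what install add expects.

```python
import tarfile
from pathlib import Path


def build_pie_tar(pie_dir, wanted, out_path):
    """Write a plain (uncompressed) tar holding only the wanted PIE files.

    pie_dir: directory where the vendor tar was unpacked (hypothetical path)
    wanted:  PIE file names to include, stored at the top level of the archive
    """
    pie_dir = Path(pie_dir)
    missing = [name for name in wanted if not (pie_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing PIE files: {missing}")
    with tarfile.open(out_path, "w") as tar:
        for name in wanted:
            # arcname drops the local directory so only the file name is stored.
            tar.add(str(pie_dir / name), arcname=name)
    return out_path


if __name__ == "__main__":
    # Hypothetical file names following the pattern used in this thread.
    build_pie_tar(
        "unpacked",
        ["asr9k-mini-px.pie-5.3.3", "asr9k-fpd-px.pie-5.3.3"],
        "my-asr9k-5.3.3.tar",
    )
```

The resulting file is then the one to sftp to harddisk:/ before running the install add locally.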

If you transfer in-band rather than via MgmtEther you can also greatly speed it up by increasing the LPTS policer limits.
