
ASR9K: Repeatedly running admin reload puts a module into the IN-RESET state

Document

2017/07/22 - 20:41
User Badges: Cisco Employee


Overview

This document explains an issue on ASR 9000 Series routers in which repeatedly running admin reload on a module leaves the module in the IN-RESET state, and describes how to recover from it.


Difference between admin reload location <loc> and hw-module location <loc> reload

admin reload location <loc> and hw-module location <loc> reload can both reload a module, but they differ in what happens internally before the reload takes place.

The key difference is whether or not the restart goes through shelfmgr, the process responsible for module initialization and operation.

With the admin reload command, the restart does not go through shelfmgr.

With the hw-module reload command, in contrast, the module is restarted through shelfmgr.



What the IN-RESET module state means

A module can be in many different states. IN-RESET is the state in which shelfmgr holds the module powered down.

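When a module keeps restarting, for example because of a fault, shelfmgr detects this and, to minimize the impact on the network, stops the module from making any further attempts to boot. That state is IN-RESET.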

RP/0/RP0/CPU0:ASR9K#show plat

Node            Type                      State            Config State
-----------------------------------------------------------------------------
0/RP0/CPU0      A99-RP2-TR(Active)        IOS XR RUN       PWR,NSHUT,MON
0/RP1/CPU0      A99-RP2-TR(Standby)       IOS XR RUN       PWR,NSHUT,MON
0/0/CPU0        A9K-MOD400-TR             IOS XR RUN       PWR,NSHUT,MON
0/0/1           A9K-MPA-4X10GE            OK               PWR,NSHUT,MON
0/1/CPU0        A9K-MOD400-TR             IOS XR RUN       PWR,NSHUT,MON
0/1/0           A9K-MPA-2X100GE           OK               PWR,NSHUT,MON
0/9/CPU0        A9K-40GE-SE               IN-RESET         PWR,NSHUT,MON
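On a chassis with many line cards, affected modules can be found by scanning the State column of show platform for IN-RESET. The following is a minimal Python sketch that assumes the column layout shown above; the function name in_reset_nodes is hypothetical and not part of IOS XR.

```python
import re
from typing import List

# Sample rows taken from the show platform output above
SAMPLE = """\
Node            Type                      State            Config State
-----------------------------------------------------------------------------
0/RP0/CPU0      A99-RP2-TR(Active)        IOS XR RUN       PWR,NSHUT,MON
0/0/CPU0        A9K-MOD400-TR             IOS XR RUN       PWR,NSHUT,MON
0/0/1           A9K-MPA-4X10GE            OK               PWR,NSHUT,MON
0/9/CPU0        A9K-40GE-SE               IN-RESET         PWR,NSHUT,MON
"""


def in_reset_nodes(show_platform: str) -> List[str]:
    """Return the node names whose State column reads IN-RESET (hypothetical helper)."""
    nodes = []
    for line in show_platform.splitlines():
        fields = line.split()
        # Data rows start with a location such as 0/9/CPU0; header and ruler lines are skipped
        if fields and re.match(r"\d+/", fields[0]) and "IN-RESET" in fields[2:]:
            nodes.append(fields[0])
    return nodes


print(in_reset_nodes(SAMPLE))  # ['0/9/CPU0']
```

Splitting on whitespace is enough here because only the State column can contain the token IN-RESET; multi-word states such as IOS XR RUN simply do not match.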



The admin reload command can put a module into IN-RESET

As described above, shelfmgr is not involved in the restart performed by the admin reload command.

From shelfmgr's point of view, therefore, admin reloads performed within a short period are treated as repeated restarts and count toward the transition to IN-RESET.


Specifically, once a given module has been restarted five times within one hour with the admin reload command, it goes into the IN-RESET state.

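You can check how many more restarts remain before the module goes into IN-RESET with the show shelfmgr status command, by comparing Bringdown Count with its Max Allowed value.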

RP/0/RP0/CPU0:ASR9K#show shelfmgr status location 0/9/CPU0 
Mon Jun 19 10:21:01.670 JST
Nodeid 0x8b1, inst 1

Platform Node Status for 0/9/CPU0 (0_8b_1)
----------------------------------
Current State: IOS XR RUN
Current Substate: IMDR_STATE_NONE
Configuration:
Power is enabled
Bootup enabled.
Monitoring enabled
Boot Requests: 0 Max Allowed: 10
Bringdown Count: 0 Max Allowed: 5 <<<<<<<
Card Reset Count: 0 Max Allowed: 4

Last Reset Code: 14, CPU Reset
Card In Reset: FALSE Shutdown Reason: 12
No FSM timers are set

Slot supports CBC processor. Card is Present andCBC state is Online
CBC reset reason is 0x35
CBC reports card type as: 0x4302a0
Estimated 275 Watts of power required. Power is Reserved.
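The remaining headroom can be read off the Bringdown Count line. This Python sketch assumes, based on the outputs in this document (the module goes IN-RESET when the count moves from 4 to 5), that IN-RESET is reached when Bringdown Count equals Max Allowed; the helper name reloads_until_in_reset is hypothetical.

```python
import re


def reloads_until_in_reset(shelfmgr_status: str) -> int:
    """How many more admin reloads it takes to put the node into IN-RESET.

    Assumption: IN-RESET is reached when Bringdown Count equals Max Allowed,
    as observed in this document's outputs.
    """
    m = re.search(r"Bringdown Count:\s*(\d+)\s+Max Allowed:\s*(\d+)", shelfmgr_status)
    if m is None:
        raise ValueError("no 'Bringdown Count' line in the input")
    count, max_allowed = int(m.group(1)), int(m.group(2))
    return max(max_allowed - count, 0)


# Lines copied from the show shelfmgr status outputs in this document
print(reloads_until_in_reset("Bringdown Count: 0 Max Allowed: 5"))  # 5
print(reloads_until_in_reset("Bringdown Count: 4 Max Allowed: 5"))  # 1
```

A result of 1 means the very next admin reload will put the module into IN-RESET, so any further restart at that point should be done with hw-module location <loc> reload instead.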



RP/0/RP0/CPU0:ASR9K#admin reload location 0/9/CPU0

Preparing system for backup. This may take a few minutes especially for large configurations.
[Done]
RP/0/RP0/CPU0:Jun 19 10:22:33.043 JST: reload[65940]: %MGBL-SCONBKUP-6-INTERNAL_INFO : Reload debug script successfully spawned
Proceed with reload? [confirm]
LC/0/9/CPU0:Jun 19 10:22:33.064 JST: mbi-hello[67]: %PLATFORM-MBI_HELLO-6-NODE_RELOAD : Reload request received. Reloading in 5 secs
LC/0/9/CPU0:Jun 19 10:22:38.068 JST: mbi-hello[67]: %PLATFORM-MBI_HELLO-6-NODE_PWRCYL : Power CycleRequest Received. Power Cycling now !
RP/0/RP1/CPU0:Jun 19 10:22:38.209 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:22:38.213 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:22:38.214 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_CPU_RESET : Node 0/9/CPU0 CPU reset detected.
RP/0/RP0/CPU0:Jun 19 10:22:38.216 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:BRINGDOWN
RP/0/RP0/CPU0:Jun 19 10:22:38.229 JST: mibd_entity[341]: %HA-HA_EM-7-FMFD_CONNECTION_FAIL : Could not connect to /dev/fm/fd_wdsysmon.d/node0_9_CPU0 : No such file or directory
RP/0/RP0/CPU0:Jun 19 10:22:38.253 JST: invmgr[269]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/9/CPU0, state: BRINGDOWN
RP/0/RP0/CPU0:Jun 19 10:22:39.231 JST: mibd_entity[341]: %HA-HA_EM-7-FMFD_CONNECTION_FAIL : Could not connect to /dev/fm/fd_wdsysmon.d/node0_9_CPU0 : No such file or directory
RP/0/RP0/CPU0:Jun 19 10:22:41.232 JST: mibd_entity[341]: %HA-HA_EM-7-FMFD_CONNECTION_FAIL : Could not connect to /dev/fm/fd_wdsysmon.d/node0_9_CPU0 : No such file or directory
RP/0/RP1/CPU0:Jun 19 10:22:44.232 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:22:44.237 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:22:44.239 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:ROMMON
RP/0/RP0/CPU0:Jun 19 10:22:45.234 JST: mibd_entity[341]: %HA-HA_EM-7-FMFD_CONNECTION_FAIL : Could not connect to /dev/fm/fd_wdsysmon.d/node0_9_CPU0 : No such file or directory

RP/0/RP0/CPU0:ASR9K#show shelfmgr status location 0/9/CPU0
Mon Jun 19 10:23:01.017 JST
Nodeid 0x8b1, inst 1

Platform Node Status for 0/9/CPU0 (0_8b_1)
----------------------------------
Current State: ROMMON
Current Substate: IMDR_STATE_NONE
Configuration:
Power is enabled
Bootup enabled.
Monitoring enabled
Boot Requests: 1 Max Allowed: 10
Bringdown Count: 1 Max Allowed: 5 <<< The counter has gone up. It decreases by 1 every hour.
Card Reset Count: 0 Max Allowed: 4
The name of timer in codes is PCTL_FSM_CRESET_TIMEOUT_PULSE.
Last Reset Code: 14, CPU Reset
Card In Reset: FALSE Shutdown Reason: 12
Timer BOOTREQ_SANITY 10 set. 134 of 150 seconds remaining.

Slot supports CBC processor. Card is Present andCBC state is Online
CBC reset reason is 0x37
CBC reports card type as: 0x4302a0
Estimated 275 Watts of power required. Power is Reserved.


Heartbeat monitoring is disabled.
Last rx sequence number: 166399 Last tx sequence number: 166399
Missed ticks: 0 Remote HB Missed Count: 0
Last HB TX status: 0

MBI Reset Pending Type: 64
SysMgr Node Band State: 0x800000 FINAL
SysMgr IMDR Node substate: 0

Boot request card type: 0x4302a0. Boot-up was allowed.
Last FSM Shutdown Reason: 1
Rommon Version 3.03

Line card memory mode is mixed. Card mode --BB-------E----------
Same RSPs
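The counter behavior described in this document (one increment per admin reload, a decrease of 1 per hour, IN-RESET once Max Allowed is reached, and a reset to 0 by hw-module location <loc> reload as in the recovery example) can be summarized as a small state model. This is a simplified Python sketch built only from those observations, not from shelfmgr's actual implementation; the class BringdownCounter and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BringdownCounter:
    """Hypothetical model of the shelfmgr Bringdown Count, based only on this document's outputs."""
    max_allowed: int = 5   # "Max Allowed: 5" in show shelfmgr status
    count: int = 0
    in_reset: bool = False

    def admin_reload(self) -> None:
        # admin reload bypasses shelfmgr, so shelfmgr counts it as a bringdown
        if self.in_reset:
            return  # the module stays down; admin reload does not bring it back
        self.count += 1
        if self.count >= self.max_allowed:
            self.in_reset = True

    def hours_pass(self, hours: int) -> None:
        # The count drops by 1 per elapsed hour (assumed not to clear IN-RESET by itself)
        self.count = max(self.count - hours, 0)

    def hw_module_reload(self) -> None:
        # hw-module reload goes through shelfmgr: counter cleared, module boots again
        self.count = 0
        self.in_reset = False


lc = BringdownCounter()
for _ in range(4):
    lc.admin_reload()
print(lc.count, lc.in_reset)   # 4 False
lc.admin_reload()              # fifth admin reload within the hour
print(lc.count, lc.in_reset)   # 5 True
lc.hw_module_reload()
print(lc.count, lc.in_reset)   # 0 False
```

Under this model, spacing reloads out matters: four admin reloads, an hour's wait (count drops to 3) and one more reload leaves the count at 4, so the module does not go IN-RESET.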


Example output when admin reload puts a module into IN-RESET, and how to recover

The following example shows the output when the module is restarted for the fifth time with admin reload.

RP/0/RP0/CPU0:ASR9K#show shelfmgr status location 0/9/CPU0
Nodeid 0x8b1, inst 1

Platform Node Status for 0/9/CPU0 (0_8b_1)
----------------------------------
Current State: BRINGDOWN
Current Substate: IMDR_STATE_NONE
Configuration:
Power is enabled
Bootup enabled.
Monitoring enabled
Boot Requests: 3 Max Allowed: 10
Bringdown Count: 4 Max Allowed: 5    <<<< Four reloads so far.
Card Reset Count: 0 Max Allowed: 4

Last Reset Code: 14, CPU Reset
Card In Reset: FALSE Shutdown Reason: 12
Timer BOOTREQ_SANITY 10 set. 145 of 150 seconds remaining.

Slot supports CBC processor. Card is Present andCBC state is Online
CBC reset reason is 0x3c
CBC reports card type as: 0x4302a0
Estimated 275 Watts of power required. Power is Reserved.


Heartbeat monitoring is disabled.
Last rx sequence number: 116 Last tx sequence number: 116
Missed ticks: 0 Remote HB Missed Count: 0
Last HB TX status: 0

MBI Reset Pending Type: 64
SysMgr Node Band State: 0x800000 FINAL
SysMgr IMDR Node substate: 0

Boot request card type: 0x4302a0. Boot-up was allowed.
Last FSM Shutdown Reason: 12
Rommon Version 3.03

Line card memory mode is mixed. Card mode --BB-------E----------
Same RSPs

RP/0/RP0/CPU0:ASR9K#admin reload location 0/9/CPU0 <<< Fifth restart

Preparing system for backup. This may take a few minutes especially for large configurations.
[Done]
Proceed with reload? [confirm]
RP/0/RP0/CPU0:Jun 19 10:45:54.538 JST: reload[65940]: %MGBL-SCONBKUP-6-INTERNAL_INFO : Reload debug script successfully spawned
LC/0/9/CPU0:Jun 19 10:45:54.554 JST: mbi-hello[67]: %PLATFORM-MBI_HELLO-6-NODE_RELOAD : Reload request received. Reloading in 5 secs
LC/0/9/CPU0:Jun 19 10:45:59.559 JST: mbi-hello[67]: %PLATFORM-MBI_HELLO-6-NODE_PWRCYL : Power Cycle Request Received. Power Cycling now !
RP/0/RP0/CPU0:Jun 19 10:45:59.720 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:45:59.721 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-0-MAX_RESET_BRINGDOWN : Can not boot node 0/9/CPU0 A9K-40GE-SE due to multiple resets, putting it IN_RESET state. The probable cause is an unexpected event on the node or a failure in communication with the node. Please refer to the Cisco ASR 9000 System Error Message Reference Guide for further information if needed.
RP/0/RP0/CPU0:Jun 19 10:45:59.721 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_CPU_RESET : Node 0/9/CPU0 CPU reset detected.
RP/0/RP1/CPU0:Jun 19 10:45:59.724 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:45:59.727 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:IN-RESET
RP/0/RP0/CPU0:Jun 19 10:45:59.736 JST: mibd_entity[341]: %HA-HA_EM-7-FMFD_CONNECTION_FAIL : Could not connect to /dev/fm/fd_wdsysmon.d/node0_9_CPU0 : No such file or directory
RP/0/RP0/CPU0:ASR9K#RP/0/RP0/CPU0:Jun 19 10:46:00.737 JST: mibd_entity[341]: %HA-HA_EM-7-FMFD_CONNECTION_FAIL : Could not connect to /dev/fm/fd_wdsysmon.d/node0_9_CPU0 : No such file or directory
RP/0/RP0/CPU0:Jun 19 10:46:02.739 JST: mibd_entity[341]: %HA-HA_EM-7-FMFD_CONNECTION_FAIL : Could not connect to /dev/fm/fd_wdsysmon.d/node0_9_CPU0 : No such file or directory

RP/0/RP0/CPU0:ASR9K#
RP/0/RP0/CPU0:ASR9K#
RP/0/RP0/CPU0:ASR9K#show shelfmgr status location 0/9/CPU0
Nodeid 0x8b1, inst 1

Platform Node Status for 0/9/CPU0 (0_8b_1)
----------------------------------
Current State: IN-RESET
Current Substate: IMDR_STATE_NONE
Configuration:
Power is enabled
Bootup enabled.
Monitoring enabled
Boot Requests: 4 Max Allowed: 10
Bringdown Count: 5 Max Allowed: 5
Card Reset Count: 0 Max Allowed: 4

Last Reset Code: 14, CPU Reset
Card In Reset: TRUE Shutdown Reason: 2
No FSM timers are set

Slot supports CBC processor. Card is Present andCBC state is Online
CBC reset reason is 0x3d
CBC reports card type as: 0x4302a0
Estimated 275 Watts of power required. Power is off..


Heartbeat monitoring is disabled.
Last rx sequence number: 193 Last tx sequence number: 193
Missed ticks: 0 Remote HB Missed Count: 0
Last HB TX status: 0

MBI Reset Pending Type: 64
SysMgr Node Band State: 0x800000 FINAL
SysMgr IMDR Node substate: 0

Boot request card type: 0x4302a0. Boot-up was denied.
Last FSM Shutdown Reason: 2
Rommon Version 3.03

Line card memory mode is mixed. Card mode --BB-------E----------
Same RSPs
RP/0/RP0/CPU0:ASR9K#
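When reviewing logs after the fact, the shelfmgr NODE_STATE_CHANGE and MAX_RESET_BRINGDOWN messages show when a node was put into IN-RESET. This Python sketch extracts them with regular expressions written against the message formats shown above; other IOS XR releases may format these messages differently, and the function names are hypothetical.

```python
import re
from typing import List, Set, Tuple

# Patterns written against the shelfmgr messages shown in this document
STATE_CHANGE = re.compile(
    r"(?P<ts>\w{3} +\d+ [\d:.]+) \w+: shelfmgr\[\d+\]: "
    r"%PLATFORM-SHELFMGR-\d-NODE_STATE_CHANGE : (?P<node>\S+) \S+ state:(?P<state>.+)"
)
MAX_RESET = re.compile(r"%PLATFORM-SHELFMGR-\d-MAX_RESET_BRINGDOWN : Can not boot node (?P<node>\S+)")


def state_changes(log: str) -> List[Tuple[str, str, str]]:
    """(timestamp, node, new state) for every shelfmgr NODE_STATE_CHANGE line."""
    result = []
    for line in log.splitlines():
        m = STATE_CHANGE.search(line)
        if m:
            result.append((m["ts"], m["node"], m["state"].strip()))
    return result


def nodes_put_in_reset(log: str) -> Set[str]:
    """Nodes for which shelfmgr logged MAX_RESET_BRINGDOWN."""
    return {m["node"] for m in map(MAX_RESET.search, log.splitlines()) if m}


# Lines from the logs above (the MAX_RESET_BRINGDOWN line is shortened)
LOG = """\
RP/0/RP0/CPU0:Jun 19 10:45:59.721 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-0-MAX_RESET_BRINGDOWN : Can not boot node 0/9/CPU0 A9K-40GE-SE due to multiple resets, putting it IN_RESET state.
RP/0/RP0/CPU0:Jun 19 10:45:59.727 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:IN-RESET
RP/0/RP0/CPU0:Jun 19 10:47:40.909 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:IOS XR FAILURE
RP/0/RP0/CPU0:Jun 19 10:47:47.995 JST: invmgr[269]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/9/CPU0, state: BRINGDOWN
"""

print(nodes_put_in_reset(LOG))  # {'0/9/CPU0'}
for event in state_changes(LOG):
    print(event)
```

Only shelfmgr messages are matched; the invmgr NODE_STATE_CHANGE line uses a different format and is ignored.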


Once a module is in IN-RESET, running admin reload again will not bring it back up.

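To reset the counter and boot the module, run the hw-module location <loc> reload command instead. In the output below, Bringdown Count drops back to 0 and the boot request is allowed again.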

RP/0/RP0/CPU0:ASR9K#hw-module location 0/9/CPU0 reload
WARNING: This will take the requested node out of service.
Do you wish to continue?[confirm(y/n)]y

RP/0/RP0/CPU0:Jun 19 10:47:40.907 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-USER_RESET : Node 0/9/CPU0 is reset due to user reload request
RP/0/RP0/CPU0:Jun 19 10:47:40.909 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:IOS XR FAILURE
RP/0/RP0/CPU0:Jun 19 10:47:47.629 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:47:47.631 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:ROMMON
RP/0/RP1/CPU0:Jun 19 10:47:47.632 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:47:47.989 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:47:47.992 JST: shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/9/CPU0 A9K-40GE-SE state:BRINGDOWN
RP/0/RP1/CPU0:Jun 19 10:47:47.992 JST: canb-server[156]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/9/CPU0 , Power Cycle (0x05000000)
RP/0/RP0/CPU0:Jun 19 10:47:47.995 JST: invmgr[269]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/9/CPU0, state: BRINGDOWN


RP/0/RP0/CPU0:ASR9K#show shelfmgr status location 0/9/CPU0
Nodeid 0x8b1, inst 1

Platform Node Status for 0/9/CPU0 (0_8b_1)
----------------------------------
Current State: BRINGDOWN
Current Substate: IMDR_STATE_NONE
Configuration:
Power is enabled
Bootup enabled.
Monitoring enabled
Boot Requests: 1 Max Allowed: 10
Bringdown Count: 0 Max Allowed: 5
Card Reset Count: 0 Max Allowed: 4

Last Reset Code: 14, CPU Reset
Card In Reset: FALSE Shutdown Reason: 4
Timer BOOTREQ_SANITY 10 set. 149 of 150 seconds remaining.

Slot supports CBC processor. Card is Present andCBC state is Online
CBC reset reason is 0x40
CBC reports card type as: 0x4302a0
Estimated 275 Watts of power required. Power is Reserved.


Heartbeat monitoring is disabled.
Last rx sequence number: 193 Last tx sequence number: 193
Missed ticks: 0 Remote HB Missed Count: 0
Last HB TX status: 0

MBI Reset Pending Type: 64
SysMgr Node Band State: 0x800000 FINAL
SysMgr IMDR Node substate: 0

Boot request card type: 0x4302a0. Boot-up was allowed.
Last FSM Shutdown Reason: 4
Rommon Version 3.03

Line card memory mode is mixed. Card mode --BB-------E----------
Same RSPs
RP/0/RP0/CPU0:ASR9K#
RP/0/RP0/CPU0:ASR9K#
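The recovery rule above (admin reload cannot revive an IN-RESET module, while hw-module location <loc> reload can) can be expressed as a small check on show shelfmgr status output. This is a hypothetical Python helper; it assumes that Current State: IN-RESET or Card In Reset: TRUE, as seen in the outputs above, identifies the condition, and it only builds the command string rather than running it.

```python
import re
from typing import Optional


def recovery_command(location: str, shelfmgr_status: str) -> Optional[str]:
    """Return the hw-module reload command if the node is held in IN-RESET, else None."""
    state = re.search(r"Current State:\s*(.+)", shelfmgr_status)
    card_in_reset = re.search(r"Card In Reset:\s*(TRUE|FALSE)", shelfmgr_status)
    held = (state is not None and state.group(1).strip() == "IN-RESET") or (
        card_in_reset is not None and card_in_reset.group(1) == "TRUE"
    )
    # admin reload cannot revive an IN-RESET node; hw-module reload goes through shelfmgr
    return f"hw-module location {location} reload" if held else None


stuck = "Current State: IN-RESET\nCard In Reset: TRUE Shutdown Reason: 2"
booting = "Current State: BRINGDOWN\nCard In Reset: FALSE Shutdown Reason: 4"
print(recovery_command("0/9/CPU0", stuck))    # hw-module location 0/9/CPU0 reload
print(recovery_command("0/9/CPU0", booting))  # None
```

The command still has to be entered manually, and as the WARNING in the example shows, it takes the node out of service while it reloads.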