Cisco Support Community

Cisco Nexus 9508 ACI-mode switch - Redundancy/Standby Support with 2 supervisor modules

 

This Tech Zone article addresses some frequently asked questions about dual supervisors and the hot standby feature on the Nexus 9508 series switch running in ACI mode.

 

  • Are dual supervisors supported in ACI mode on the 9508 for HA?
  • Will the spine fail over to the standby supervisor if the primary supervisor fails?
  • Is the standby supervisor a HOT standby or a COLD standby supervisor?
  • Will the primary supervisor copy the ACI firmware and CERT files to the standby supervisor?
  • Does the standby supervisor require its own certificate (CERT) files?
  • What does "warm standby" mean in the output of "show system redundancy status"?


Redundancy modes for Cisco devices:

HOT STANDBY
Hot redundancy refers to a degree of resiliency where the redundant system is fully prepared to handle the traffic of the primary system. Substantial state information is saved, so the network service is continuous, and the effect on traffic flow is minimal or nil in the case of a failover.

 

WARM STANDBY
Warm redundancy refers to a degree of resiliency beyond that of a cold standby system. The redundant system is partially prepared, but it does not have all the state information the primary system holds, so it cannot take over immediately. Some additional information must be determined or gleaned from the traffic flow or from peer network devices before it can handle packet forwarding.

 

COLD STANDBY
Cold redundancy refers to the degree of resiliency that a redundant system traditionally provides. A redundant system is cold when no state information is maintained between the backup or standby system and the system it protects.


The Release Notes quoted below and the output of "show system redundancy status" both describe the redundancy state as "Warm" and "Warm standby". This can cause confusion if you know the three modes listed above. As of this writing, the Cisco Nexus 9508 ACI-mode switch actually provides COLD STANDBY behavior. The only items that must be mirrored between the active and standby supervisors are the ACI firmware image and the certificate files, and both must be installed on each supervisor independently. There is no automatic synchronization and no exchange of state information. A bug (CSCuq18178) was filed to change the wording to "Cold Standby", but the development team decided not to change it and instead to use the following documentation to explain the redundancy currently supported on the Cisco Nexus 9508 ACI-mode switch.


Information from Release Notes:
The Cisco Nexus 9508 ACI-mode switch supports warm (stateless) standby where the state is not synched between the active and the standby supervisor modules. For an online insertion and removal (OIR) or reload of the active supervisor module, the standby supervisor module becomes active, but all modules in the switch are reset because the switchover is stateless. In the output of the show system redundancy status command, warm standby indicates stateless mode.


ACTIVE SUPERVISOR
 

spine1# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm

This supervisor (sup-27)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with warm standby

Other supervisor (sup-28)
------------------------
Redundancy state: Standby
Supervisor state: Warm standby
Internal state: Warm standby


STANDBY SUPERVISOR

 

(none)# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm
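If you collect this output from several spines, a small parser makes it easier to spot which ones report the stateless "Warm" mode described in the release notes. The sketch below is a minimal example, assuming the output has been saved as plain text on a workstation. The function names parse_redundancy_status and is_stateless_standby are hypothetical, and the section and field labels are copied from the sample output above, so other releases may format them differently.

```python
def parse_redundancy_status(text):
    """Parse captured 'show system redundancy status' text (hypothetical helper).

    Returns {section_title: {field: value}}. A line without a colon starts a
    new section; blank lines and dashed underline lines are skipped.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue
        if ":" not in line:
            current = line
            sections[current] = {}
        elif current is not None:
            key, _, value = line.partition(":")
            sections[current][key.strip()] = value.strip()
    return sections


def is_stateless_standby(sections):
    """True when the operational mode is 'Warm'.

    Per the release notes quoted above, 'Warm' on an ACI-mode N9508 means a
    stateless switchover in which all modules in the switch are reset.
    """
    return sections.get("Redundancy mode", {}).get("operational") == "Warm"


# Sample copied from the active supervisor output above.
SAMPLE = """\
Redundancy mode
---------------
administrative: Warm
operational: Warm

This supervisor (sup-27)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with warm standby

Other supervisor (sup-28)
------------------------
Redundancy state: Standby
Supervisor state: Warm standby
Internal state: Warm standby
"""

if __name__ == "__main__":
    status = parse_redundancy_status(SAMPLE)
    print(status["Other supervisor (sup-28)"]["Supervisor state"])  # Warm standby
    print(is_stateless_standby(status))  # True
```

The .get lookups let a truncated capture, such as one where the command errors out after the first section, return False instead of raising KeyError.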

 


Reference Material:

 

Cisco NX-OS Release 11.0(1d) Release Notes for Cisco Nexus 9000 Series ACI-Mode Switches
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/release/notes/aci_nxos_rn_1101d.html

 

CSCuq18178 - show system redundancy status terminology incorrect 

 

 

The following example output shows an N9508 with two supervisors in ACI mode. It shows that the two supervisors are NOT synchronized with certificate files, which is the key point to note. If the standby supervisor does NOT have a valid CERT file, it will not join or communicate correctly with the fabric after a failover, and the "acidiag fnvread" command may show the state of the failed-over spine as "DISCOVERING" or "INACTIVE". This example is included to assist with problem determination if this occurs.

 


ACTIVE SUPERVISOR
 

spine1# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm

This supervisor (sup-27)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with warm standby

Other supervisor (sup-28)
------------------------
Redundancy state: Standby
Supervisor state: Warm standby
Internal state: Warm standby


(none)# dir /bootflash
aci-n9000-dk9.11.0.1c.bin
auto-s 
disk_log.txt 
mem_log.txt
mem_log.txt.old.gz


spine1# show version
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac

Software
BIOS: version 07.08
kickstart: version 11.0(1c) [build 11.0(1c)]
system: version 11.0(1c) [build 11.0(1c)]
BIOS compile time: 03/28/2014
kickstart image file is: /bootflash/aci-n9000-dk9.11.0.1c.bin
kickstart compile time: 09/03/2014 05:48:50 [09/03/2014 05:48:50]
system image file is: /bootflash/auto-s
system compile time: 09/03/2014 05:48:50 [09/03/2014 05:48:50]


Hardware
cisco N9K-SUP-A ("supervisor")
Intel(R) Xeon(R) CPU E5-2403 0 @ 1.80GHz with 16400384 kB of memory.
Processor Board ID FGE18200AVQ

Device name: spine1
bootflash: 62522368 kB

 

spine1# cat /mit/sys/summary
# System
address : 192.168.0.220
childAction :
currentTime : 2014-10-17T12:09:35.712+00:00
dn : sys
fabricId : 1
fabricMAC : 00:22:BD:F8:19:FF
id : 201
inbMgmtAddr : 0.0.0.0
lcOwn : local
modTs : 2014-10-17T10:00:28.990+00:00
mode : unspecified
monPolDn : uni/fabric/monfab-default
name : spine1
oobMgmtAddr : 0.0.0.0
podId : 1
rn : sys
role : spine
serial : FGE18200AVQ
state : in-service
status :
systemUpTime : 00:02:13:42.000


spine1# cat /proc/cmdline
console=ttyS0,9600n8nn card_index=21000 loader_ver="7.08" quiet ksimg=bootflash:aci-n9000-dk9.11.0.1c.bin rw root=/dev/ram0 rdbase=0x8000000 ip=off ramdisk_size=131072 kgdboc=ttyS0,115200,B mtdparts=physmap-flash.0:512k(mtdoops),256k(RR),256k(SM_LOG),512k(KLOG),512k(EXTRA),12m(KTRACES),50m(PLOG) elevator=noop intel_idle.max_cstate=2 pcie_ports=native

 


spine1# cat /mnt/cfg/0/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin

 

spine1# cat /mnt/cfg/1/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin
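Because the firmware image is not synchronized automatically, it can help to confirm that the image the supervisor actually booted (the ksimg= token in /proc/cmdline) matches the boot lines in both menu.lst.local files. The sketch below assumes the outputs above have been captured as text; the function names are hypothetical. A match here does not rule out the certificate problem described in this article: in the failing example, both supervisors boot the same image.

```python
def kickstart_image(cmdline):
    """Return the value of the ksimg= token in /proc/cmdline, or None."""
    for token in cmdline.split():
        if token.startswith("ksimg="):
            return token[len("ksimg="):]
    return None


def grub_boot_images(menu_lst):
    """Return every image named on a 'boot' line of menu.lst.local."""
    images = []
    for line in menu_lst.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "boot":
            images.append(parts[1])
    return images


def boot_config_consistent(cmdline, *menu_lsts):
    """True if every configured boot image equals the running kickstart image."""
    running = kickstart_image(cmdline)
    configured = [img for menu in menu_lsts for img in grub_boot_images(menu)]
    return running is not None and bool(configured) and all(
        img == running for img in configured
    )


# Excerpt of the /proc/cmdline output above (shortened).
CMDLINE = ('console=ttyS0,9600n8nn card_index=21000 loader_ver="7.08" quiet '
           "ksimg=bootflash:aci-n9000-dk9.11.0.1c.bin rw root=/dev/ram0")

# Contents of /mnt/cfg/0 and /mnt/cfg/1 menu.lst.local above (identical here).
MENU_LST = """\
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin
"""

if __name__ == "__main__":
    print(kickstart_image(CMDLINE))  # bootflash:aci-n9000-dk9.11.0.1c.bin
    print(boot_config_consistent(CMDLINE, MENU_LST, MENU_LST))  # True
```

The title line is ignored on purpose, since only the boot line determines which image GRUB loads.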

 

 

CERTIFICATE VERIFICATION (Valid CERT File)

 

spine1# whoami
root

 

spine1# openssl asn1parse < /securedata/ssl/server.crt | grep PRINTABLESTRING
WARNING: can't open config file: /usr/lib/ssl/openssl.cnf
51:d=5 hl=2 l= 13 prim: PRINTABLESTRING :Cisco Systems
75:d=5 hl=2 l= 22 prim: PRINTABLESTRING :Cisco Manufacturing CA
142:d=5 hl=2 l= 28 prim: PRINTABLESTRING :PID:N9K-C9508 SN:FGE18200AVQ
181:d=5 hl=2 l= 11 prim: PRINTABLESTRING :FGE18200AVQ

 

 

INITIATE FAILOVER:

spine1# reload

This command will reload the chassis, Proceed (y/n)? [n]: y
[ 9891.651189] nvram_klm wrote rr=9 rr_str=PolicyElem Ch reload to nvramspine1#
[ 9891.726160] obfl_klm writing reset reason 9, switch reset
[ 9891.806345] Collected 8 ext4 filesystems
[ 9891.854046] Freezing filesystems
[ 9891.973546] Collected 1 ubi filesystems
[ 9892.020222] Freezing filesystems
[ 9892.060810] Done freezing filesystems
[ 9892.106536] Putting SSD in stdby
[ 9892.653211] Done putting SSD in stdby 0
[ 9892.699876] Done offlining SSD

INSIEME SPINE Ver 7.8

INSIEME SPINE Ver 7.8
Memory Size (Bytes): 0x0000000080000000 + 0x0000000380000000
Relocated to memory
Detected CISCO IOFPGA
Code Signing Results: 0x0
Using Upgrade FPGA
Booting from Primary Bios
FPGA Revison : 0x20
FPGA ID : 0x1168153
FPGA Date : 0x20140317
Reset Cause Register: 0x20
Boot Ctrl Register : 0x60ff
EventLog Register1 : 0x2000000
EventLog Register2 : 0xfbc77fff
Found Grub
Version 2.15.1236. Copyright (C) 2012 American Megatrends, Inc.
Board type 1
IOFPGA @ 0xe8000000
SLOT_ID @ 0x1b
Filesystem type is ext2fs, partition type 0x83
Trying to read config file /boot/grub/menu.lst.local from (hd0,4)
Filesystem type is ext2fs, partition type 0x83

Booting bootflash:aci-n9000-dk9.11.0.1c.bin...
Booting bootflash:aci-n9000-dk9.11.0.1c.bin
Trying diskboot
Filesystem type is ext2fs, partition type 0x83
Image valid

 

 

########################################################################

 


STANDBY SUPERVISOR

(none)# show system redundancy status
Redundancy mode
---------------
administrative: Warm
operational: Warm

list index out of range
Error executing command, check logs for details


spine1# dir /bootflash
aci-n9000-dk9.11.0.1c.bin
auto-s 
disk_log.txt 
mem_log.txt
mem_log.txt.old.gz


(none)# show version
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac

Software
BIOS: version Unknown
kickstart: Unknown
system: Unknown
BIOS compile time: 12/25/2020
kickstart image file is: Unknown
kickstart compile time: 12/25/2020 12:00:00 [12/25/2020 12:00:00]
system image file is: Unknown
system compile time: 12/25/2020 12:00:00 [12/25/2020 12:00:00]


Hardware
cisco Unknown ("supervisor")
Unknown CPU with 0 kB of memory.
Processor Board ID Unknown

Device name: none
bootflash: 0 kB

 

(none)# cat /mit/sys/summary
# System
address : 0.0.0.0
childAction :
currentTime : 2014-10-18T03:07:06.041+00:00
dn : sys
fabricId : 1
fabricMAC : 00:22:BD:F8:19:FF
id : 0
inbMgmtAddr : 0.0.0.0
lcOwn : local
modTs : 2014-10-18T00:54:01.994+00:00
mode : unspecified
monPolDn : uni/fabric/monfab-default
name :
oobMgmtAddr : 0.0.0.0
podId : 1
rn : sys
role : unsupported
serial :
state : out-of-service
status :
systemUpTime : 00:02:14:04.000
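Comparing this summary with the active supervisor's /mit/sys/summary output earlier shows the symptoms of the failed join: id is 0, role is unsupported, serial is empty, and state is out-of-service. The sketch below, which assumes the output has been saved as text, parses the "key : value" lines and flags those fields. The function names are hypothetical, and the checks are a heuristic drawn only from the two captures in this article, not a complete health test.

```python
def parse_mit_summary(text):
    """Parse 'cat /mit/sys/summary' output into a dict of field -> value.

    Splits on the first colon only, so values that contain colons, such as
    MAC addresses and timestamps, are kept intact.
    """
    fields = {}
    for line in text.splitlines():
        if line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def summary_problems(fields):
    """List fields that differ from a healthy in-service spine (heuristic)."""
    problems = []
    if fields.get("state") != "in-service":
        problems.append(f"state is {fields.get('state')!r}")
    if fields.get("role") in (None, "", "unsupported"):
        problems.append(f"role is {fields.get('role')!r}")
    if not fields.get("serial"):
        problems.append("serial is empty")
    if fields.get("id") in (None, "", "0"):
        problems.append(f"node id is {fields.get('id')!r}")
    return problems


# Excerpt of the standby supervisor summary above.
STANDBY_SUMMARY = """\
# System
address : 0.0.0.0
fabricMAC : 00:22:BD:F8:19:FF
id : 0
name :
role : unsupported
serial :
state : out-of-service
"""

if __name__ == "__main__":
    for problem in summary_problems(parse_mit_summary(STANDBY_SUMMARY)):
        print(problem)
```

Run against the active supervisor's summary (id 201, role spine, serial FGE18200AVQ, state in-service), the same check should report no problems.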


(none)# cat /proc/cmdline
console=ttyS0,9600n8nn card_index=21000 loader_ver="7.08" quiet ksimg=bootflash:aci-n9000-dk9.11.0.1c.bin rw root=/dev/ram0 rdbase=0x8000000 ip=off ramdisk_size=131072 kgdboc=ttyS0,115200,B mtdparts=physmap-flash.0:512k(mtdoops),256k(RR),256k(SM_LOG),512k(KLOG),512k(EXTRA),12m(KTRACES),50m(PLOG) elevator=noop intel_idle.max_cstate=2 pcie_ports=native


(none)# cat /mnt/cfg/0/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin

 

(none)# cat /mnt/cfg/1/boot/grub/menu.lst.local
#
# General configuration
#
disable certificate
title bootflash:aci-n9000-dk9.11.0.1c.bin
boot bootflash:aci-n9000-dk9.11.0.1c.bin

 


CERTIFICATE VERIFICATION (Invalid CERT File)

 

(none)# openssl asn1parse < /securedata/ssl/server.crt | grep PRINTABLESTRING
WARNING: can't open config file: /usr/lib/ssl/openssl.cnf
37:d=5 hl=2 l= 2 prim: PRINTABLESTRING :XX
137:d=5 hl=2 l= 2 prim: PRINTABLESTRING :US


(none)# openssl asn1parse < /securedata/ssl/server.crt | grep UTF8STRING
WARNING: can't open config file: /usr/lib/ssl/openssl.cnf
50:d=5 hl=2 l= 12 prim: UTF8STRING :Default City
73:d=5 hl=2 l= 19 prim: UTF8STRING :Default Company Ltd
150:d=5 hl=2 l= 2 prim: UTF8STRING :CA
163:d=5 hl=2 l= 7 prim: UTF8STRING :SanJose
181:d=5 hl=2 l= 16 prim: UTF8STRING :Insieme Networks
208:d=5 hl=2 l= 7 prim: UTF8STRING :Insieme


Note: You will need to install a valid CERT file on the standby supervisor.
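The difference between the valid and invalid certificates above can also be checked from the captured openssl asn1parse output. The sketch below is a heuristic based only on those two examples: it treats a certificate as plausible when it names "Cisco Manufacturing CA", contains the chassis serial number (the Processor Board ID from show version), and does not contain placeholder values such as "Default Company Ltd". The function names are hypothetical, and this is not a substitute for real certificate validation. On a workstation, openssl x509 -in server.crt -noout -subject -issuer should print the same subject and issuer fields in a more readable form.

```python
import re

# Matches the string value at the end of an asn1parse line, e.g.
# "75:d=5 hl=2 l= 22 prim: PRINTABLESTRING :Cisco Manufacturing CA"
STRING_RE = re.compile(r"prim:\s+(?:PRINTABLESTRING|UTF8STRING)\s+:(.*)$")

PLACEHOLDERS = ("Default City", "Default Company Ltd")


def cert_strings(asn1parse_output):
    """Return the PRINTABLESTRING/UTF8STRING values from asn1parse output."""
    values = []
    for line in asn1parse_output.splitlines():
        match = STRING_RE.search(line)
        if match:
            values.append(match.group(1).strip())
    return values


def looks_like_cisco_mfg_cert(strings, serial):
    """Heuristic check modeled on the valid/invalid examples in this article."""
    has_cisco_ca = "Cisco Manufacturing CA" in strings
    has_serial = any(serial in s for s in strings)
    has_placeholder = any(s in PLACEHOLDERS for s in strings)
    return has_cisco_ca and has_serial and not has_placeholder


# Lines copied from the valid-certificate output above.
VALID = """\
51:d=5 hl=2 l= 13 prim: PRINTABLESTRING :Cisco Systems
75:d=5 hl=2 l= 22 prim: PRINTABLESTRING :Cisco Manufacturing CA
142:d=5 hl=2 l= 28 prim: PRINTABLESTRING :PID:N9K-C9508 SN:FGE18200AVQ
181:d=5 hl=2 l= 11 prim: PRINTABLESTRING :FGE18200AVQ
"""

# Lines from both grep runs of the invalid-certificate output, combined.
INVALID = """\
37:d=5 hl=2 l= 2 prim: PRINTABLESTRING :XX
50:d=5 hl=2 l= 12 prim: UTF8STRING :Default City
73:d=5 hl=2 l= 19 prim: UTF8STRING :Default Company Ltd
181:d=5 hl=2 l= 16 prim: UTF8STRING :Insieme Networks
"""

if __name__ == "__main__":
    print(looks_like_cisco_mfg_cert(cert_strings(VALID), "FGE18200AVQ"))    # True
    print(looks_like_cisco_mfg_cert(cert_strings(INVALID), "FGE18200AVQ"))  # False
```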

 

 

ACTIVE SUPERVISOR INITIATED FAILOVER WITH RELOAD:

(none)#
INSIEME SPINE Ver 7.8

INSIEME SPINE Ver 7.8
Memory Size (Bytes): 0x0000000080000000 + 0x0000000380000000
Relocated to memory
Detected CISCO IOFPGA
Code Signing Results: 0x0
Using Upgrade FPGA
Booting from Primary Bios
FPGA Revison : 0x20
FPGA ID : 0x1168153
FPGA Date : 0x20140317
Reset Cause Register: 0x80000022
Boot Ctrl Register : 0x60ff
EventLog Register1 : 0x2000000
EventLog Register2 : 0xfbc77fff
Found Grub
Version 2.15.1236. Copyright (C) 2012 American Megatrends, Inc.
Board type 1
IOFPGA @ 0xe8000000
SLOT_ID @ 0x1c
Filesystem type is ext2fs, partition type 0x83
Trying to read config file /boot/grub/menu.lst.local from (hd0,4)
Filesystem type is ext2fs, partition type 0x83

Booting bootflash:aci-n9000-dk9.11.0.1c.bin...
Booting bootflash:aci-n9000-dk9.11.0.1c.bin
Trying diskboot
Filesystem type is ext2fs, partition type 0x83
Image valid

 
Comments
Community Member

Hi Expert,

How do you do that?

"Note: You will need to add valid CERT file to the standby supervisor."

I have dual supervisors and need to bring the standby up to "hot" by moving over the CERT!

/Ola

Cisco Employee

Hey Ola,

Hot Standby is supported in NX-OS mode, whereas if you are running in ACI mode, Warm Standby is the only option. Tomas already included that in his post above, but I went ahead and pulled out the important part.

Information from Release Notes:
The Cisco Nexus 9508 ACI-mode switch supports warm (stateless) standby where the state is not synched between the active and the standby supervisor modules. For an online insertion and removal (OIR) or reload of the active supervisor module, the standby supervisor module becomes active, but all modules in the switch are reset because the switchover is stateless. In the output of the show system redundancy status command, warm standby indicates stateless mode.
