One chassis, 6 blades, B200-M2
Two Fabric interconnects, 6120XP
Old firmware: 1.3.1n
New firmware: 1.4.3s
When we activated UCS Manager from 1.3 to 1.4, all 6 blades rebooted unexpectedly, which does not match what the document says:
We did these steps:
1.) Update Adapter firmware
2.) Activate Adapter firmware one by one (causes downtime, but we can put the ESXi hosts in maintenance mode)
3.) Update CIMC firmware
4.) Activate CIMC firmware
5.) Update IO Module firmware
6.) Activate IO module firmware, but "Set Startup Version Only"
7.) Activate UCS Manager firmware <-- the problem occurs
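For reference, before activating UCS Manager in step 7, the cluster HA state can be checked from the fabric interconnect CLI. A rough sketch (prompt is illustrative, output omitted — verify against your release's CLI guide):

```
UCS-A# show cluster extended-state   <- should report HA READY before activating UCSM
UCS-A# show fault                    <- review outstanding faults before proceeding
```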
We want to keep the VMs running during the firmware upgrade.
Downtime for one blade/ESXi host is acceptable, but downtime for all the blades at once is unacceptable for us.
Does anyone know what causes the blades to reboot when activating the UCS Manager firmware? According to the release doc, only the sessions to the GUI and CLI should be affected.
Thanks a lot and appreciate your help!
Unless I am missing something, you read the wrong doc. You should have read the one covering 1.3 to 1.4, but your link covers 1.4 to 1.4. Please clarify whether what we are reading is correct.
Do we have to upgrade firmware 1.3.1n to 1.4.1 and then 1.4.3?
We upgraded the firmware directly from 1.3.1n to 1.4.3s. Sorry if we missed this information in any document.
So we will try 1.3.1n -> 1.4.1m -> 1.4.3s
Is this correct?
You can upgrade directly from one release to another. Each one has steps to follow. See the link below for each release. You will want to use 1.3 to 1.4 and follow it step by step.
Hope this helps.
I have now run into another issue: only one fabric interconnect got the new version successfully, the other still runs the old version, and as a result the cluster IP is not pingable and UCS Manager is not accessible.
Previous firmware: 1.4.3s
New firmware: 1.4.1m
After activating UCS Manager, from the CLI of the fabric interconnect:
sdeucs-B# show cluster state
Cluster Id: 0xcfa2f2725b8811xxxxxxxxxx00059b790004
local: 1.4(1m), peer: 1.4(3.0)
B: UP, ELECTION IN PROGRESS (Management services: UP)
A: UP, ELECTION IN PROGRESS (Management services: UNRESPONSIVE)
HA NOT READY
Management services are unresponsive on peer Fabric Interconnect
No device connected to this Fabric Interconnect
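When one FI's management services show UNRESPONSIVE like this, one thing worth checking (a sketch, not a TAC-blessed procedure) is the management process state via the local management shell — if the peer is truly unresponsive over the cluster link, you may need its console instead:

```
UCS-B# connect local-mgmt a
UCS-A(local-mgmt)# show pmon state   <- lists the management service processes and their states
```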
Only one fabric interconnect downgraded the version successfully; the other did not. This caused us to lose both connectivity and management access.
Any hints here?
There is a full video guide for the 1.3.x to 1.4.x upgrade at the below site.
Updating UCSM certainly should not cause any disruption other than having to restart your user session, assuming your FIs were correctly clustered and HA was in an operational state prior to the UCSM upgrade.
The first reboot issue was caused by a bug:
Brief info of this bug:
If we upgrade the firmware directly from 1.3.x to 1.4.3s, the blades reboot unexpectedly when we activate UCS Manager.
The workaround is to upgrade from 1.3.x to 1.4.3r first, and then to 1.4.3s.
The second weird issue (only FI-B got activated while FI-A was always stuck) was caused by a corrupted management database on FI-A (according to the TAC engineer), and we had to rebuild FI-A and its cluster with FI-B from scratch (erase all the configuration and initialize the system) to fix it. Now it works fine.
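For anyone hitting the same thing: the rebuild roughly corresponds to erasing the configuration from the FI's local management shell and then re-running the initial setup wizard. This wipes the FI completely — do this only under TAC guidance:

```
UCS-A# connect local-mgmt
UCS-A(local-mgmt)# erase configuration   <- prompts for confirmation, then reboots into the setup wizard
```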
According to the TAC engineer, they cannot explain why the corruption happened or how to monitor for it.