While upgrading from v2.1(1e) to v2.2(1b) we have seen a similar issue on two different clusters.
After activating the new image on the subordinate FI and attempting to switch the primary role to the subordinate, a 'show cluster state' reports that the primary is unable to communicate with the UCSM controller.
ucs-A# show fabric-interconnect version
Fabric Interconnect A:
Fabric Interconnect B:
ucs-A# show system version
ucs-A# show cluster state
Cluster Id: 65030500-7707-11e0-87e4-000573d2eec4
Unable to communicate with UCSM controller
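For comparison, on a healthy cluster the same command normally reports both fabrics up and HA READY; from memory the output looks roughly like this:

```
ucs-A# show cluster state
Cluster Id: 65030500-7707-11e0-87e4-000573d2eec4

A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
```

Instead of the per-fabric state lines, we only get the "Unable to communicate" error on the primary.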
Has anyone seen similar issues when upgrading to v2.2(1b)?
I assume you did a manual install (not infrastructure autoinstall)?
Is your UCSM already upgraded and running 2.2.1b? If not, that might be your problem.
Yes, this was a manual upgrade.
As you can see from the output in my original post, UCSM has been upgraded to 2.2(1b) (from 2.1(1e)), and the subordinate FI has been upgraded to 2.2(1b). However, on attempting to force the subordinate to lead so that the primary FI can be upgraded, running 'show cluster state' or 'force lead b' on the primary returns an error.
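For clarity, the force-lead step I attempted is run from the local management shell; from memory the sequence is roughly the following (treat the exact syntax as approximate):

```
ucs-A# connect local-mgmt
ucs-A(local-mgmt)# cluster lead b
```

It is at this point that the primary complains that it cannot communicate with the UCSM controller.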
I've been advised that activating the newer kernel and system image on the primary will cause a management state failover; however, that seems to go against all best practices outlined in the upgrade guide, especially those pertaining to confirming the HA state prior to proceeding with FI upgrades.
To have the primary FI report that it is unable to communicate with the UCSM controller seems to me to be a fairly serious issue.
Questions / comments:
- Is this a production environment?
- Very strange that you see this issue in two UCS domains.
- What other error messages do you see; e.g., are the IOMs all OK?
- I would advise using the autoinstall feature anyway.
- I have seen tons of transient error messages during upgrades from 2.0.x / 2.1.x to 2.2.1b, which disappeared at the end.
- Forcing the subordinate to lead so that the primary FI can be upgraded is optional; it happens automatically when you upgrade the master FI! There is no benefit at all; you lose your UCSM session either way!
- We have seen this happen in both a test and production environment.
- We have seen this happen in two domains.
- All other components are OK.
- We are somewhat hesitant to use the autoinstall feature as we want to avoid any possibility of unscheduled reboots.
- My interpretation of the upgrade documentation is that forcing the subordinate to lead prior to upgrading the master is best practice, done to confirm HA as a preliminary step to upgrading the master. I have no idea how the autoinstall feature is implemented internally, but one may well suspect it confirms the HA status/cluster state on the master via a similar process before upgrading the FI.
I understand that the master role should (and almost certainly will) fail over if the master is rebooted, but the inability to query the cluster state from the master FI, and relying on a hard failover during an upgrade, still seems like a glaring issue to me.
- we have seen issues upgrading to 2.2.1b; almost 100% of those were manual upgrades
- the infrastructure autoinstall order is:
1) UCSM (you lose your UCSM session)
2) IOM upgrade and activation with the startup flag set
3) subordinate FI upgrade (with reboot)
4) the install stops! You have to check that both fabrics are properly up and running and that the cluster state is HA READY
5) you ACK to go on with the upgrade of the master FI (you lose the UCSM session a second time)
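If you prefer to kick off the infrastructure autoinstall from the CLI rather than the GUI, it is started under the firmware auto-install scope; roughly like this (I am writing the syntax from memory, and the exact bundle version string depends on what you downloaded, so verify it on your system):

```
ucs-A# scope firmware
ucs-A /firmware # scope auto-install
ucs-A /firmware/auto-install # install infra infra-vers 2.2(1b)A
```

The pause and acknowledgement in steps 4) and 5) above happen as part of this one workflow.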
- "...should be done to confirm HA as a preliminary step to upgrading the master..." Agreed, see 4) above! And it has nothing to do with "...force the subordinate to lead so that the primary FI...".
- "...but the inability to query the cluster state from the master FI..."?? You can open an SSH session to either FI, independent of master or subordinate, and run 'show cluster state'.
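That is, you can check the cluster state directly from the subordinate, something like the following (the management IP here is just a placeholder for your FI-B address):

```
$ ssh admin@<FI-B-mgmt-IP>
ucs-B# show cluster state
```

If the subordinate reports HA READY while the primary cannot, that narrows the problem down to the management plane on the primary.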
Here's a blog post with more on this issue:
I ran into this problem on my UCS as well when upgrading to 2.2(1b). In my case, I tried to roll back the FI to the previous version. This was unsuccessful and led to the FI crashing to a bash prompt, so don't do that.
Once I got that fixed, the only available option was to proceed with the upgrade on the Primary. Quite nerve-racking in a production environment. Fortunately the HA worked fine and the primary upgraded successfully.
Dear friends,
Could you please help me? I want to join Cisco data center classes. I have a new job in a data center, but I don't have a good command of Nexus.
Could you please help me find a good Nexus trainer?
Thank you so much.