In last night's maintenance window, I was going to change one of my UCS systems from FC Switching Mode to FC End-Host mode. The intention was to be able to reach the storage systems through an external FC SAN, instead of the current situation where the storage is directly attached to the FIs. The UCS system in question has redundant 6248UP Fabric Interconnects, the blades are all B200M2 or B200M3, and the infrastructure (UCSM, FIs, and IOMs) is running version 2.2(3a).
The plan was relatively straightforward. First I'd change FI B to FC End-Host mode. After that was done I would re-register the blades' vHBAs attached to the B fabric in my storage systems, so that they could access their storage via the external SAN. Then I'd go through all the blade servers and ensure that they all had healthy and fault-tolerant multipath storage access, at which point I would repeat the process for FI A.
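For the per-blade verification step, one quick sanity check on any Linux blades is to parse `multipath -ll` output and flag LUNs with fewer than two healthy paths. A minimal sketch, assuming the common device-mapper-multipath output format (the `count_active_paths` helper and the sample output below are illustrative, not from the actual environment):

```python
import re

def count_active_paths(output: str) -> dict:
    """Count 'active ready running' paths per device in `multipath -ll`
    text output (format assumed from common device-mapper-multipath
    versions; check against your own hosts)."""
    counts = {}
    current = None
    for line in output.splitlines():
        # Device header lines look like "mpatha (36006016...) dm-0 DGC,VRAID"
        header = re.match(r"^(\S+)\s+\(", line)
        if header:
            current = header.group(1)
            counts[current] = 0
        elif current is not None and "active ready running" in line:
            counts[current] += 1
    return counts

# Hypothetical sample: one LUN with one path per fabric
sample = """\
mpatha (36006016012345678) dm-0 DGC,VRAID
size=100G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:1 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 2:0:0:1 sdc 8:32 active ready running
"""

degraded = {dev: n for dev, n in count_active_paths(sample).items() if n < 2}
print(degraded)  # {} -> every LUN still has two healthy paths
```

An empty result before proceeding to the second fabric is the signal that redundancy is intact.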
It turned out to be a disaster, however. Once I started the process of changing FI B to FC End-Host mode, both FIs rebooted, and all the servers lost both storage and network access. When they came up again, both of them had changed to FC End-Host mode, which meant that none of the blades had access to their storage volumes any longer. So there was nothing to do but to revert the change. Again, both FIs rebooted simultaneously, and they both came back up in FC Switching mode. A few hours later, after rebooting most of the blade servers and cleaning up failed applications, we were back in normal production.
So what I'm wondering about here is why on Earth both of my FIs rebooted at the same time. Is this a known bug? The user interface certainly leads one to think the change can be done one FI at a time, as I had planned: the Set FC End-Host Mode action link is found on each individual FI's page in UCSM (under Equipment->Fabric Interconnects->Fabric Interconnect A/B). (So is the Set FC Switching Mode action link, for that matter.) Furthermore, when I click this link, I get a warning pop-up saying «Are you sure you want to set FC End-Host mode? If you select Yes the Fabric Interconnect will restart and this application will disconnect». Note the use of the singular here; the warning refers to a single Fabric Interconnect, which makes perfect sense since the change was only requested on one of the two FIs.
Sorry, but this is a well-known feature ;-) and has been discussed in this community before, see e.g.
The documentation says:
When you change the Fibre Channel switching mode, Cisco UCS Manager logs you out and restarts the fabric interconnect. For a cluster configuration, Cisco UCS Manager restarts both fabric interconnects sequentially. The second fabric interconnect can take several minutes to complete the change in Fibre Channel switching mode and become system ready.
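For what it's worth, the equivalent change in the UCSM CLI hints at why both FIs are affected: the Fibre Channel switching mode is set under the global fc-uplink scope, not under an individual fabric interconnect. A rough sketch of the session (prompt and warning text approximate, from memory of the CLI configuration guide):

```
UCS-A# scope fc-uplink
UCS-A /fc-uplink # set mode end-host
Warning: When committed, this change will cause the fabric interconnect to reboot
UCS-A /fc-uplink* # commit-buffer
```

There is no per-FI variant of this command; the mode is a system-wide property.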
Thanks for your reply. The thing is, my FIs restarted simultaneously. It didn't happen sequentially, as the documentation you're quoting suggests. That's indeed the crux of the issue, because had the FI reboots happened sequentially (similar to an infrastructure firmware auto install procedure, say), everything would have worked out just fine. But when both the A and B fabrics go offline simultaneously...you lose no matter what. :-(
«In my opinion, the documentation needs to be corrected»
Fully agreed. Not only the documentation, though; it's even more important that the UCSM user interface be updated. If the «Are you sure..» warning pop-up had as much as hinted at the fact that the change would impact both FIs, not just the one I clicked the Set FC End-Host Mode link from, I'd never have gone forward with it. :-(
I know this is an old post, but I wanted to find out the end result of your migration. We are planning on moving from DAS to a Brocade switch, and I understand that when we change the FC switching mode, both Fabric Interconnects will reboot. I noticed you reverted back, but did you ever complete the migration?
I did, but I had to schedule a (disruptive) maintenance window. If I recall correctly the downtime wasn't too long (maybe 10-15 minutes or so), but we had to stop all I/O and shut down all the service profiles that were booting off the SAN, so it wasn't exactly fun.
Thanks for your response. We are currently using DAS to a CX4-120 for storage, but need to migrate off that to a new storage solution. We will be moving the storage connection to a Brocade switch, connecting the CX4-120 to it as well in order to migrate the storage to the new SAN. From what I understand, this is going to have to be a hot cut: change the Fabric Interconnects to End-Host Mode, then remove the zoning from the VSAN interfaces connecting to the Brocade. Zoning will be done on the Brocade to get the UCS HBAs to the CX4-120. We are expecting this to be an extended maintenance event. Is my thought process correct? Anything specific we need to consider prior to the migration?
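For the Brocade side of that plan, single-initiator zoning per vHBA is the usual approach. Something along these lines in Fabric OS, where the zone name, config name, and WWPNs are entirely made up for illustration (substitute your own, and note each zone pairs one vHBA with one CX4 SP port):

```
zonecreate "ucs_blade1_vhba_b__cx4_spb0", "20:00:00:25:b5:0b:00:01; 50:06:01:68:3e:a0:12:34"
cfgadd "fabric_b_cfg", "ucs_blade1_vhba_b__cx4_spb0"
cfgenable "fabric_b_cfg"
```

Doing this per fabric keeps the blast radius small if a zone member turns out to be wrong.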
It's been quite a while since we did this work, so I might be forgetting details, but it does seem like you've got it covered. That said, you'll need to ensure that all the initiators are zoned to the exact same SP port on the CX4, since its initiator records are tied to specific SP ports.