The UCS system has many different components that work together, and their seamless cooperation is what makes UCS behave as a single, integrated system. However, a failure can occur at any point, and it must be handled. This document covers a broad set of failure scenarios, although not every possible scenario.
This document assumes that the reader has basic knowledge of UCS components (e.g., Fabric Interconnect, Fabric Extender, chassis) and of techniques such as NIC teaming and SAN multipathing.
Understanding Fabric Failure
Consider a simple UCS system with a server that has a CNA (converged network adapter) card. The following failures may occur:
a) FI failure: results in fabric failure for all connected UCS chassis
b) FEX failure: results in fabric failure for one UCS chassis
c) FI-FEX link failure: results in fabric failure for some of the servers within a UCS chassis (depending on the number of servers and uplinks)
d) CNA port failure: results in fabric failure for one server
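The four failure domains above can be expressed as a small "blast radius" calculation. The sketch below is a hypothetical model, not a UCS or UCS Manager API: the names `Chassis`, `Fabric`, and `affected_servers` are made up for illustration, each chassis is assumed to have one FEX per fabric (identified by the chassis name), and servers are assumed to be statically pinned to FEX-to-FI links, which is what makes case (c) affect only some servers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chassis:
    """One chassis as seen from a single fabric (hypothetical model)."""
    name: str  # also identifies this chassis's FEX on the fabric
    # FEX-to-FI link name -> servers statically pinned to that link (assumption).
    uplink_pinning: dict[str, list[str]]

    def servers(self) -> set[str]:
        return {s for pinned in self.uplink_pinning.values() for s in pinned}


@dataclass
class Fabric:
    """One fabric: a single FI and the chassis connected to it."""
    fi: str
    chassis: list[Chassis]

    def servers(self) -> set[str]:
        return {s for c in self.chassis for s in c.servers()}


def affected_servers(fabric: Fabric, kind: str, target: str) -> set[str]:
    """Return the servers that lose this fabric when `target` of type `kind` fails."""
    if kind == "fi":
        # (a) FI failure: every server in every connected chassis.
        return fabric.servers() if target == fabric.fi else set()
    if kind == "fex":
        # (b) FEX failure: every server in that one chassis.
        return {s for c in fabric.chassis if c.name == target for s in c.servers()}
    if kind == "fi_fex_link":
        # (c) FI-FEX link failure: only the servers pinned to that link.
        return {s for c in fabric.chassis for s in c.uplink_pinning.get(target, [])}
    if kind == "cna_port":
        # (d) CNA port failure: just the one server.
        return {target} & fabric.servers()
    raise ValueError(f"unknown failure kind: {kind}")


fabric_a = Fabric(
    fi="FI-A",
    chassis=[
        Chassis("chassis-1", {"c1-link1": ["blade-1", "blade-2"],
                              "c1-link2": ["blade-3", "blade-4"]}),
        Chassis("chassis-2", {"c2-link1": ["blade-5"], "c2-link2": ["blade-6"]}),
    ],
)

print(sorted(affected_servers(fabric_a, "fex", "chassis-2")))         # ['blade-5', 'blade-6']
print(sorted(affected_servers(fabric_a, "fi_fex_link", "c1-link2")))  # ['blade-3', 'blade-4']
```

The model deliberately looks at one fabric in isolation; whether an affected server actually loses connectivity depends on the redundancy on the other fabric, which is what the failover cases address.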
In all of the above cases, downtime can be avoided by using redundant hardware and proper configuration.
When redundant hardware and proper configuration are in place, any failure results in a failover. The behaviour described below applies to end-host mode only, since in switched mode the link status is not propagated.
a) One uplink of one FI fails: UCS re-pins the traffic to the remaining uplink of that FI.
b) Both uplinks of one FI fail, or the FI itself fails: the corresponding server links are shut down, since no uplink is available on that FI. The FI propagates the link-down status to the adapter, and from that point it is the responsibility of the operating system to move traffic to the remaining NIC/HBA. The exception is adapters that support fabric failover, such as the M71KR (Menlo) and the M81KR (Palo).
c) One uplink of one FEX fails: the server blades pinned to the failed uplink have their links shut down. Note that this applies only to UCS systems without the newer FEX and FI hardware, running 1.x or 2.x.
d) Both uplinks of one FEX fail, or the FEX itself fails: all adapters in that chassis lose network and storage connectivity on that fabric. If host-level redundancy is configured (NIC teaming and SAN multipathing), the traffic is rerouted through the other FEX.
e) One adapter fails: if it is the only adapter, connectivity is lost. If a redundant adapter is available and host-level redundancy is configured, the traffic is rerouted through the other adapter. Some UCS adapters, such as the M71KR and M81KR, support fabric failover at the adapter level, which removes the need for host-level redundancy configuration (NIC teaming). Like NIC teaming, fabric failover detects any failure between the adapter and the FI uplink. For vHBAs, however, the SAN fabric design must be taken into account, and in most situations vHBA fabric failover is discouraged.
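The end-host-mode failover rules in cases (a) to (e) can be summarised as a decision procedure for one server with a path on each fabric. The following is a minimal sketch under stated assumptions, not UCS behaviour taken from any Cisco API: `Path`, `path_is_up`, and `active_fabric` are hypothetical names, component names such as `"FI-A/up-1"` are made up, the server is assumed to be statically pinned to one FEX uplink, and only Ethernet (vNIC) traffic is modelled, since the text discourages fabric failover for vHBAs in favour of SAN multipathing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One server path through a single fabric in end-host mode (hypothetical model)."""
    fabric: str                 # "A" or "B"
    adapter: str                # the physical adapter card
    adapter_port: str           # the adapter port wired to this fabric
    fex: str
    fex_uplink: str             # FEX-to-FI link the server is pinned to (assumed static pinning)
    fi: str
    fi_uplinks: tuple[str, ...]


def path_is_up(path: Path, failed: set[str]) -> bool:
    # Cases (b)-(e): losing the adapter, its port, the FEX, the pinned FEX uplink,
    # or the FI itself takes the whole path down.
    if {path.adapter, path.adapter_port, path.fex, path.fex_uplink, path.fi} & failed:
        return False
    # Cases (a)/(b): the FI re-pins to any surviving uplink; only when every
    # uplink is down does it shut the server link and propagate link-down.
    return any(u not in failed for u in path.fi_uplinks)


def active_fabric(primary: Path, secondary: Path, failed: set[str], *,
                  fabric_failover: bool, nic_teaming: bool) -> tuple[str | None, str]:
    """Return (fabric carrying the traffic, mechanism used), or (None, ...) if isolated."""
    if path_is_up(primary, failed):
        return primary.fabric, "primary path"
    if not path_is_up(secondary, failed):
        return None, "no connectivity"
    if fabric_failover:
        # The adapter moves the vNIC itself; the OS does not see a link-down.
        return secondary.fabric, "adapter fabric failover"
    if nic_teaming:
        # The OS sees link-down on one NIC and moves traffic to the other.
        return secondary.fabric, "host NIC teaming"
    return None, "no connectivity"


path_a = Path("A", "cna-1", "cna-1/port-1", "fex-1A", "fex-1A/up-1",
              "FI-A", ("FI-A/up-1", "FI-A/up-2"))
path_b = Path("B", "cna-1", "cna-1/port-2", "fex-1B", "fex-1B/up-1",
              "FI-B", ("FI-B/up-1", "FI-B/up-2"))

print(active_fabric(path_a, path_b, {"FI-A/up-1"}, fabric_failover=False, nic_teaming=True))
# ('A', 'primary path')
print(active_fabric(path_a, path_b, {"FI-A"}, fabric_failover=False, nic_teaming=True))
# ('B', 'host NIC teaming')
print(active_fabric(path_a, path_b, {"cna-1"}, fabric_failover=True, nic_teaming=True))
# (None, 'no connectivity')
```

In the last call both paths share the single CNA, so neither fabric failover nor NIC teaming can help, which matches the "only adapter" part of case (e).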