Understanding Fabric Failure and Failover in UCS

Jan 15, 2015 10:35 AM
Sep 19th, 2011

Introduction

The UCS system has many different components that work together. Their seamless interoperation is what makes UCS appear to be a single, integrated system. However, a failure can happen at any point and needs to be dealt with. This document covers a broad set of failure situations, although not all of them.

Prerequisites

This document assumes that the reader has basic knowledge of UCS components (e.g., Fabric Interconnect, Fabric Extender, Chassis) and of techniques such as NIC teaming and SAN multi-pathing.

Understanding Fabric Failure

In a simple scenario of a UCS system with a server that has a CNA card, the following failures may happen:

a) FI failure : results in fabric failure for all connected UCS chassis

b) FEX failure : results in fabric failure for one UCS chassis

c) FI-FEX link failure : results in fabric failure for some of the servers within a UCS chassis (depending on number of servers and uplinks)

d) One CNA port failure : results in fabric failure for one server

In any of the above cases, downtime can be eliminated by using redundant hardware and proper configuration.
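As a rough illustration of case c), the sketch below estimates which blade slots lose one fabric when a single FI-FEX link fails. It assumes round-robin static pinning of slots to links (slot s uses link ((s - 1) mod N) + 1), which is not stated in this document; the function name and parameters are hypothetical, and actual pinning should be checked against the running system.

```python
def servers_affected_by_link_failure(failed_link, num_links, num_slots=8):
    """Return the blade slots pinned to the failed FI-FEX link.

    Assumption (not from this document): slots are statically pinned
    round-robin, i.e. slot s uses link ((s - 1) % num_links) + 1.
    """
    if not 1 <= failed_link <= num_links:
        raise ValueError("failed_link must be between 1 and num_links")
    return [
        slot
        for slot in range(1, num_slots + 1)
        if (slot - 1) % num_links + 1 == failed_link
    ]


if __name__ == "__main__":
    # Compare the impact of losing link 1 with 1, 2, or 4 links per FEX.
    for links in (1, 2, 4):
        print(links, servers_affected_by_link_failure(1, links))
```

Under this assumption, the more FI-FEX links a chassis has, the fewer servers a single link failure touches.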

Understanding Failover

When redundant hardware and proper configuration are in place, any failure will result in failover. The behaviour described below applies to end-host mode only, since in switch mode the link status is not propagated.

a) One uplink of one FI fails : In this case UCS will re-pin the traffic to the remaining uplink on that FI.

b) Both uplinks of one FI fail or the FI fails : In this case the corresponding server links will be shut, since there is no uplink available on that FI. The FI will propagate link-down status to the adapter. Once the adapter reports link-down, it is the responsibility of the operating system to re-pin traffic to the remaining NIC/HBA. The exception is adapters that support fabric failover, such as the Menlo (M71KR) and Palo (M81KR).

c) One uplink of one FEX fails : In this case the server blades pinned to the failed uplink will have their links shut. However, this applies only to UCS systems running 1.x or 2.x that do not have the newer FEX and FI hardware.

d) Both uplinks of one FEX fail or the FEX fails : In this case all adapters on that fabric will lose network/storage connectivity. If host-level redundancy is configured (NIC teaming and SAN multi-pathing), the traffic will be re-routed through the other FEX.

e) One adapter fails : If this is the only adapter, connectivity will be lost. If a redundant adapter is available and host-level redundancy is configured, the traffic will be re-routed through the other adapter. Some UCS adapters, such as the M71KR and M81KR, support fabric failover at the adapter level, which eliminates the need for host-level redundancy configuration (NIC teaming). As with NIC teaming, fabric failover will detect any failure between the adapter and the FI uplink. However, SAN fabric design must be taken into account for vHBA failover; in most situations, vHBA fabric failover is discouraged.
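The end-host-mode behaviour in cases a), b) and d) can be summarised in a small model. This is only a sketch of the logic described above, not UCS Manager code: the `Fabric` class, its fields and the `active_fabric` function are hypothetical names, and the `redundancy` flag stands for either host-level redundancy (NIC teaming, SAN multi-pathing) or adapter-level fabric failover.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fabric:
    """One fabric (an FI and its FEX) as seen from a single adapter port."""
    name: str
    fi_up: bool = True
    fex_up: bool = True
    uplinks_up: List[bool] = field(default_factory=lambda: [True, True])

    def server_link_up(self) -> bool:
        # End-host mode: with no working uplink the server links are shut,
        # so the adapter sees link-down (case b). A failed FI or FEX has the
        # same effect on the adapter port (cases b and d).
        return self.fi_up and self.fex_up and any(self.uplinks_up)


def active_fabric(primary: Fabric, secondary: Fabric,
                  redundancy: bool) -> Optional[str]:
    """Return the name of the fabric carrying traffic, or None if none can."""
    if primary.server_link_up():
        # Covers case a): traffic is re-pinned to a surviving uplink.
        return primary.name
    if redundancy and secondary.server_link_up():
        return secondary.name
    return None


if __name__ == "__main__":
    a, b = Fabric("A"), Fabric("B")
    a.uplinks_up = [False, True]
    print(active_fabric(a, b, redundancy=True))   # one uplink left on A
    a.uplinks_up = [False, False]
    print(active_fabric(a, b, redundancy=True))   # all uplinks on A down
    print(active_fabric(a, b, redundancy=False))  # no redundancy configured
```

The model deliberately ignores switch mode, where link status is not propagated and this decision logic would not apply.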

Related Information

How does UCS manager High Availability architecture works

UCS with a single fabric interconnect vs. dual fabric interconnect topology

Dave Compton Thu, 01/15/2015 - 07:43

How about the scenario where you lose connectivity between FI-A and the WAN (the upstream switch fails)? FI-B still has connectivity to the WAN, and the CNA is set up for Fabric A with failover enabled for Fabric B.

Will UCS see the upstream outage and fail over to Fabric B in this scenario? Or does failover only work within UCS?

Thanks!

Keny Perez Thu, 01/15/2015 - 09:10

The server vNICs are pinned to an uplink port that has to be up/up. If a specific uplink port fails, the vNIC detects the failure, unpins itself from that port, and tries another uplink. It eventually fails over to the other FI when all uplinks go down (in the case of dynamic pinning), or fails over to the other FI immediately (in the case of LAN Pin Groups).

Bottom line, the vNIC can also fail over if the uplink it is pinned to goes down.

-Kenny

Dave Compton Thu, 01/15/2015 - 10:32

Keny,

Thanks for your response!

Keny Perez Thu, 01/15/2015 - 10:35

Anytime :)

This Document

Posted September 19, 2011 at 1:26 PM
Updated September 25, 2011 at 11:15 PM