UCS Manager runs on the UCS 6100 series fabric interconnects and is used to manage and configure the entire UCS system. The UCS Manager high availability (HA) architecture applies when two UCS 6100 series fabric interconnects are connected as cluster peers. In this configuration, two instances of UCS Manager run, one on each fabric interconnect, and they communicate over the dual cluster links between the two fabric interconnects.
UCS Manager uses an active/standby architecture, in which the active instance is called the primary and the standby instance is called the subordinate. The primary instance handles all communication with the outside world and maintains the main configuration database, which is replicated on the subordinate. Whenever the configuration changes, the primary sends the updates to the subordinate.
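To make the primary/subordinate split concrete, here is a minimal Python sketch that models the two instances as in-memory objects. Everything in it is hypothetical: the class names `ConfigDB`, `ManagerInstance`, and `Cluster`, the fabric interconnect names `FI-A` and `FI-B`, the setting name, and the choice to push a full copy of the database on every change (the actual replication mechanism is not described here). It only illustrates the rule that external changes land on the primary, which then updates the subordinate.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigDB:
    """One copy of the configuration database (hypothetical model)."""
    version: int = 0
    settings: dict = field(default_factory=dict)


@dataclass
class ManagerInstance:
    name: str
    role: str  # "primary" or "subordinate"
    db: ConfigDB = field(default_factory=ConfigDB)


class Cluster:
    """Toy active/standby pair of UCS Manager instances."""

    def __init__(self, a: ManagerInstance, b: ManagerInstance):
        self.nodes = {a.name: a, b.name: b}

    def _with_role(self, role: str) -> ManagerInstance:
        return next(n for n in self.nodes.values() if n.role == role)

    def apply_change(self, key: str, value: str) -> None:
        # Changes from the outside world are accepted by the primary only.
        primary = self._with_role("primary")
        primary.db.settings[key] = value
        primary.db.version += 1
        # The primary pushes the update so the subordinate's copy stays in sync
        # (a full copy here; the real replication format is an unknown).
        subordinate = self._with_role("subordinate")
        subordinate.db = ConfigDB(primary.db.version, dict(primary.db.settings))


cluster = Cluster(ManagerInstance("FI-A", "primary"),
                  ManagerInstance("FI-B", "subordinate"))
cluster.apply_change("vlan-100", "enabled")  # hypothetical setting name
print(cluster.nodes["FI-B"].db.version, cluster.nodes["FI-B"].db.settings)
# 1 {'vlan-100': 'enabled'}
```

Copying the whole database keeps the sketch short; a real system would more likely send incremental updates.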
High Availability architecture details
Each fabric interconnect is configured with its own private static IP address. In addition, the two share a virtual IP address, which is always associated with the fabric interconnect running the primary instance of UCS Manager. The two UCS Manager instances stay aware of each other by exchanging heartbeat messages. When both fabric interconnects in the cluster are running, their copies of the configuration database are kept in sync.
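The heartbeat and virtual IP behavior can be sketched in the same toy style. The Python below rests on stated assumptions: `HeartbeatMonitor`, `check_peer`, the 3-second timeout, and the `"failed"` role label are all invented for illustration and are not UCS Manager settings or APIs. It shows the subordinate promoting itself when the primary's heartbeats stop, with the virtual IP following whichever node is currently primary.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Tracks when the peer's last heartbeat arrived (hypothetical model)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s  # assumed value, not a documented UCS setting
        self.clock = clock
        self.last_seen = clock()

    def record_heartbeat(self) -> None:
        self.last_seen = self.clock()

    def peer_alive(self) -> bool:
        return self.clock() - self.last_seen <= self.timeout_s


def check_peer(roles: Dict[str, str], local: str, monitor: HeartbeatMonitor) -> str:
    """Promote the local subordinate if the primary's heartbeats stop.

    Returns the node that should hold the virtual IP, i.e. the current primary.
    """
    if roles[local] == "subordinate" and not monitor.peer_alive():
        for name in roles:
            roles[name] = "primary" if name == local else "failed"
    return next(name for name, role in roles.items() if role == "primary")


# Usage with a controllable fake clock so the example is deterministic.
now = [0.0]
mon = HeartbeatMonitor(timeout_s=3.0, clock=lambda: now[0])
roles = {"FI-A": "primary", "FI-B": "subordinate"}

now[0] = 2.0
mon.record_heartbeat()                 # heartbeat from FI-A arrives at t=2
print(check_peer(roles, "FI-B", mon))  # FI-A

now[0] = 10.0                          # 8 s without a heartbeat, over the 3 s timeout
print(check_peer(roles, "FI-B", mon))  # FI-B
```

This toy treats any missed heartbeat as a primary failure; it ignores the backup heartbeat path through the blade chassis that is used when only the cluster links are down.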
How It Works
The two instances of Cisco UCS Manager communicate across a private network between the L1 and L2 Ethernet ports on the fabric interconnects. Configuration and status information is communicated across this private network to ensure that all management information is replicated. This ongoing communication ensures that the management information for Cisco UCS persists even if the primary fabric interconnect fails. In addition, the "floating" management IP address that runs on the primary Cisco UCS Manager ensures a smooth transition in the event of a failover to the subordinate fabric interconnect.
If the fabric interconnect running the primary instance fails, the subordinate fabric interconnect takes over the primary role. Any configuration changes made on this fabric interconnect are not replicated to the failed one. If this fabric interconnect then also fails and the first failed fabric interconnect comes back up, the latter will have an old copy of the configuration database. To handle this case, the fabric interconnect that comes up (the active fabric interconnect) first checks the configuration database version number of the other fabric interconnect, which is stored in a shared area on the blade chassis. If the other fabric interconnect's version number is higher, the other fabric interconnect has a more recent configuration, so the active fabric interconnect stops loading and waits for administrator input.
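The startup check above reduces to a version comparison. In this hypothetical Python sketch, the chassis shared area is modeled as a plain dictionary, `startup_check` and `StartupDecision` are invented names, and the version numbers are made up. The text does not say what happens when the versions are equal or no peer version is recorded, so the sketch assumes loading continues in those cases.

```python
from enum import Enum
from typing import Optional


class StartupDecision(Enum):
    CONTINUE_LOADING = "continue loading"
    WAIT_FOR_ADMIN = "stop loading and wait for administrator input"


def startup_check(local_version: int, peer_version: Optional[int]) -> StartupDecision:
    """Compare the local DB version with the peer's version read from the chassis.

    peer_version is None when no version is recorded for the peer (assumed case).
    """
    if peer_version is not None and peer_version > local_version:
        # The peer holds a more recent configuration: stop and wait for the admin.
        return StartupDecision.WAIT_FOR_ADMIN
    # Equal, older, or missing peer version: assumed safe to keep loading.
    return StartupDecision.CONTINUE_LOADING


# Hypothetical shared area on the blade chassis: last recorded DB version per FI.
chassis_shared_area = {"FI-B": 42}

# FI-A comes back up with a stale database (version 37, a made-up value).
decision = startup_check(local_version=37, peer_version=chassis_shared_area.get("FI-B"))
print(decision.value)  # stop loading and wait for administrator input
```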
Similarly, if the cluster links between the two fabric interconnects fail, the fabric interconnects exchange heartbeats through the blade chassis. If the fabric interconnects receive proper heartbeats, the primary remains primary, the subordinate remains subordinate, and the system enters the Link-Failed state. In this state, however, only heartbeats can be exchanged, so updates to the configuration database are not reflected on the subordinate.
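The Link-Failed behavior can likewise be modeled as a small decision function. This is a hypothetical Python sketch: `evaluate_links`, `ClusterState`, and the `"Normal"` state label are invented, and only the `Link-Failed` name comes from the text. The case where both the cluster links and the chassis heartbeat path fail is deliberately left unmodeled because it is not described here.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    state: str                 # "Link-Failed" is from the text; "Normal" is a made-up label
    primary: str
    subordinate: str
    replication_enabled: bool  # whether config DB updates reach the subordinate


def evaluate_links(cluster_links_up: bool, chassis_heartbeat_ok: bool,
                   primary: str, subordinate: str) -> ClusterState:
    """Decide the cluster state from the health of the two heartbeat paths."""
    if cluster_links_up:
        # Normal operation: heartbeats and database updates use the cluster links.
        return ClusterState("Normal", primary, subordinate, replication_enabled=True)
    if chassis_heartbeat_ok:
        # Cluster links down, but heartbeats still arrive through the blade chassis:
        # roles stay unchanged and only heartbeats are exchanged.
        return ClusterState("Link-Failed", primary, subordinate, replication_enabled=False)
    # Both paths down: behavior is not described in the text, so it is not modeled.
    raise NotImplementedError("loss of both heartbeat paths is outside this sketch")


state = evaluate_links(cluster_links_up=False, chassis_heartbeat_ok=True,
                       primary="FI-A", subordinate="FI-B")
print(state.state, state.primary, state.replication_enabled)  # Link-Failed FI-A False
```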