
Upgrading UCSM 2.1(1a) to 2.2(1c)

Dragomir
Level 1

I am about to upgrade from 2.1(1a) to 2.2(1c). I plan to do the Fabric Interconnects first and the blades later. Do I have to do everything together if I use the Auto Install feature?

22 Replies

richbarb
Cisco Employee

Hello Tony,

 

The only firmware version supported for cross-version operation with 2.2 is 2.1(3). In your case, you need to perform a complete system upgrade.

If you leave the servers on the old firmware you may not run into any problems, but keep in mind that this is not supported.

http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/upgrading/from2-1/to2-2/b_UpgradingCiscoUCSFrom2-1To2-2.html#concept_F3FC4E0BB15F437C989574FDAC3603F6
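
Before you start, it is worth double-checking what each component is currently running. From the UCSM CLI this is roughly the following (prompts shortened; verify against the CLI guide for your version):

show version                      <- UCS Manager version
connect local-mgmt a
show version                      <- kernel and system images running on FI A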

 

Regards.

Richard

So far I have only upgraded UCSM to 2.2(1c). The FIs are next, but I cannot schedule a maintenance window. However, they should be fully redundant since we have 2 FIs going out to different core switches.

 

Is it ok to do it? 

Hey Tony,

You should not have a problem, but it is highly recommended that you schedule a maintenance window. Depending on the applications running on the system and the UCSM configuration, this may not be so simple.

In my experience VMware environments handle it fine, but with other workloads I have run into some issues.

In any case, make sure that both of your paths to storage are working well and that all vNICs have failover enabled, or that you are running a different mechanism to handle uplink failures.
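
As a quick sanity check before either FI reboots, something along these lines from the UCSM CLI can confirm the cluster and the uplinks are healthy (command names from memory; adjust to your environment):

show cluster extended-state       <- both FIs should report UP and HA READY
connect nxos a
show port-channel summary         <- or "show interface brief" if the uplinks are individual ports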

 

Richard.

Hi

 

By enabling failover, do you mean this setting? I noticed it is not enabled on any of my vNICs. I also did not see any failover setting in the vHBA templates.

 

Hi,


The fabric failover feature keeps the vNIC up if its fabric goes down; the traffic is redirected to the other fabric.
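
The flag lives on the vNIC or vNIC template. From the UCSM CLI it looks roughly like this (the template name here is only an example):

scope org /
scope vnic-templ esx-fabric-a     <- example vNIC template name
set fabric a-b                    <- fabric A primary, fail over to fabric B
commit-buffer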

For vHBAs this feature is not available; you have to use a multipathing mechanism in your operating system to keep communication with the SAN.
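
On ESXi, for example, you can confirm that both fabric paths to the SAN are present with something like:

esxcli storage core path list     <- lists every path and its state
esxcli storage nmp device list    <- shows the multipathing policy per device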

Even with all of these features enabled and working, it is recommended to perform a UCS system upgrade within a maintenance window.

Richard

I have been reading that the Enable Failover box should not be checked when running ESXi, since ESXi does its own NIC path failover.

 

Is this true? Is there a test I could do, like shutting down the vEthernet interface on one blade server and seeing whether the VMs on that blade fail over to the other FI?

Thank you

 

That's definitely not true.

The only advantage of using a port channel in this scenario is that you can lose one leg (fabric) and keep the traffic flowing through the other leg.

But what if you never had to lose connectivity at all? That is what UCS fabric failover does for you; you don't need to configure a port channel for high availability. Fabric failover will not let that vNIC lose connectivity as long as some uplink is operational (provided the vNIC is not statically pinned with a pin group).

Personally, I prefer to use the UCS failover mechanism instead of a port channel, especially because the port channel must be active/standby, the only method supported by UCS, unless you are using the Nexus 1000v. So you don't get the advantage of aggregating links using port channels in UCS anyway.

This link might help:
http://bradhedlund.com/2010/09/23/cisco-ucs-fabric-failover/

Richard

Hi Tony, you are absolutely right regarding the Failover flag!

Cisco recommends not enabling fabric failover when using a hypervisor switch (by the way, I am a 16-year Cisco veteran).

The reason why?

  • High FI CPU utilization: if you have 10 ESXi hosts with 100 VMs each and FI A fails, FI B could potentially have to create and transmit 1000 GARPs. There is CPU load on the Fabric Interconnect whenever a GARP (gratuitous ARP) is created and transmitted, so 1000 GARPs has a high impact on the FI CPU.
  • Longer VM failover times (potentially): it takes longer for the FI to transmit 1000 GARPs serially than it does for 10 vSwitches to transmit 100 GARPs each in parallel. Since the FI takes longer, restoration of connectivity for the 1000th VM could take MUCH longer if the FI does it vs. if its own vSwitch had done it.
  • With FF, bandwidth utilization tends to be more lopsided. Using vSwitch teaming means that, statistically, more of the bandwidth is distributed equally across both fabrics.

We should not forget that we need load balancing + failover, not only failover!

For best practices I would refer to an up-to-date, excellent Cisco Live presentation:

Hypervisors networking: best practices for interconnecting with Cisco switches

BRKVIR-2019

Ramses Smeyers – Customer Support Engineer

If you can't find it, send me an email !

Hey wdey, it's great that you are joining the thread.

About the documentation you mentioned: I didn't find anything saying it is wrong to use the Fabric Failover feature. That document is only a brief overview of how networking with hypervisors works.

From my real-world experience in the field, I have never had a problem with the Fabric Failover feature, even in an environment with more than 1000 VMs running on it.

I don't think there is a single right or wrong answer here. You should have a good understanding of how the feature works and then decide whether or not to use it.

Richard

Hi Richard

I am not saying FF does not work! There was a time when it was very useful indeed: with Windows 2008 R2 there was no teaming solution from Microsoft, so you had to go with a NIC vendor's driver solution. This is now fixed in Windows 2012 and 2012 R2, where Microsoft has NIC Teaming built in.

My recommendation: let the OS (and the OS admin) configure, manage, and monitor server failover and load balancing, not the networking guys :-) Especially now that all operating systems support NIC teaming and FC multipathing (FF only works for Ethernet; you also need redundancy for FC!).
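
As a rough ESXi-side example (vSwitch and vmnic names are placeholders), the standard vSwitch can simply be told to use both UCS vNICs as active uplinks:

esxcli network vswitch standard policy failover set -v vSwitch0 -a vmnic0,vmnic1
esxcli network vswitch standard policy failover get -v vSwitch0   <- verify the active uplink list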

I hope we agree about FF:

- The OS has no clue about UCS-based NIC failover
- Failover happens at the adapter/IOM level, well beyond the OS’ visibility
- Enabling UCS-based failover does not ‘grey out’ NICs inside the OS
- However, with vNIC Failover configured the OS is not notified that a link failure occurred. The OS then incorrectly thinks its 2 x 10GE NIC team is fully functional even though its capacity is halved.

Kind Regards

Walter.

Actually, for my VMs I am running the Nexus 1000v. I have two vNICs and Enable Failover is disabled.

How does the 1000v handle failover if one Fabric Interconnect goes down or reboots?

Sorry, there are numerous Cisco papers that advise on the use (or rather, non-use) of FF. See the attachment.

For ESX Server running vSwitch/DVS/Nexus 1000v on the Cisco UCS Manager Version 1.4 and beyond, it is recommended that, even with the Fabric Sync feature, the vNICs in pairs (two vNICs, one on each fabric) should be used for both fabrics.
The current recommendation is not to enable Fabric Failover for the vNICs given to the soft switch but instead to let the softswitch see vNIC go down and issue gARPs following this.

We are entering a rat hole! I don't care what you, dear customer, finally do! My recommendation is clear!

The slides are valid for single vSwitch, VMware DVS, and N1Kv configurations, or even combinations of vSwitch + DVS, vSwitch + N1Kv, and so on.

I'm not always right; we come here just to try to help people and learn at the same time. I also know a lot of UCS engineers who think like I do on this, and we should always be able to change our minds when new information comes up.

I tried to discuss with arguments; sorry if I left you angry, my dear.

Richard

If my VMs are on the Nexus 1000v, with two vNICs uplinked to fabric A and fabric B:

 

If fabric A goes down and a VM's traffic was going through fabric A, what will happen to that VM's networking?

 

vNIC Enable Failover is not checked in this case.

On my Nexus 1000v the default is:

 

port-channel load-balance ethernet source-mac
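
For what it's worth, if the uplink port-profile uses mac-pinning (the usual design with UCS, since a port channel cannot span the two FIs), the VEM repins the affected veths to the surviving fabric and sends GARPs. A typical uplink profile looks roughly like this (names and VLANs are placeholders):

port-profile type ethernet system-uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-100
  channel-group auto mode on mac-pinning    <- pins each veth to one uplink, repins on failure
  no shutdown
  state enabled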

 
