Dual 6500 and UCS

a.brazendale
Level 1

Hi all,

We have recently installed a UCS environment with 2 x 6100 units connected to our dual 6509 core. It is set up so that 2 x 10GigE uplinks connect from FAB-A to 6509-1 and 2 x 10GigE connect from FAB-B to 6509-2. Each of the 4 trunks to the 6509 environment is configured the same and trunks the same 4 VLANs to FAB-A or FAB-B. We have 2 x UCS chassis, each with 4 half-width servers. Each chassis has an A and B side, with 20Gb going to FAB-A and 20Gb going to FAB-B.
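For reference, the core-side end of one of those uplinks might look like the sketch below; the interface number and VLAN IDs are hypothetical, not taken from the thread:

```
! 6509-1: one of the two 10GigE uplinks toward FAB-A
! (interface and VLAN list are placeholders)
interface TenGigabitEthernet1/1
 description Uplink to UCS FAB-A
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 10,20,30,40
 switchport mode trunk
```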

The above UCS environment was installed by a third party, while I configured the core side of the links.

Our server team recently brought up a Windows server, and I started seeing MAC flapping on one of our cores, between one of the 10GigE ports connected to FAB-A and the trunk link that exists between our two cores.

I'm a bit unsure about how the ports should be configured on the 6509 side. I was told by the third party that there is no 'active/standby' design; all of the trunks to each of the two 6100s are active all the time. Our 6509s aren't running VSS supervisors, just 'standard' Sup720s running HSRP, with a trunk between the two.

I understand that more detail is probably required, but I said I'd look into it from a networking/core standpoint while the server guy checks out his end. As it stands, he has the server in question connected to 2 vNICs, one on each of the 6100s, both 'up'.


7 Replies

Jason Masker
Level 1

You need to incorporate your 6500 dual core design into the UCS configuration and it sounds like this was not done.

You have many options for uplinking the UCS. The best guide I have seen for this is here:

http://bradhedlund.com/2010/06/22/cisco-ucs-networking-best-practices/

The configuration that most likely pertains to your scenario is covered in this video:

http://vimeo.com/12782045

You can cross-connect both fabrics to both cores, or you can have them set up one-to-one as you have. That is the design I chose. We also use fabric failover on the Palo vNIC adapters: each vNIC whose VLAN is primary on one core is set to be primary on the fabric attached to that core, with the ability to fail over to the other fabric if necessary.

'Fabric Failover': I think this is what our guys need to enable. I've seen on the UCS side that you can select whether a vNIC is pinned/attached to Fabric A or Fabric B (via radio button) and then tick the option for 'Fabric Failover'. Currently there are two vNICs per vSwitch, one attached to Fabric B and one to Fabric A, neither of which is enabled for failover. They could just use one and enable it for failover, which I assume would pass the traffic through the selected fabric until such time as there is a failure, at which point it would fail over to the other fabric.

We currently have the VLAN that the server is on hosted by one of the cores (for the VLAN interface) and this is the primary one, but I think it's this second vNIC that's causing the issues, as it was set to provide redundancy.

Thanks for the links..

-Alan

Correct, Alan. With fabric failover, you can accomplish redundancy with only one vNIC and the server OS does not need to know about or manage anything to do with the failover. Brad has another post that goes into some detail on that.

http://bradhedlund.com/2010/09/23/cisco-ucs-fabric-failover/  With UCS in your environment, you will find his is a very good blog to follow.

You should make the vNIC primary on the fabric that is connected to the core which is the root bridge and has the primary HSRP or VRRP router interface of the VLAN the server is on and enable fabric failover. If the fabric fails and it goes out the other fabric to your other core, it will still be forwarded, but will have to cross the trunk between your cores to be routed.  Making sure that the primary fabric for any given vNIC matches the primary router & root bridge for the access VLAN assigned to it will minimize the traffic crossing the trunk between your cores.
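As a sketch of that alignment on the core that should be primary for the server's VLAN (the VLAN ID, priorities, and addressing below are hypothetical, for illustration only):

```
! 6509-1: make this core the STP root and the active HSRP router
! for VLAN 10, matching vNICs whose primary fabric is FAB-A
! (VLAN, priorities, and addresses are placeholders)
spanning-tree vlan 10 priority 4096
!
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 10 ip 10.1.10.1
 standby 10 priority 110
 standby 10 preempt
```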

There are other redundancy options besides fabric failover which can work better with VSS or in situations where you have equipment capable of virtual port channels. Your server team likely had the server configured to spread its traffic between two interfaces, one on each fabric, for load balancing and redundancy. This is not the best scenario for uplinking to your dual non-VSS 6500s.

Alan/Jason

Looking at the thread, it mentions a vSwitch.

For hypervisor-based environments, the current recommendation is to have Fabric Failover OFF, and Brad's blog indicates that.

So turning on fabric failover will not do you any good and will give unpredictable behavior when a failure occurs.

A detailed topology (or a TAC case) would be useful to narrow down on the issue.

A lot depends on how your vSwitch is configured: before the traffic hits either of the FIs, the vSwitch has to make a forwarding decision about which side to send it out on, A or B (given you have 2 vNICs assigned to a vSwitch, which I believe you would have). Once the traffic reaches a particular FI, it then depends on which active uplink it takes, which could dictate whether the trunk between the cores gets used.

Hope that helps.

Thanks

--Manish

Manish is correct. Sorry, I missed the reference to VMware in Alan's second post.

Manish Tandon
Cisco Employee

By default the vSwitch does "Route based on originating virtual port ID", which makes active/active use of the links on a per-VM basis, i.e. it pins each VM to one of the uplinks. It's analogous to the vPC host mode on the Nexus.

It can be overridden with configuration at the vSwitch or port-group level to other policies, such as active/passive. Only "Route based on IP hash" will not work with UCS (as the FIs are not vPC peers).

The other options all work, and most of the UCS setups I have seen use the port ID policy (the default), i.e. no special configuration out of the box.

http://communities.vmware.com/message/1177404
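On newer ESXi hosts the active teaming policy can be checked, and changed away from IP hash if needed, from the CLI. A sketch, assuming a standard vSwitch named vSwitch0 (the name is a placeholder; on older ESX 4.x hosts this would be done in the vSphere Client instead):

```
# Show the current load-balancing policy on vSwitch0 (placeholder name)
esxcli network vswitch standard policy failover get -v vSwitch0

# Set it back to the default, "Route based on originating virtual port ID"
esxcli network vswitch standard policy failover set -v vSwitch0 -l portid
```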

We would need more information about the setup and the behavior seen (a single MAC bouncing or many, for example) to make the call on what the issue is here.
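To gather that, the core-side MAC learning can be watched with standard IOS show commands on each 6509; the MAC address and VLAN below are placeholders:

```
! Check which port the flapping MAC is currently learned on
show mac address-table address 0050.56ab.cdef

! List everything learned on the server's VLAN (placeholder VLAN 10)
show mac address-table vlan 10
```

Running the first command repeatedly on both cores shows whether a single MAC is bouncing between the FAB-A uplink and the inter-core trunk, or whether many MACs are moving.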

Thanks

--Manish

Thanks for all the replies..

I'll check, but I'm pretty sure the guys set it to 'Route based on IP hash', which doesn't seem to be the way to go, as we don't use vPC or anything like that.

cheers,

Alan
