VMware ESX and dual homing

Jan 29th, 2008

We are beginning to deploy VMware ESX servers for Windows production environments. How do I set up the VNICs to dual-home the VMs to separate Catalyst 6500s?

Thanks, Lisa

chrobrie Tue, 01/29/2008 - 14:46

Hi Lisa,

VNICs actually connect to the virtual switch (vswitch), which is a software construct of the VMkernel. In the world of ESX, VMNICs are the physical server NICs that will be dual-homed to your Catalyst access layer switches. Typically, you will create an 802.1Q trunk on the Catalyst ports and prune all unnecessary VLANs.
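
As a rough illustration of the trunk-and-prune setup described above, a Catalyst port facing an ESX VMNIC might look something like the sketch below. The interface name, description, and VLAN numbers are hypothetical placeholders, not values from this thread, and the exact syntax can vary by platform and IOS release.

```
! Sketch only: interface and VLAN IDs are assumed for illustration
interface GigabitEthernet1/1
 description ESX host vmnic0 (hypothetical)
 switchport
 switchport trunk encapsulation dot1q
 ! Prune the trunk down to the VLANs the VMs actually need
 switchport trunk allowed vlan 10,20
 switchport mode trunk
```

The same configuration would be repeated on the port of the second Catalyst that connects to the other VMNIC.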

There is some configuration on the vswitch that must be addressed as well. First, a vswitch must be created that references the appropriate VMNICs (physical NICs). In addition, the vswitch should be configured with a Port Group. A Port Group, which is identified by a Network Label, defines multiple network settings such as QoS, VLANs, and security. The administrator can associate the VM with a Port Group, which in turn applies the previously mentioned settings.

I have a paper about ESX 2.5 network implementations at this link, which describes this in further detail. Despite the earlier version, most of the general networking best practices still apply. http://www.cisco.com/application/pdf/en/us/guest/netsol/ns304/c649/ccmigration_09186a00807a15d0.pdf

Take care, Chris

stephen.love Fri, 03/28/2008 - 01:34

Hi Lisa

Just thought that I would add to this post! I am doing a lot of work in this area, particularly with dual homing, NIC teaming, etc. When we are selling a VMware ESX environment we tend to install more physical NICs into the ESX server for aggregation and load balancing purposes; this is also a good way of segregating VLAN traffic. Something that I have been recommending to clients is using a server access layer between the physical servers and the core switches. This is typically a resilient pair of 3750s. The reason for this recommendation is to utilise the 32 Gbps stacking capability of the 3750, which enables the use of EtherChannel across the switches, as the stack operates as a single entity.

Steve

mslavin Sun, 03/30/2008 - 17:09

Hi Lisa,

There is some great information in the threads. I would like to add to this as well :-). VMware has an excellent presentation you can watch that talks about the different ways NICs can be utilized (you can download the preso and audio separately from here as well):

http://www.vmworld.com/vmworld/mylearn?classID=11647

This requires a login ID that you can get for free from this link as well.

A comment on if you want to use aggregation (AKA EtherChannel) on the NICs. You can only use aggregation of NICs if both NICs go either to the same logical switch (i.e. 3750s stacked together or 6500s in a VSS pair) or to a single physical switch. To use aggregation, you will need to configure static aggregation on the Cisco switch side toward the VMware ESX server (for example, in IOS, run the command "channel-group X mode on", where X = some number for the group), and on the server itself, configure the team to use "Route based on IP hash".
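
To make the switch side of this concrete, here is a minimal sketch of a static (mode on) EtherChannel across two members of a 3750 stack. The interface numbers and the channel-group number 10 are assumptions chosen for illustration; only the "channel-group X mode on" command itself comes from the description above.

```
! Sketch only: ports on stack members 1 and 2, group number assumed
interface range GigabitEthernet1/0/1 , GigabitEthernet2/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 ! Static aggregation: no LACP or PAgP negotiation
 channel-group 10 mode on
```

Because mode on does no negotiation, the ESX team must already be set to "Route based on IP hash" when the ports come up, otherwise the two sides will disagree about how the links are used.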

A comment on if you want to use 802.1Q trunking: Most Cisco switches have the concept of a Native VLAN (a single untagged VLAN) on an 802.1Q trunk, while VMware does not. Owing to this difference in operation, you will need to account for this in your configurations if you want to use 802.1Q. The following are ways to address this:

1) Do not use the Native VLAN on the Cisco side for data on the ESX server. For example, if the Native VLAN on the Cisco side is 1, do not use VLAN 1 in ESX

2) Set the Native VLAN to an unused VLAN on the Cisco side, i.e. "switchport trunk native vlan x" (x = unused VLAN), and do not use this VLAN in ESX

3) Tag the Native VLAN (this is a global command, so you need to make sure ALL trunked interfaces support it and are configured for it on both sides), i.e. "vlan dot1q tag native" (note that not all Cisco switches support a tagged Native VLAN).
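
Options 2 and 3 above could be sketched on the Cisco side roughly as follows. The interface name and VLAN 999 are hypothetical; pick a VLAN that is genuinely unused in your environment, and check that your platform supports native VLAN tagging before relying on option 3.

```
! Option 2 (sketch): move the native VLAN to an unused VLAN
! VLAN 999 and the interface are assumed for illustration
interface GigabitEthernet1/1
 switchport trunk native vlan 999
!
! Option 3 (sketch): tag the native VLAN on all trunks (global command)
vlan dot1q tag native
```

Normally you would choose one of these rather than apply both.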

Hope this helps.

Thanks, Matt

scostigan2 Thu, 04/03/2008 - 22:34

Matt,

I have a similar issue to Lisa's. I was hoping to use VSS on our brand-new 65Ks with the FWSM modules to provide cross-switch ECs. Do you know when this will be available? I have not been able to get a straight answer.

In the meantime, do you know if the VMware NICs on the quad mezz cards can be EC'd across the two PCI slots that the card presents? The quad card is actually two dual cards with different PCI slots. I have always had issues with spanning multiple NICs for ECs.

The problem is based on the fact that the client was recommended to get the 3020s for greater uplink speeds, but the issue is of course that the NICs go to different bays, i.e. different switches. I have some pass-thrus also and am trying to come up with the ultimate throughput for the connections.

I know this is a long question, but this is the first place I have seen that anybody is trying to do similar things.

I can provide more detailed info if needed.

Thanks,

Steve

mslavin Fri, 04/04/2008 - 06:42

Hi Steve,

Specific to VSS by itself, that is available and shipping today, but support for the FWSM with VSS is not yet available (as you have already noted :-). I am currently hearing Q3CY08 as a possible timeframe for supporting VSS and the FWSM, but that is not written in stone. In the meantime, you could still take advantage of VSS to do the multi-chassis EtherChannel, just not with the FWSM included.

Specific to the question on the quad mezz cards, I personally do not have any experience with this specific card. I do know that teaming/bonding software is getting better every day, but we all know that not everything works as advertised. In that case (and actually, in every case if you think about it), any such design should be fully tested before going into production, to make sure it works as expected/desired.

In your post you mention the 3020 (HP Cisco blade switch). That does indeed throw a bit of a wrench in the works, since as you noted, the NICs will each go to separate physical switches in the enclosure, thus making EtherChannel-type solutions on the server impossible. In that case, I normally recommend a simple Active/Standby form of teaming/bonding, as it is robust and deterministic (proprietary forms of Active/Active, in my experience, are neither). If you did decide to go with pass-thru (instead of 3020) to a VSS environment, you could then take advantage of the EtherChannel-type teaming, but then you introduce the headache of all of those cables from the pass-thrus, which defeats one of the more common reasons many people go to blades: reduced cabling.

Another solution that would give you the best of both worlds in a blade enclosure (reduced cabling and EtherChannel teaming on the servers), is to look at the new 3120's just coming out. With their stacking ability, multiple switches look and act as a single logical switch (exactly like the 3750E), so when these are deployed in the enclosure and stacked, you can indeed use EtherChannel on the server NICs while still getting cable reduction for the enclosure.

HTH, Matt

melnik-r Mon, 07/28/2008 - 04:40

We are seeing a similar issue on a pair of 6509s in VSS mode at our data center, but we are experiencing it in both active/active LACP and active/standby. This happens on our VMware ESX on HP servers. When connecting over the WAN attached to chassis one, which has NIC one, it works fine. When coming into chassis two, where NIC two is attached, it fails. When sniffed, the server sends a reset. One of our servers works fine in active/standby, but there are four that will not. Has anyone heard of a work-around or cause for something like this? Redundancy is critical.

Thanks in advance.

mslavin Mon, 07/28/2008 - 04:56

Hi Raymond,

A couple of questions:

1) Are you using HP C class blade enclosures or are these stand alone servers?

2) If this is a blade enclosure, what are you using in the switch bays (pass-thru's, 3120's, 3020's, something else)?

3) What hash type are you using in VMware, and what does the NIC association in the vswitch(es) look like?

4) Can you share the port configuration of the VSS switches going to the servers in question?

Thanks, Matt

melnik-r Mon, 07/28/2008 - 05:06

Hi Matt

Thanks so much for responding. I'm in infrastructure, and the server folks told me I was mistaken: I thought it was VMware, when it is actually an issue with Windows 2003 64-bit physical servers running on HP DL585 G2s. I guess I'm in the wrong session, but if you have any hints it would be great. In Active/Standby it was just a standard LACP port-channel (access), and in active/standby they were standard access ports. The VLANs are single and hard set.

mslavin Mon, 07/28/2008 - 05:17

No problem

You mention Active/Standby on the server, connecting to an LACP EtherChannel on the VSS. This would be broken, as these are two different types of teaming. Both sides (the server and the upstream 6500s) need to agree on how they will use the redundant links. If you want to use LACP active/active, the team on the server needs to be set up for "802.3ad Dynamic" (and your VSS needs to be configured for "channel-group X mode active").
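
For reference, the VSS side of an LACP active/active team could look something like the sketch below, with one member port on each chassis. The interface numbers, access VLAN 100, and channel-group number 20 are assumptions for illustration; the access-port setup follows the single hard-set VLAN described earlier in the thread.

```
! Sketch only: one port per VSS chassis (switch/slot/port), values assumed
interface range GigabitEthernet1/1/1 , GigabitEthernet2/1/1
 switchport
 switchport mode access
 switchport access vlan 100
 channel-protocol lacp
 ! LACP active mode, to pair with "802.3ad Dynamic" on the server team
 channel-group 20 mode active
```

If the server is instead left in Active/Standby, the switch ports should be plain access ports with no channel-group at all.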

Can you get the server team to share with you exactly what team type they are using in Windows?

Thanks, Matt

melnik-r Mon, 07/28/2008 - 05:21

Thanks Matt. They can work with me in 40 minutes so I can find out. The strange thing is that one of four works and supposedly nothing is different.

Thanks again and I will post the info as soon as available.

melnik-r Mon, 07/28/2008 - 05:29

I was able to look and it is 802.3ad Dynamic. My end is correct as well. I'm wondering if there is a specific caveat with 64-bit Windows 2003 Server.

mslavin Mon, 07/28/2008 - 05:34

There may be, but I would do some more checking first.

Question: Can you post the output of a "show eth sum" (short for "show etherchannel summary") for the upstream port channel connecting to this server?

melnik-r Mon, 07/28/2008 - 05:37

Hi Matt

They had me remove it while they test something else. I will post as soon as I reconfigure it.

Thanks again

melnik-r Mon, 07/28/2008 - 05:53

Hi Matt

The server tech said that when he deleted and re-created the active/standby mode, the issue went away. But there are still excessive retransmits to look at. They decided not to do LACP active/active so they are just set to access.

mslavin Mon, 07/28/2008 - 05:58

As a possible troubleshooting tip, I recommend they remove teaming and place the team IP address on each NIC (one at a time, while shutting down the other NIC), and see if they see a specific issue with a given NIC. This will help to show if there is a particular issue with one specific NIC's connection.
