TCP segmentation offload (TSO) and vmxnet3/1000v - bug?

sdavids5670
Level 2

NOTE: My knowledge of ESX and the 1000v does not run deep, so I do not have a thorough understanding of the relationship/integration between those two components.  If anything I'm saying here is out of line, I apologize in advance.

Yesterday a report came in that an IIS application on a staging server in our test environment was not working (Internet Explorer returned "Page cannot be displayed").  The IIS server sits behind an F5 load balancer, and both the F5 and the IIS server are VM guests on a VMware ESX host.  Both had recently been moved to a new environment in which the 1000v version changed and the vNIC driver changed (from E1000 to vmxnet3), and this appeared to be the first time that anybody noticed an issue.

 

After some digging we noticed something peculiar: the problem only manifested when the IIS server and the F5 were on the same physical host.  If they were not co-resident, everything worked just fine.  After reviewing packet captures we figured out that the "Page cannot be displayed" was not because the content wasn't making it from the IIS server to the client, but because the content was gzip-compressed and something in transit between the IIS server and the client corrupted the payload, making the gzip impossible to decompress.  As a matter of fact, at no time was IP connectivity ever an issue; we could RDP to the IIS server and SSH/HTTP into the F5 without any problems.
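
As a sanity check on the corruption itself, it is easy to export the response body from each capture point (Wireshark: File > Export Objects > HTTP) and try to decompress it.  A minimal sketch of that check; the file names here are hypothetical:

import gzip
import sys
import zlib

def check_gzip(path):
    """Try to decompress an exported HTTP response body and report the result."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        body = gzip.decompress(data)            # Content-Encoding: gzip
        print(f"{path}: OK, {len(body)} bytes after decompression")
    except (OSError, EOFError, zlib.error) as exc:
        # A payload corrupted in transit typically fails here with a
        # header or CRC error.
        print(f"{path}: corrupt ({exc})")

if __name__ == "__main__":
    # e.g.: python check_gzip.py body_from_iis.bin body_from_f5.bin
    for p in sys.argv[1:]:
        check_gzip(p)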

 

I started digging a little deeper with Wireshark (details of which are included as an attached PDF).  It turns out that what looks like a bug involving TCP segmentation offload (TSO) was causing the payload of the communication to become corrupted.  What I'm still trying to figure out is who is responsible for the bug: VMware or Cisco?  I'm leaning towards Cisco and the 1000v, and here is why.

 

Referring to the attached PDF, TEST #2 (hosts co-resident) and TEST #3 (hosts not co-resident) show packet captures taken from the IIS box, the 1000v and the F5.  Figure 6 shows that the contents of the gzip'd payload can be deciphered by Wireshark as it leaves the IIS box.  Figure 8 shows capture data from the 1000v's perspective (spanning rx/tx on the F5 veth port); it's still good at that point.  However, figure 10 shows capture data taken on the F5: at some point between leaving the egress port on the 1000v and arriving at the F5, the payload becomes corrupt and can no longer be decompressed.  There is no indication that the TCP checksum failed.  In my mind, the only way the data could be corrupt without a TCP checksum failure is if the corruption occurred during segmentation of the packet, because with offload the checksum is computed after segmentation and would simply be calculated over the already-corrupted bytes.  However, if it was due to the guest OS-level vNIC driver, then why did the data still look good at the 1000v egress towards the F5?
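
To spell out that reasoning: with segmentation/checksum offload, the vNIC computes the TCP checksum over whatever bytes it is handed when it builds the segments, so a checksum computed after the payload is mangled still verifies at the receiver.  Here is a rough illustration of the standard Internet checksum (RFC 1071) over an IPv4 pseudo-header plus a TCP segment; the addresses, ports and payloads are made up:

import struct

def ones_complement_sum(data: bytes) -> int:
    """RFC 1071 Internet checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a 16-bit boundary
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    while total > 0xFFFF:                    # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def tcp_checksum(src_ip: bytes, dst_ip: bytes, tcp_segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and payload."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(tcp_segment))
    return ones_complement_sum(pseudo + tcp_segment)

# Hypothetical 20-byte TCP header (checksum field zeroed) plus payload.
src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
header = struct.pack("!HHIIBBHHH", 49152, 80, 1, 1, 5 << 4, 0x18, 8192, 0, 0)
good_payload = b"\x1f\x8b\x08\x00original gzip bytes..."
bad_payload  = b"\x1f\x8b\x08\x00mangled  gzip bytes..."

# If the vNIC computes the checksum *after* the payload was mangled during
# segmentation, the value it writes matches the mangled bytes, so the
# receiver sees a "valid" checksum even though the data is wrong.
print(hex(tcp_checksum(src, dst, header + good_payload)))
print(hex(tcp_checksum(src, dst, header + bad_payload)))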

The most curious aspect of this whole thing is the behavior I described earlier related to on-box vs. off-box: the problem only occurs when the traffic is switched in memory.  Refer to figures 11-16 for capture data from the very same test when the F5 and IIS are not co-resident.  Is the 1000v (or vNIC) savvy enough to skip TSO in software and let the physical NIC do TSO when it knows the traffic has to leave the host and go onto the physical wire?  That's the only way I can make sense of the difference in behavior.
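
One way to test that theory from the captures themselves: when TSO/LSO is active, a capture taken on the sending guest shows "segments" far larger than the interface MTU, because the capture point sits above where segmentation happens.  A rough sketch using scapy (a third-party library; the pcap file name and MTU value are assumptions) to flag such frames:

from scapy.all import IP, TCP, rdpcap   # pip install scapy

MTU = 1500                              # assumed interface MTU
capture = rdpcap("iis_side.pcap")       # hypothetical capture from the IIS guest

for i, pkt in enumerate(capture):
    if IP in pkt and TCP in pkt and len(pkt[IP]) > MTU:
        # An IP datagram longer than the MTU at the capture point means
        # segmentation was deferred, i.e. TSO/LSO is happening below us.
        print(f"frame {i}: {len(pkt[IP])} bytes, "
              f"{pkt[IP].src}:{pkt[TCP].sport} -> {pkt[IP].dst}:{pkt[TCP].dport}")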

   

In any case, here are all of the guest OS-level settings related to offload of any type (along with their defaults), including the one we had to change (IPv4 TSO Offload) to get this to work with the vmxnet3 NIC; a scripted version of that change is sketched below the list:

IPv4 Checksum Offload: Rx & Tx Enabled
IPv4 TSO Offload: changed from Enabled to Disabled
Large Send Offload V2 (IPv4): Enabled
Offload IP Options: Enabled
Offload TCP Options: Enabled
TCP Checksum Offload (IPv4): Rx & Tx Enabled
UDP Checksum Offload (IPv4): Rx & Tx Enabled
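
For what it's worth, on a Windows guest recent enough to have the NetAdapter cmdlets, the same change can be scripted rather than clicked through in Device Manager.  A minimal sketch that drives PowerShell from Python; the adapter name is a placeholder, the display name must match what your vmxnet3 driver actually exposes, and older guests may only offer this through the adapter's Advanced properties tab:

import subprocess

ADAPTER = "Ethernet0"          # assumed adapter name; check Get-NetAdapter
PROPERTY = "IPv4 TSO Offload"  # display name as shown in the driver's Advanced tab

# Show the current value, then set it to Disabled.
for command in (
    f'Get-NetAdapterAdvancedProperty -Name "{ADAPTER}" -DisplayName "{PROPERTY}"',
    f'Set-NetAdapterAdvancedProperty -Name "{ADAPTER}" -DisplayName "{PROPERTY}" '
    f'-DisplayValue "Disabled"',
):
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)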
