
VM Causes Output Errors

tommcnicholas
Level 1

I'm looking for some out of the box thinking here.

 

Here's the setup:

vSphere 5.1 Cluster

4 x B200M2 in Chassis A

4 x B200M2 in Chassis B

Chassis A has 2104XP IOM and does not use port-channeled connectivity.

Chassis B has 2208XP IOM and DOES use port-channels.

Windows 2008R2 virtual machine, hardware version 8, VMXNET3 network interface.

 

Here's the problem:

If the VM is running on an ESXi host in Chassis A, the physical interface that the server is pinned to slowly starts clocking output errors. If you move the VM to another host in the same chassis, the physical link for that host starts clocking errors. If you move it to a host in the other chassis, the port-channel starts clocking output errors.
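For reference, here's roughly how we've been watching the counter climb from the fabric interconnect CLI (the port number is just an example; substitute whichever chassis-facing server port the blade's IOM link is pinned to):

UCS-A# connect nxos a
UCS-A(nxos)# show interface ethernet 1/1 | include errors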

14 Replies

Walter Dey
VIP Alumni

Which UCS version? Which adaptor is in the B200-M2?

Did you check that the enic/fnic and W2008 R2 drivers comply with the UCS version according to the UCS Interop Matrix?

Fabric Interconnects @ 2.1(3b), though the problem existed when the FI version matched the blades; we're in an interim upgrade phase.

Blade CIMC/BIOS/firmware @ 2.1(1d), eNIC 1.5.0.20, fNIC 2.1.2.38

After the hosts are updated to 5.1U2 and firmware 2.1(3b), the eNIC will be updated to 1.5.0.45; the fNIC will not.

 

vSphere 5.1GA/Patch3 on 2.1(1d), eNIC 1.5.0.20, fNIC 2.1.2.38 matches the support matrix.
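For reference, here's roughly how we pulled the driver versions from a host's ESXi shell (the vmnic number is just an example):

~ # ethtool -i vmnic2
~ # esxcli software vib list | grep enic
~ # esxcli software vib list | grep fnic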

Thanks. Which I/O adaptor?

Can you please post the error message / counter that you see on the interface?

The blades have M81KRs. I should probably see if a VIC 1240 does it as well, but I don't have a blade of that class spare right now.

Here's what we see from NXOS:

 

Ethernet1/1 is up
  Hardware: 1000/10000 Ethernet, address: 0005.73fb.5a88 (bia 0005.73fb.5a88)
  Description: S: Server
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  Port mode is fex-fabric
  full-duplex, 10 Gb/s, media type is 10G
  Beacon is turned off
  Input flow-control is off, output flow-control is off
  Rate mode is dedicated
  Switchport monitor is off
  EtherType is 0x8100
  Last link flapped 8week(s) 0day(s)
  Last clearing of "show interface" counters never
  30 seconds input rate 7072336 bits/sec, 884042 bytes/sec, 1683 packets/sec
  30 seconds output rate 40788736 bits/sec, 5098592 bytes/sec, 4534 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 7.07 Mbps, 1.32 Kpps; output rate 10.95 Mbps, 1.93 Kpps
  RX
    13017205996 unicast packets  7010323 multicast packets  5459427 broadcast packets
    13029675746 input packets  11016470049276 bytes
    4321622131 jumbo packets  0 storm suppression packets
    0 giants  0 input error  0 short frame  0 overrun  0 underrun  0 watchdog  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
  TX
    15992467619 unicast packets  257608563 multicast packets  455953425 broadcast packets
    16706140077 output packets  14865775850059 bytes
    6553509485 jumbo packets
    110470 output errors  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble
    0 Tx pause
  2 interface resets

I see jumbo frames and multicasts!

Which applications use jumbo frames and multicast, respectively?

Are jumbo frames properly configured on UCS and/or the vSwitch / DVS / N1K?

E.g., did you check a ping from the ESXi CLI with jumbo frames and the DF (don't fragment) flag? A sample command follows below.

Are you using a vSwitch, DVS, or N1K?
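A sample jumbo-frame test from the ESXi shell, in case it helps (the target IP is a placeholder for another vmkernel port or the gateway; 8972 leaves room for the IP/ICMP headers on a 9000-byte MTU, 1472 does the same for a 1500-byte MTU):

~ # vmkping -d -s 8972 <target-ip>
~ # vmkping -d -s 1472 <target-ip>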
 

We're using VMware VDS, set to 5.1 version.

We're not intentionally using jumbo frames anywhere; the QoS system class and vNICs are at the UCS defaults. Some of our VLANs have the IGMP querier turned on and applications inside them that do some basic clustering over multicast. The jumbo frames recorded are probably from the first-hop FCoE traffic, I would think? We do FC connectivity straight to MDS, not through Nexus.

OK, thanks for the clarifications.

If I understand you correctly, the output errors show up between the IOM and the FI? Correct?

Are these errors seen on both fabric A and fabric B?

What kind of load balancing is set up on the DVS?

Do you see errors on the northbound uplinks from the FI as well?

Finally: are these output errors cosmetic, or are you having performance problems? And if yes, IP and/or FC?

Yes, errors show up between the FI and IOM.

Errors can be seen on either A side or B side.

The VDS is set to route based on the originating virtual port (the default). As a test, I have also created duplicate port groups that have different active uplinks assigned. Example:

PG_ONE (Tag 1130) - both NICs set as "active"; the "normal" port group.

PG_ONE_A (Tag 1130) - vmnic0 Active, vmnic1 Passive

PG_ONE_B (Tag 1130) - vmnic0 Passive, vmnic1 Active

If I change the VM's port-group assignment from PG_ONE_A to PG_ONE_B, the errors move from the A side to the B side of UCS. The errors never show up on the LAN or SAN uplinks.
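For what it's worth, here's how I've been confirming which uplink the VM's traffic actually lands on after a port-group change:

~ # esxtop
(press 'n' for the network view; the TEAM-PNIC column shows which vmnic the VM's port is currently using)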

This is all cosmetic, but unfortunately the transmit errors are being picked up by SolarWinds SNMP monitoring of the fabric interconnect and are therefore tripping our error thresholds. Ideally that threshold catches "receive" errors, which are our key indicator for a bad cable between the FI and IOM.
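For reference, the counter SolarWinds is tripping on is presumably the standard IF-MIB transmit-error counter, which you can pull straight from the FI's management address with something like this (community string and IP are placeholders):

snmpwalk -v 2c -c <community> <fi-mgmt-ip> IF-MIB::ifOutErrors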

Do you see any errors on vCenter and/or ESXi?

esxcli network nic stats get -n vmnic0

-----------------------------------------------

Let me summarize: the problem shows up

- on fabric A and/or B

- with both 2104 and 2208 IOMs

- on multiple blades in two chassis

- only between the IOM and the FI

- not northbound of the FI

The summary is accurate. I grabbed some stats off one of the hosts.

For reference:

vmnic0 and 1 are for Management only

vmnic2 and 3 are for VDS

vmnic5 and 6 are vMotion

 

~ # esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
   Packets received: 183853406
   Packets sent: 14248
   Bytes received: 15109232248
   Bytes sent: 10860364
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic1
NIC statistics for vmnic1
   Packets received: 216684173
   Packets sent: 78566218
   Bytes received: 47589164213
   Bytes sent: 83540882074
   Receive packets dropped: 27
   Transmit packets dropped: 0
   Total receive errors: 17
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 17
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic2
NIC statistics for vmnic2
   Packets received: 2308445804
   Packets sent: 1769198467
   Bytes received: 760268005329
   Bytes sent: 1007781294817
   Receive packets dropped: 5801
   Transmit packets dropped: 0
   Total receive errors: 4
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 4
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic3
NIC statistics for vmnic3
   Packets received: 3515742297
   Packets sent: 1260747637
   Bytes received: 795264860939
   Bytes sent: 382208295578
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 179
   Receive length errors: 0
   Receive over errors: 179
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic4
NIC statistics for vmnic4
   Packets received: 4822791
   Packets sent: 13
   Bytes received: 660278761
   Bytes sent: 832
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0

~ # esxcli network nic stats get -n vmnic5
NIC statistics for vmnic5
   Packets received: 129866853
   Packets sent: 43742076
   Bytes received: 186885321654
   Bytes sent: 51537911982
   Receive packets dropped: 53669
   Transmit packets dropped: 0
   Total receive errors: 8
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 8
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0

Kind of fascinating: I forgot for a second that I had just vMotioned the VM to another host. Going back to the one it's been running on for some time, we see lots more receive over errors.

 

~ # esxcli network nic stats get -n vmnic2
NIC statistics for vmnic2
   Packets received: 6443929963
   Packets sent: 5603998409
   Bytes received: 1958660312013
   Bytes sent: 2551197563270
   Receive packets dropped: 3560
   Transmit packets dropped: 0
   Total receive errors: 178192
   Receive length errors: 0
   Receive over errors: 178260
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
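For anyone else chasing this, a rough loop I've been running from the ESXi shell to watch that counter climb (the vmnic and the 30-second interval are just what I happened to use):

~ # while true; do esxcli network nic stats get -n vmnic2 | grep 'Receive over'; sleep 30; done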

 

I'm actually seeing the behavior reproduced for two domain controllers in this same Active Directory that reside at another datacenter, with similar blades and configurations (including mixed FI/blade firmware). Since that vSphere cluster is two nodes and is only running these two VMs, I'm updating one of the two hosts to 5.1U2 and UCS 2.1(3b) with newer fNIC/eNIC drivers and seeing where that gets me.

This might be related to the DVS and/or ESXi; see e.g.:

The output of esxtop shows dropped receive packets at the virtual switch (1010071)
kb.vmware.com/kb/1010071

vCenter Server 5.1 and 5.5 performance charts report dropped network packets (2052917)
http://kb.vmware.com/kb/2052917
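In case it saves someone a click, the check from the first KB is basically esxtop's network view:

~ # esxtop
(press 'n' for the network view and watch the %DRPTX / %DRPRX columns for the VM's port)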


 

tommcnicholas
Level 1

Well... the firmware update to 2.1(3b) and ESXi 5.1U2 + patches did not resolve the issue. :|
