I am having a very strange yet interesting problem, and I am hoping that someone else has experienced it or can offer some insight. I have opened tickets with both vendors and have been working on the issue for weeks at this point, so a fresh set of eyes from the community would be welcome.
I have two UCS 5108 blade chassis, each with five Cisco UCS B200 M3 blades. On each server I have loaded Windows Server 2012 R2 Core and installed Hyper-V. I have a separate server running System Center Virtual Machine Manager (SCVMM) 2012 R2 with the most recent rollup. On each server, I have presented four vNICs. One is for the DMZ and another is for a temporary connection that is later removed. The other two connect to a trunked port for all my VLANs, one on Fabric A and one on Fabric B. Using SCVMM, I create a virtual switch that does VLAN tagging on a Windows NIC team made up of the Fabric A and Fabric B vNICs. That switch then receives three virtual adapters, one tagged for each of the VLANs.
The problem, unfortunately, is completely random. I can deploy the same Cisco templates and the same SCVMM virtual switch template to all ten servers, yet I lose connectivity between servers over some or all VLANs. I have three VLANs (2, 3 and 4): some servers have no communication issues, others don't communicate at all, and others communicate over VLAN 2 but not over 3 and 4.
So far, my troubleshooting steps have included recommendations from both Cisco and Microsoft support:
We removed the teaming from the SCVMM virtual switch entirely and are only connecting the NICs on Fabric A. This has eliminated the theory that it was a pathing issue.
The issue spans both chassis one and chassis two, so it isn't specific to a particular chassis.
If I create a vNIC that only has a native VLAN of 2, 3 or 4 and set the IP within Windows (removing SCVMM from the picture), everything works fine.
If we capture traffic with Wireshark, we can see that the server that "doesn't work" is not actually getting its ARP requests out to the fabric interconnect.
Obviously, the last point is key, as it is the root of the issue: without ARP, the server can't communicate. The difficulty has been figuring out why the ARP requests are not getting through. Has anyone come across a similar issue?
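For anyone who wants to compare captures taken at different points (inside the host versus on the fabric side), here is a minimal Python sketch of the check we are effectively doing by eye in Wireshark: for each raw Ethernet frame, read the 802.1Q tag (EtherType 0x8100) if present and count ARP requests (EtherType 0x0806, opcode 1) per VLAN. The function names, the sample MAC and the IP addresses are hypothetical and only for illustration; it assumes a single VLAN tag and frames already exported as raw bytes, so treat it as a sketch and not a finished tool.

```python
import struct
from collections import Counter

ETHERTYPE_VLAN = 0x8100   # 802.1Q tag protocol identifier
ETHERTYPE_ARP = 0x0806    # ARP
ARP_OP_REQUEST = 1        # ARP opcode for a request


def classify_frame(frame: bytes):
    """Return (vlan_id, is_arp_request) for a raw Ethernet frame.

    vlan_id is None when the frame carries no 802.1Q tag.
    """
    if len(frame) < 14:
        return None, False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    vlan_id = None
    if ethertype == ETHERTYPE_VLAN:
        if len(frame) < 18:
            return None, False
        # Tag control information (low 12 bits = VLAN ID), then the real EtherType.
        tci, ethertype = struct.unpack_from("!HH", frame, 14)
        vlan_id = tci & 0x0FFF
        offset = 18
    if ethertype != ETHERTYPE_ARP or len(frame) < offset + 8:
        return vlan_id, False
    # The ARP opcode sits 6 bytes into the ARP header.
    (opcode,) = struct.unpack_from("!H", frame, offset + 6)
    return vlan_id, opcode == ARP_OP_REQUEST


def count_arp_requests(frames):
    """Count ARP requests per VLAN ID (None = untagged)."""
    counts = Counter()
    for frame in frames:
        vlan_id, is_request = classify_frame(frame)
        if is_request:
            counts[vlan_id] += 1
    return counts


def build_arp_request(vlan_id=None):
    """Build a synthetic ARP request for the demo (made-up MAC and IPs)."""
    src_mac = bytes.fromhex("020000000001")
    header = b"\xff" * 6 + src_mac          # broadcast destination, then source
    if vlan_id is not None:
        header += struct.pack("!HH", ETHERTYPE_VLAN, vlan_id & 0x0FFF)
    header += struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_OP_REQUEST)
    arp += src_mac + bytes([10, 0, 0, 1]) + b"\x00" * 6 + bytes([10, 0, 0, 2])
    return header + arp


if __name__ == "__main__":
    sample = [build_arp_request(2), build_arp_request(3), build_arp_request(3)]
    for vlan, n in sorted(count_arp_requests(sample).items()):
        print(f"VLAN {vlan}: {n} ARP request(s)")
```

Running the same count on a capture from the host and on one from the uplink side should show, per VLAN, whether the requests are tagged as expected and at which point they disappear.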