I have a strange intermittent problem. Our setup is as follows:
(2) 6513s w/Sup2 running 12.2(18)SXD7a
Many HP servers (DL380s, 385s, 785s) running Network Fault Tolerance (NFT) NIC teams in Active/Standby mode. One NIC attached to each switch.
Every once in a while a server falls off the network. It can be months of normal operation before it happens to the same server twice. The server simply disappears: it can't be pinged, its entries no longer show up in the mac-address-table, nothing. The fix is to disable the team, or to switch the VLAN port assignment and switch it back. The tables then repopulate and the switch starts forwarding again.
We have tried every combination of server-side NIC teaming configuration. HP support claims there is nothing wrong with our setup. TAC cannot point to an issue with the switch configuration or a known bug in the code version (though I know it's getting old). I have a Core network rebuild project coming up in a couple of months which will forklift these switches, but my concern is that I will just transport the problem to new hardware and/or software.
Here is a sample switchport configuration:
no ip address
switchport access vlan 70
switchport mode access
There are no errors on the port.
One thing of interest is that this only occurs on Windows boxes. The NFT setup puts a different MAC address on each port, but only one forwards. I have verified this with packet captures: the standby NIC sends nothing but multicast heartbeat packets. However, the switchport that should be passive spews a lot of TX packets to the NIC that is not listening. At first I thought this was just flooded traffic for destinations not in the MAC table, but the amount seems too high. It is on the order of 2 - 3 MBs and includes a wide variety of source and destination IPs. Our Xnix boxes do this differently, and use only a single MAC address that appears on only one port at a time, obviously. No problems.
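One way to check whether that TX traffic really is unknown-unicast flooding would be to bucket the captured frames by destination MAC. The following is only a rough Python sketch, not something taken from our environment: the function names, the sample MAC addresses, and the assumed input of (destination MAC, frame length) pairs exported from a capture are all hypothetical. It also assumes MACs are written colon- or hyphen-separated.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def normalize(mac):
    """Lowercase a MAC and use ':' separators (assumes ':' or '-' input)."""
    return mac.lower().replace("-", ":")


def classify_dest(dest_mac, local_macs):
    """Classify one frame by destination MAC.

    local_macs must be a set of already-normalized MACs owned by the server.
    """
    mac = normalize(dest_mac)
    if mac == BROADCAST:
        return "broadcast"
    # The least significant bit of the first octet marks a group (multicast) address.
    if int(mac.split(":")[0], 16) & 0x01:
        return "multicast"
    if mac in local_macs:
        return "unicast_to_us"
    # Unicast to a MAC the server does not own: the switch flooded it.
    return "flooded_unicast"


def summarize(frames, local_macs):
    """frames: iterable of (dest_mac, length_bytes). Returns bytes per class."""
    local = {normalize(m) for m in local_macs}
    totals = Counter()
    for dest_mac, length in frames:
        totals[classify_dest(dest_mac, local)] += length
    return dict(totals)
```

Feeding it the frames captured on the standby NIC, with both team MACs listed in `local_macs`, would show how many bytes fall into `flooded_unicast` compared with ordinary broadcast and multicast. A large `flooded_unicast` share would suggest the switch has no table entry for those destinations.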
Any recommendations or ideas are appreciated. Thanks!