I'm slightly perplexed by a network failure we experienced at work. We have five 6509s with one acting as the Central switch (i.e. hub-and-spoke). They all have WS-X6K-SUP1A-2GE supervisors installed and, for whatever reason, there is only one VLAN across the whole LAN. This morning the Central switch rebooted 3 times in 3 minutes, showing an NMI fault (Watchdog timer), which caused an outage for all 350 hosts. Unfortunately, we don't have a TAC account for this kit and the Output Interpreter isn't working.
Just prior to the reboots, 4 users had problems accessing the network from their local switches, even though the LAN is flat. For example, they couldn't ping their default gateway, which hangs off the Central switch. nslookup failed to resolve names; ping did resolve hostnames, but the pings themselves failed. Connectivity to devices on their local switch was OK though. On some occasions I saw 50% packet loss on pings: the first would fail, the second would pass, and that pattern would repeat.

It looked to me as though there was an IP addressing issue, although the support team weren't convinced and pointed to the NICs. I told them that 4 NICs failing at once was too much of a coincidence, but we changed the NIC on one PC and suddenly it worked. I still wasn't convinced, so on another PC we changed the MAC address of the NIC instead, which forced a DHCP release/renew. The PC obtained a new IP address and started working straight away. This left us with 2 PCs not working.

I ran SPAN on one of the host ports (connected to PC 3) and sent a continuous ping from PC 3 to the default gateway. Looking at Wireshark, I was amazed to see not only the echo requests coming from PC 3 but also a number of other echo requests/replies coming and going from other IP addresses. We then changed the MAC address of the NIC on PC 3, after which we saw only the pings from the PC (now with a new IP address) and the replies it received. As soon as we changed the MAC address of the NIC back to the old one, the old symptoms raised their ugly heads again. To me this points to a DHCP issue. Could it be that DHCP has assigned multiple IP addresses to the same MAC address?
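One way I could cross-check that theory is to take (IP, MAC) pairs from the Wireshark capture, the switch ARP table or the DHCP lease list and look for MACs holding more than one IP, and IPs claimed by more than one MAC. Below is a rough Python sketch of that check. The function name, the input format and the addresses are made up for illustration; nothing here comes from our real network.

```python
from collections import defaultdict


def find_conflicts(observations):
    """Group (ip, mac) pairs and report anything that is not one-to-one.

    `observations` is an iterable of (ip, mac) string pairs, however they
    were collected (this input format is an assumption for the sketch).
    Returns two dicts: IPs seen with several MACs, and MACs seen with
    several IPs.
    """
    macs_by_ip = defaultdict(set)
    ips_by_mac = defaultdict(set)
    for ip, mac in observations:
        # Normalise case and separator so the same MAC is not counted twice.
        mac = mac.lower().replace("-", ":")
        macs_by_ip[ip].add(mac)
        ips_by_mac[mac].add(ip)

    duplicate_ips = {ip: sorted(m) for ip, m in macs_by_ip.items() if len(m) > 1}
    multi_ip_macs = {mac: sorted(i) for mac, i in ips_by_mac.items() if len(i) > 1}
    return duplicate_ips, multi_ip_macs


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real capture.
    sample = [
        ("10.0.0.23", "00:11:22:33:44:55"),
        ("10.0.0.23", "00-11-22-33-44-66"),
        ("10.0.0.40", "00:11:22:33:44:55"),
        ("10.0.0.1", "00:AA:BB:CC:DD:01"),
    ]
    dup_ips, multi_macs = find_conflicts(sample)
    print("IPs claimed by more than one MAC:", dup_ips)
    print("MACs holding more than one IP:", multi_macs)
```

If either result comes back non-empty for the affected PCs, that would support the addressing theory over faulty NICs.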
I'm not on site until the morning, so I was wondering whether you guys have any suggestions in the interim as to what may be causing this issue?