6509 reloaded 3 times in 3 mins - strange nwk behaviour
I'm slightly perplexed by a network failure we experienced at work. We have five 6509s, with one acting as the Central switch (i.e. hub-and-spoke). They all have WS-X6K-SUP1A-2GE supervisors installed and, for whatever reason, there is only one VLAN across the whole LAN. This morning the Central switch rebooted 3 times in 3 minutes, reporting an NMI fault (watchdog timer) and causing an outage for all 350 hosts. Unfortunately, we don't have a TAC account for this kit and the Output Interpreter is broken.
Just prior to the reboots, 4 users had problems accessing the network from their local switches, even though the LAN is flat. For example, they couldn't ping their default gateway, which hangs off the Central switch. nslookup failed to resolve names; ping did resolve hostnames, but the pings themselves failed. Connectivity to devices on their local switch was OK though. On some occasions I could see 50% packet loss on pings: the first would fail, the second would pass, and that pattern would repeat.
It looked to me as though there was an IP addressing issue, although the support team weren't convinced and pointed to the NICs. I told them that 4 faulty NICs at once was too much of a coincidence, but we changed the NIC on one PC and suddenly it worked. I still wasn't convinced, so on another PC we changed the MAC address of the NIC instead, which forced a DHCP release/renew; it obtained a new IP address and suddenly started working too. This left us with 2 PCs not working.
I ran SPAN on one of the host ports (connected to PC 3) and sent a continuous ping from PC 3 to the default gateway. Looking at Wireshark, I was amazed to see not only the echo requests coming from PC 3 but also a number of other echo requests/replies coming and going from other IP addresses. We then changed the MAC address of the NIC on PC 3 and after that only saw the pings coming from the PC (now with a new IP address) and the replies it received. As soon as we changed the MAC address of the NIC back to the old one, the old symptoms returned. To me this points to a DHCP issue. Could it be that DHCP has assigned multiple IP addresses to the same MAC address?
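For anyone wanting to test the "multiple IP addresses per MAC" theory offline, here is a minimal Python sketch. It assumes you have already pulled (MAC, IP) pairs out of somewhere, e.g. the DHCP lease list or the ARP traffic in the Wireshark capture; the function name and the sample addresses are made up for illustration and are not from our network.

```python
from collections import defaultdict


def find_address_conflicts(pairs):
    """Group observed (mac, ip) pairs and report the suspicious ones.

    pairs: iterable of (mac, ip) string tuples, gathered by hand or by
    whatever export you have (an assumption; no tool is queried here).
    Returns two dicts: MACs seen with more than one IP, and IPs seen
    with more than one MAC.
    """
    ips_by_mac = defaultdict(set)
    macs_by_ip = defaultdict(set)
    for mac, ip in pairs:
        mac = mac.lower()  # treat AA:BB:... and aa:bb:... as the same MAC
        ips_by_mac[mac].add(ip)
        macs_by_ip[ip].add(mac)

    macs_with_many_ips = {
        mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1
    }
    ips_with_many_macs = {
        ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1
    }
    return macs_with_many_ips, ips_with_many_macs


if __name__ == "__main__":
    # Hypothetical sample data using documentation addresses.
    sample = [
        ("AA:BB:CC:DD:EE:01", "192.0.2.10"),
        ("aa:bb:cc:dd:ee:01", "192.0.2.11"),
        ("aa:bb:cc:dd:ee:02", "192.0.2.20"),
        ("aa:bb:cc:dd:ee:03", "192.0.2.20"),
    ]
    many_ips, many_macs = find_address_conflicts(sample)
    print("MACs with several IPs:", many_ips)
    print("IPs with several MACs:", many_macs)
```

If the first dict comes back empty for the affected PCs, the DHCP theory looks less likely and the problem is more probably at layer 2.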
I'm not on site until the morning, so I was wondering whether you guys have any suggestions in the interim as to what may be causing this issue?
Keeping PC1 as is with its default MAC address, Z5Switch learns the MAC address on 3/31 as expected. Z4Switch learns it on 9/16 as expected; however, there is no record of the MAC address on the Central Switch at all. It's as if it doesn't learn the MAC address, or something is stopping it from learning that particular MAC address. If I change the MAC address of PC1's NIC, the new MAC address is also learned by the Central Switch on 4/4 as expected and end-to-end connectivity is there. If I enter set cam dynamic aa:bb:cc:dd:ee:ff 4/4 on the Central Switch, the switch falls over.
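The hop-by-hop check above can be written down as a small Python sketch, which may help if more MAC addresses need tracing. It assumes the CAM entries have been copied by hand into a dict per switch; the helper names are hypothetical, the path order Z5Switch, Z4Switch, Central is my assumption about how the frame travels, and aa:bb:cc:dd:ee:ff is the same placeholder MAC as above.

```python
HEX_DIGITS = "0123456789abcdef"


def normalize_mac(mac):
    """Lowercase a MAC and drop separators so aa:bb.. and AA-BB.. compare equal."""
    return "".join(ch for ch in mac.lower() if ch in HEX_DIGITS)


def switches_missing_mac(mac, cam_tables, path):
    """Return the switches on the path that have not learned the MAC.

    cam_tables: dict of switch name -> dict of MAC -> port (typed in by hand).
    path: ordered list of switch names from the host towards the gateway.
    """
    target = normalize_mac(mac)
    missing = []
    for switch in path:
        learned = {normalize_mac(m) for m in cam_tables.get(switch, {})}
        if target not in learned:
            missing.append(switch)
    return missing


if __name__ == "__main__":
    # Ports taken from the observations above; the Central table is empty
    # for this MAC because the switch shows no record of it.
    cam_tables = {
        "Z5Switch": {"aa:bb:cc:dd:ee:ff": "3/31"},
        "Z4Switch": {"AA-BB-CC-DD-EE-FF": "9/16"},
        "Central": {},
    }
    path = ["Z5Switch", "Z4Switch", "Central"]
    print(switches_missing_mac("aa:bb:cc:dd:ee:ff", cam_tables, path))
```

The first switch in the returned list is where learning stops, which in our case is the Central Switch.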
I've attached the show tech for your perusal. If anyone can work out what may be happening here I would appreciate it.
Re: 6509 reloaded 3 times in 3 mins - strange nwk behaviour
Sorry about the late reply. We reseated the supervisor engine the other morning but haven't been able to conduct further tests due to the location being shut because of the snow. I've attached a show version for you but am unable to find a crashinfo file. I've looked at bootflash and there is nothing there other than the image. I've looked at sup-bootflash but that doesn't exist.