6509 reloaded 3 times in 3 mins - strange nwk behaviour

Unanswered Question
Dec 16th, 2009
User Badges:
  • Bronze, 100 points or more

Hi all,


I'm slightly perplexed by a network failure we experienced at work. We have five 6509s, with one acting as the Central switch (i.e. hub-and-spoke). They all have WS-X6K-SUP1A-2GE supervisors installed and, for whatever reason, there is only one VLAN across the whole LAN. The Central switch rebooted 3 times in 3 minutes this morning, showing an NMI fault (watchdog timer) and causing an outage for all 350 hosts. Unfortunately, we don't have a TAC account for this kit and the Output Interpreter is shagged.


Just prior to the reboots, 4 users had problems accessing the network from their local switches, even though the LAN is flat. For example, they couldn't ping their default gateway, which hangs off the Central switch, and nslookup failed to resolve names; ping did resolve hostnames, but the pings themselves failed. Connectivity to devices on their local switch was OK, though. On some occasions I could see 50% packet loss on pings: the first would fail, the second would pass, and that pattern would repeat.


It looked to me as though there was an IP addressing issue, although the support team weren't convinced and pointed to the NICs. I told them that 4 faulty NICs at once was too much of a coincidence, but we changed the NIC on one PC and suddenly it worked. I still wasn't convinced, so on another PC we changed the MAC address of the NIC, which forced a DHCP release/renew; it obtained a new IP address and suddenly started working. This left us with 2 PCs not working.


I ran SPAN on one of the host ports (connected to PC 3) and sent a continuous ping from PC 3 to the default gateway. Looking at Wireshark, I was amazed to see not only the echo requests coming from PC 3 but also a number of other echo requests/replies going to and from other IP addresses. We then changed the MAC address of the NIC on PC 3 and saw only the pings coming from the PC (now with a new IP address) and the replies it received. As soon as we changed the MAC address back to the old one, the old symptoms raised their ugly heads again. This, to me, points to a DHCP issue. Could it be that DHCP has assigned multiple IP addresses to the same MAC address?
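

The question of whether DHCP has handed more than one address to the same MAC can be checked offline once the lease table has been exported from the DHCP server. The following is only a rough Python sketch of that check, not anything taken from the thread: the function name `find_conflicts` and the `(mac, ip)` pair format are assumptions made for illustration, and real lease exports would need parsing into that shape first. It also reports the reverse case (one IP address seen with several MACs), which would fit the stray echo replies seen in Wireshark equally well.

```python
from collections import defaultdict


def find_conflicts(leases):
    """Group (mac, ip) pairs and report many-to-one mappings.

    `leases` is an iterable of (mac, ip) string pairs; this input
    format is an assumption for the sketch, not a real export format.
    Returns (macs_with_several_ips, ips_with_several_macs).
    """
    ips_by_mac = defaultdict(set)
    macs_by_ip = defaultdict(set)
    for mac, ip in leases:
        # Normalise so 00-11-22-33-44-55 and 00:11:22:33:44:55 compare equal.
        mac = mac.lower().replace("-", ":")
        ips_by_mac[mac].add(ip)
        macs_by_ip[ip].add(mac)
    multi_ip = {m: sorted(i) for m, i in ips_by_mac.items() if len(i) > 1}
    multi_mac = {i: sorted(m) for i, m in macs_by_ip.items() if len(m) > 1}
    return multi_ip, multi_mac


if __name__ == "__main__":
    # Made-up sample data, purely to show the call.
    sample = [
        ("00-11-22-33-44-55", "10.0.0.10"),
        ("00:11:22:33:44:55", "10.0.0.11"),
        ("00:11:22:33:44:66", "10.0.0.12"),
    ]
    multi_ip, multi_mac = find_conflicts(sample)
    print("MACs holding several IPs:", multi_ip)
    print("IPs claimed by several MACs:", multi_mac)
```

An empty result in both dictionaries would suggest the lease table itself is clean and that the fault lies elsewhere, for example in MAC learning on the switches.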


I'm not at site until morning so was wondering whether you guys have any suggestions in the interim that may be the cause of this issue?


Many thanks

Leo Laohoo Wed, 12/16/2009 - 13:33
User Badges:
  • Super Gold, 25000 points or more
  • Hall of Fame,

    The Hall of Fame designation is a lifetime achievement award based on significant overall achievements in the community. 

  • Cisco Designated VIP,

    2017 LAN, Wireless

Could you please provide more information?


1.  Do you have a crashinfo file?  If you do, please post it.
2.  Can you do a "sh tech" and post the output?
3.  What IOS & feature set do you have?
4.  Can you provide the "sh module" output please?

LordFlasheart Wed, 12/16/2009 - 13:38

Hi,


As I said I'm at home now so don't have anything to hand but can get a show tech to you tomorrow morning. Central Switch is running CatOS SW Version 5.5 and the others are running 6.1.


Cheers


Chris

LordFlasheart Thu, 12/17/2009 - 06:07

Right then, I've done some further analysis. The diagram shows you the basic connections. Everything is in VLAN 1 with no trunk ports configured (not my design I hasten to add).


Z4Switch(9/16)-->(4/8)Central Switch(4/4)-->(9/1)Z5Switch(3/31)-->PC1


Keeping PC1 as is, with its default MAC address, Z5Switch learns the MAC address on 3/31 as expected. Z4Switch learns it on 9/16 as expected; however, there is no record of the MAC address on the Central Switch at all. It's as if it doesn't learn the MAC address, or something is stopping it from learning that particular MAC address. If I change the MAC address of PC1's NIC, the new MAC address is also learned by the Central Switch on 4/4 as expected, and end-to-end connectivity is there. If I enter set cam dynamic aa:bb:cc:dd:ee:ff 4/4 on the Central Switch, the switch falls over.
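

The hop-by-hop check described above (which port each switch has learned PC1's MAC address on) can be written down as a small comparison of expected versus learned ports. This is just an illustrative Python sketch: the function `check_mac_path` and the dictionary layout of the CAM tables are assumptions, and the table contents would have to be filled in by hand from each switch's CAM output. The switch names and ports are the ones from the diagram, and aa:bb:cc:dd:ee:ff is the same placeholder MAC used in this post.

```python
def check_mac_path(mac, expected_path, cam_tables):
    """Compare where a MAC should be learned with where it actually is.

    expected_path: list of (switch, expected_port) tuples.
    cam_tables: dict of switch name -> {mac: learned_port}; this layout
    is an assumption for the sketch, filled in manually per switch.
    Returns a list of (switch, expected_port, learned_port, status).
    """
    report = []
    for switch, expected_port in expected_path:
        learned_port = cam_tables.get(switch, {}).get(mac)
        if learned_port is None:
            status = "missing"
        elif learned_port == expected_port:
            status = "ok"
        else:
            status = "wrong port"
        report.append((switch, expected_port, learned_port, status))
    return report


if __name__ == "__main__":
    mac = "aa:bb:cc:dd:ee:ff"  # placeholder MAC, as in the post
    path = [("Z5Switch", "3/31"), ("Central Switch", "4/4"), ("Z4Switch", "9/16")]
    # Mirrors the symptom described: the Central Switch has no entry.
    cams = {
        "Z5Switch": {mac: "3/31"},
        "Central Switch": {},
        "Z4Switch": {mac: "9/16"},
    }
    for row in check_mac_path(mac, path, cams):
        print(row)
```

Run against the tables for a working MAC and a failing one, the difference between the two reports should isolate the switch that is not learning the address.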


I've attached the show tech for your perusal. If anyone can work out what may be happening here I would appreciate it.

Leo Laohoo Thu, 12/17/2009 - 15:17

I don't see anything odd with this except for an uptime of "0,00:18:19".


You can't find any crashinfo?  Can you post the "sh version" output please?

LordFlasheart Mon, 12/21/2009 - 04:07

Hi,


Sorry about the late reply. We reseated the supervisor engine the other morning but haven't been able to conduct further tests because the site has been shut due to the snow. I've attached a show version for you but am unable to find a crashinfo file. I've looked in bootflash and there is nothing there other than the image. I've also looked for sup-bootflash, but that doesn't exist.


Regards
