All switches hung: need urgent help

Jun 29th, 2010

Dear All,


We have around 16 switches (2 core 3560s and 14 2900 access switches) in our network, and all of them hung for more than 20 minutes. We tried to connect through the console of every switch but failed, except on 4 switches.

I checked the logs of the core switches and found an HSRP IP conflict for one VLAN. There are around 20 L3 VLAN interfaces with HSRP configured.

If it is due to the HSRP conflict, would this cause all switches to hang (they hung for more than 20 minutes and came back up after a restart) except 4 access switches?


Please let me know what the possibilities could be.


I also suspect there might be an earthing (power) issue that could put the switches into a hung state, but I could not find any specific documentation.

Is there any document confirming that a power issue can cause a switch to hang?


Thanks

Amar

John Blakley Tue, 06/29/2010 - 09:19

Sounds to me like you somehow had a broadcast storm or a loop.



HTH,

John

Giuseppe Larosa Tue, 06/29/2010 - 09:42

Hello Amar,


>> If it is due to HSRP conflict, will this cause all switches hang (hanged for more than 20 mins and came up after restart) except 4 access switches?


No, it is not caused by HSRP. As John noted, your network had a major failure caused by a bridging loop.


The error messages about the HSRP conflict are symptoms of the problem, not its root cause.


You need to check all switches. Look for ports with spanning-tree bpdufilter enabled or for ports with spanning-tree portfast that have been connected to another switch.


A user may have connected a cable between two different ports that have STP portfast and bpdufilter enabled, and in doing so may have created a loop.
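
As a rough aid for that check, here is a minimal Python sketch that scans a saved running-config for interfaces carrying `spanning-tree portfast` or `spanning-tree bpdufilter enable`. The function name `find_risky_ports` and the sample config are made up for illustration, and the parsing assumes the usual IOS layout where interface sub-commands are indented by one space.

```python
def find_risky_ports(config_text):
    """Return {interface: [flags]} for ports with portfast or bpdufilter set."""
    risky = {}
    current = None  # name of the interface block we are currently inside
    for line in config_text.splitlines():
        stripped = line.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1].strip()
        elif stripped and not line.startswith(" "):
            # Any other top-level line (for example "!") ends the block.
            current = None
        elif current and stripped.startswith("spanning-tree"):
            if "bpdufilter enable" in stripped:
                risky.setdefault(current, []).append("bpdufilter")
            elif (stripped.startswith("spanning-tree portfast")
                  and not stripped.endswith("disable")):
                risky.setdefault(current, []).append("portfast")
    return risky


# Hypothetical config excerpt, not taken from the switches in this thread.
SAMPLE_CONFIG = """\
interface FastEthernet0/1
 switchport mode access
 spanning-tree portfast
 spanning-tree bpdufilter enable
!
interface FastEthernet0/2
 switchport mode access
!
interface GigabitEthernet0/1
 switchport mode trunk
 spanning-tree portfast trunk
!
"""

if __name__ == "__main__":
    for port, flags in find_risky_ports(SAMPLE_CONFIG).items():
        print(port, ", ".join(flags))
```

This only looks at per-interface commands. Global settings such as `spanning-tree portfast default` or `spanning-tree portfast bpdufilter default` would not show up here and need a separate check, and the output still has to be compared against what is physically plugged into each port.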


Another possible source of problems is a configuration error on port channels: once a port channel is up, all changes to the list of permitted VLANs have to be made on the logical interface, not on the physical members. We have experienced this problem twice, and it can also affect C6500 devices.


You should look at the log file of each switch and correlate events. If all devices share an NTP reference clock and you use datetime in log messages (the default is uptime, which is not useful), the switch that shows the first messages is likely near the point where the loop formed.
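
The correlation step could be sketched in Python roughly as follows. This assumes log lines in the common `Mon DD HH:MM:SS: %FACILITY-SEV-MNEMONIC: text` datetime form without a year; the switch names, the sample log lines, and the helper `merge_logs` are hypothetical, and the year is passed in as a parameter because it does not appear in the line.

```python
import re
from datetime import datetime

# Matches e.g. "Jun 29 09:14:05: %STANDBY-3-DUPADDR: Duplicate address ..."
# An optional leading "*" and optional milliseconds are tolerated.
LOG_RE = re.compile(
    r"^\*?(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})(?:\.\d+)?: "
    r"(?P<tag>%[^:]+): (?P<text>.*)$"
)


def merge_logs(logs_by_switch, year=2010):
    """Merge per-switch log text into one list sorted by timestamp."""
    events = []
    for switch, text in logs_by_switch.items():
        for line in text.splitlines():
            m = LOG_RE.match(line.strip())
            if not m:
                continue  # skip lines that are not in the assumed format
            stamp = " ".join(m.group("stamp").split())
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            events.append((when, switch, m.group("tag"), m.group("text")))
    events.sort(key=lambda e: e[0])
    return events


# Made-up sample data for illustration only.
SAMPLE_LOGS = {
    "core-1": (
        "Jun 29 09:14:05: %STANDBY-3-DUPADDR: Duplicate address 10.1.10.1 "
        "on Vlan10, sourced by 0000.0c07.ac0a"
    ),
    "access-07": (
        "Jun 29 09:14:02: %SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4455 in "
        "vlan 10 is flapping between port Fa0/3 and port Fa0/4\n"
        "Jun 29 09:14:09: %LINK-3-UPDOWN: Interface FastEthernet0/3, "
        "changed state to down"
    ),
}

if __name__ == "__main__":
    for when, switch, tag, text in merge_logs(SAMPLE_LOGS):
        print(when.isoformat(), switch, tag, text)
```

The switch at the top of the merged list is only a starting point for the search, and the ordering is meaningful only if the clocks really are synchronized.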


Hope to help

Giuseppe

hobbe Wed, 06/30/2010 - 02:55

Hi

You state the switches HANG.

For me that means they stop responding and are dead in the water, i.e. no lights, no nothing.

If this is the case, then you could have a bug that crashes the IOS.

I would say it's not likely, since you have several different units, most likely with different IOS versions.


If you have a broadcast storm, you will see the lights flash like crazy, since what the switch is trying to do is send the same packets as fast as it can to the next part of the loop, which in turn will send them back.


Or you could just be seeing a misbehaving host that sends huge numbers of very small packets to a broadcast address, which exhausts the switches.


To check where the problem is, attach a sniffer. You will see the offending packets and you will know for sure whether it is a storm or resource exhaustion.

If it is a storm, disconnect the switches from each other; sooner or later you will break the loop and you will have an "offender" at that switch unit.

If the packets are too small and are killing the switches that way, disconnect the offender.
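
One rough way to look at such a capture, sketched in Python: count broadcast frames per source and check how often the exact same frame repeats. The heuristic is an assumption on my part, not a rule: one identical frame seen over and over is consistent with a frame circulating in a loop, while many different small frames from one source point more towards a flooding host. The tuple layout and the name `summarize_capture` are made up for the example; a real capture would have to be exported from the sniffer first.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def summarize_capture(frames):
    """Summarize broadcast traffic.

    frames: list of (src_mac, dst_mac, payload_bytes) tuples.
    """
    bcast = [f for f in frames if f[1].lower() == BROADCAST]
    by_source = Counter(src for src, _, _ in bcast)
    # Count how many times each exact (source, payload) pair was seen.
    repeats = Counter((src, payload) for src, _, payload in bcast)
    top_src, top_count = by_source.most_common(1)[0] if by_source else (None, 0)
    return {
        "broadcast_share": len(bcast) / len(frames) if frames else 0.0,
        "top_source": top_src,
        "top_source_frames": top_count,
        "max_identical_repeats": max(repeats.values(), default=0),
        "distinct_broadcast_frames": len(repeats),
    }


# Made-up captures: the same frame seen 50 times, versus 50 different
# small frames from a single host.
LOOP_LIKE = [("00:11:22:33:44:55", BROADCAST, b"who-has 10.1.10.1")] * 50
FLOOD_LIKE = [("00:aa:bb:cc:dd:ee", BROADCAST, bytes([i])) for i in range(50)]

if __name__ == "__main__":
    print(summarize_capture(LOOP_LIKE))
    print(summarize_capture(FLOOD_LIKE))
```

Either way, `top_source` names the MAC address to go looking for once the switches are reachable again.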


16 hours late but it might help someone...


good luck


HTH

amardram123 Wed, 06/30/2010 - 04:29

Hi.


When this issue occurred, all switches were restarted and everything came back to normal, after which the issue did not occur again.

But for this period we don't have any logs in syslog, and no crash info or specific errors were found on the switches, except one log line about the HSRP IP conflict.

That is the reason I am not able to find a root cause, but I am exploring the possibilities:


STP loop, broadcast storm, the IOS release on both core switches being deferred, or a power-related problem (the switches did not restart, they only hung, and no crashinfo was found on any switch).

I just need one more clarification on the power-related issue.


Is there any chance that a UPS earthing issue can put all the systems into a hung state? Is there any specific document related to this?


Regards

Amar

mahmoodmkl Wed, 06/30/2010 - 05:15

Hi,


I think there was a loop in your network, because you receive the HSRP duplicate address message when there is an STP loop in the network.


I would suggest you look into your STP root bridges, priorities, trunk links, etc.



Thanks


Mahmood
