LAN switching problem -- major outage

Oct 16th, 2008

Hi there,


I experienced a very odd issue this morning that I can't explain. I'm hoping that someone has some ideas of what I can check to see what actually happened.


Here's the scenario: Early this morning I noticed one of my servers couldn't connect to another server on the same VLAN. While I was investigating, I noticed that all of a sudden all of my servers were having connectivity problems. When I say connectivity problems, I mean total network connectivity problems -- I couldn't even ping them. A few servers seemed to be unaffected, but the majority of them went offline. My first thought was to check for high CPU utilization on the core switches (redundant 4507Rs) -- nothing to see there. No log entries about any problems, no OSPF recalculations or HSRP failovers. Next, I looked at the connectivity between devices (core-distribution) -- there wasn't anything to see there either; no large bursts of packets, no interfaces down, etc.


The solution appeared to be restarting one particular server. Once that server restarted, everything started coming back online. When I look at that server's interfaces, though, I don't see any traffic anomalies.


Does anyone have any ideas on what I can look at, from the switch side, to help me diagnose this?


Many thanks in advance,


--Brandon

sateeshk10 Thu, 10/16/2008 - 07:33

Hi,


What I suspect is that there might be a problem with the NIC. Sometimes a misbehaving NIC generates a huge amount of broadcast traffic, and at that point your network can hang.


But I am not sure; this is only my assumption based on the problem described above.



Regards

sateesh

branfarm1 Thu, 10/16/2008 - 07:35

Thanks Sateesh,


I suspect something with that individual server, but I want to examine all possibilities right now.


branfarm1 Sat, 10/18/2008 - 13:23

Can anyone tell me something I could check on the switch side that I could use as evidence of the NIC failure? I'm sure you guys know how it is -- Sys Admins tell me it's the network, which technically I guess it is... but it's because of *their* server.

vishwancc Sun, 10/19/2008 - 00:28

Hi,

Do you have any IDS or NetFlow configured? If yes, what logs are there from the time of the issue?

When you say you were not able to ping the server, were you pinging it from the switch connected directly to the server or from a PC connected to the distribution switch?

If there was no RX load on the interface at the time of the issue, there is nothing wrong with the network. The issue happened because the server started acting funny.
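One way to turn the RX-load check into evidence is to take two readings of the server port's input counters and compare them. The snippet below is only a rough sketch of that arithmetic: the `CounterSample` structure, the `broadcast_stats` function, and the sample numbers are all made up for illustration, and it assumes you have already collected the counter values yourself (for example from the switch's interface statistics or via SNMP).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CounterSample:
    """One reading of a switch port's input counters (hypothetical structure)."""
    timestamp_s: float     # when the reading was taken, in seconds
    input_packets: int     # total packets received on the port so far
    input_broadcasts: int  # broadcast packets received on the port so far


def broadcast_stats(earlier: CounterSample, later: CounterSample) -> Tuple[float, float]:
    """Return (broadcasts per second, broadcast share of all input packets)."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")

    bcast = later.input_broadcasts - earlier.input_broadcasts
    total = later.input_packets - earlier.input_packets
    if bcast < 0 or total < 0:
        # Counters were cleared or wrapped between the two readings.
        raise ValueError("counters went backwards between samples")

    share = bcast / total if total else 0.0
    return bcast / elapsed, share


if __name__ == "__main__":
    # Made-up readings taken 60 seconds apart on the suspect server's port.
    before = CounterSample(0.0, 1_000_000, 20_000)
    after = CounterSample(60.0, 4_000_000, 2_720_000)
    pps, share = broadcast_stats(before, after)
    print(f"{pps:.0f} broadcasts/s, {share:.0%} of input packets")
```

A port whose input is mostly broadcasts at a high rate would support the NIC theory; a quiet port during the outage would point elsewhere. What counts as "high" depends on your network, so compare against a normal port on the same VLAN.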

I will say keep an eye on the server it might happen again.


Chao

Vishwa
