Solved: Re: LAN with cisco switches occasionally crashes

marh · ‎06-04-2003

We have a network with a star topology and about 15 switches. There are two 3508G modular switches at the top. The switches are connected by fiber on the gigabit ports. Once every few days the network crashes for about 20-30 minutes. The LEDs on all switches start flashing synchronously. Pings to the servers take about 5000 ms. Then, after that time, everything behaves normally again without us doing anything. A loop isn't the cause; we have tested the spanning tree. There is nothing in the log file. All we can see is, for example, 'FastEthernet0/35 is experiencing errors' or 'Topology Change rcvd on GigabitEthernet0/1 vlan 1'. These ports aren't always the same. We have checked the speed and duplex settings on these ports and have changed some patch cables. There aren't a lot of broadcasts. Has anyone seen something similar? Any ideas?

milan.kulik · ‎06-05-2003

There might also be multicasts.

I remember a problem with application using multicasts to deliver data in the past.

I would connect a data analyzer to a switch port to capture traffic while the problem is occurring (20 minutes is long enough) and check what is going on in the network.

Regards,

Milan
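
For anyone following the capture advice above, here is a minimal sketch of the kind of summary worth pulling out of the captured frames: how many are broadcast, multicast or unicast, and which source MACs send the most flooded traffic. It assumes the analyzer can export raw Ethernet frames as bytes; the function names `classify_dst` and `summarize` are made up for illustration and are not part of any analyzer's API.

```python
from collections import Counter

BROADCAST = b"\xff" * 6


def classify_dst(frame: bytes) -> str:
    """Classify an Ethernet frame by its destination MAC (first 6 bytes)."""
    dst = frame[:6]
    if dst == BROADCAST:
        return "broadcast"
    # The least significant bit of the first octet marks a group address.
    if dst[0] & 0x01:
        return "multicast"
    return "unicast"


def summarize(frames):
    """Return (frame count per class, flooded-frame count per source MAC)."""
    kinds = Counter()
    talkers = Counter()
    for frame in frames:
        if len(frame) < 14:  # shorter than an Ethernet header, skip it
            continue
        kind = classify_dst(frame)
        kinds[kind] += 1
        if kind != "unicast":
            src = ":".join(f"{b:02x}" for b in frame[6:12])
            talkers[src] += 1
    return kinds, talkers
```

Calling `talkers.most_common(5)` on the result lists the stations responsible for most of the broadcast and multicast frames during the outage window.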

b-price · ‎06-05-2003

I'm sure you've looked at all of this but here it goes anyway:

Make sure you don't have more than seven hops from your root switch to your farthest edge switch.

Multicast servers, such as Ghost servers, can cause this exact thing. I've seen it.

Make sure the configuration revision numbers on your edge switches aren't higher than on your root bridge.

Do a sh top on your set based switch to see if one port is storming.
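
The hop-count check at the top of this list can be done on paper, but here is a rough sketch of it in code. It assumes you can write down the physical links as pairs of switch names (the names below are hypothetical) and it counts hops over the shortest physical path, which is not necessarily the path spanning tree actually selects.

```python
from collections import deque


def hops_from_root(links, root):
    """Breadth-first search: hop count from root to every reachable switch."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    hops = {root: 0}
    queue = deque([root])
    while queue:
        sw = queue.popleft()
        for nxt in neighbors.get(sw, ()):
            if nxt not in hops:
                hops[nxt] = hops[sw] + 1
                queue.append(nxt)
    return hops


# Hypothetical topology: two core switches and a small chain of access switches.
links = [("core1", "core2"), ("core1", "acc1"), ("acc1", "acc2"), ("core2", "acc3")]
hops = hops_from_root(links, "core1")
too_far = [sw for sw, h in hops.items() if h > 7]
```

Any switch that ends up in `too_far` sits more than seven hops from the assumed root.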

9mmurphy · ‎06-04-2003

I would check your STP closely.

Is your root bridge and secondary bridge the two 3508G's?

If the root bridge is elsewhere in your network and the switches have to go through a root bridge election process, that could take some time. It should not be 20-30 minutes though.

A layer 1 problem could generate enough errors to cause STP problems as well. You may want to check for unidirectional link problems: the fiber is good on one side, but either the TX or the RX fiber is bad on the other end. This can wreak havoc on a VLAN.

If the switch supports it, enable UDLD on your fiber trunks.

I tend to diagram the entire STP layout; that helps to identify problems that are not apparent. Check that ports that should be forwarding are forwarding and ports that should be blocking are blocking. It can be amazing what you find when you map out STP for each VLAN.

HTH

rjackson · ‎06-04-2003

If all the LEDs are flashing then it is a broadcast or broken unicast issue; something is being flooded out all ports.

manish-young · ‎06-05-2003

dear marh,

You might be sure there is no physical loop (STP), but there might be some kind of logical loop. Check your routing tables and try to figure out which subnet (if you use VLANs) is creating the problem. Try pinging the broadcast IP address of each subnet; you will find out which one is creating the problem.

this might help you

regards

Manish young
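
For the broadcast-ping suggestion above, a small sketch of how to work out the directed broadcast address of each subnet. The subnets shown are hypothetical placeholders, not taken from this network, and the helper name `broadcast_targets` is made up for illustration.

```python
import ipaddress


def broadcast_targets(subnets):
    """Map each subnet in CIDR notation to its directed broadcast address."""
    return {s: str(ipaddress.ip_network(s).broadcast_address) for s in subnets}


# Hypothetical subnets; substitute the ones actually in use.
targets = broadcast_targets(["192.168.10.0/24", "10.1.4.0/22"])
```

Each value in `targets` is the address to ping for that subnet.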

jawad1979 · ‎06-09-2003

I have corrected a problem similar to this but not exact, try these by priority:

1)Make sure that all Giga ports use hardware flow control.

2)Try to use the same stp version for all switches.

My problem was solved here.

3)Check your configuration of "uplink fast" "port fast",........

4)DONT assign any switch at the core to be the root bridge.

5) Use stp for different vlans carefully to load balance the network.(change port cost )

6)DONT use VLAN1 for users, keep it for management.

If the above did not fix it, try debug "interface", and show us the output

good luck

marh · ‎06-11-2003

Some new information regarding the network topology.

We discovered 11 APs (Symbol wireless APs) on 4 different access switches. WLAP is disabled (connection between APs over RF). There are a lot of broadcasts on the APs. Is it possible that the network meltdown occurred because of the APs?

We also found about 4000 output buffer failures and pause input frames on the GB interfaces (MM FO). The FO trunk links are always up.

The network crashes only for 2-3 minutes, not for 20 minutes as previously written. During that time the access switch sends a topology change notice (the max-age timer expired, send topology change notice) and becomes the root bridge. (The root bridge is the 3508GB and has a fixed priority of 8192.)

There is only one VLAN (vlan1) configured, on the access ports is configured PORTFAST, on the switches UPLINKFAST. STP is IEEE STP

The switches support UDLD but it is not enabled.
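
To make the root bridge observation above concrete, here is a minimal sketch of how 802.1D picks the root: the bridge with the lowest bridge ID (priority first, then MAC address) wins. The switch names and MAC addresses are hypothetical; the priority 8192 is the value reported above, and 49152 for the access switches is an assumption based on UplinkFast raising the bridge priority.

```python
def elect_root(bridges):
    """bridges: iterable of (name, priority, mac). Lowest (priority, mac) wins."""
    def bridge_id(bridge):
        _, priority, mac = bridge
        return (priority, bytes.fromhex(mac.replace(":", "")))
    return min(bridges, key=bridge_id)[0]


# Hypothetical bridge list for illustration only.
all_bridges = [
    ("core-3508", 8192, "00:d0:58:00:00:10"),
    ("access-a", 49152, "00:d0:58:00:00:02"),
    ("access-b", 49152, "00:d0:58:00:00:01"),
]
normal_root = elect_root(all_bridges)

# If an access switch stops receiving the core's BPDUs until max age
# expires, it only compares the bridge IDs it still knows about.
isolated_view = [b for b in all_bridges if b[0] != "core-3508"]
claimed_root = elect_root(isolated_view)
```

With the core visible it always wins on priority, so an access switch claiming root suggests it stopped receiving the core's BPDUs for at least max age, rather than a misconfigured priority.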

lgijssel · ‎06-11-2003

If one 3508 is root, the other one should be the backup. Please configure this accordingly. Your problem-description may indicate a broadcast-storm or some other kind of loop.

You could try using a packet analyzer to pinpoint the cause of this issue by finding its MAC address.

Good hunting!

Leo

9mmurphy · ‎06-17-2003

I would like to see the supporting Cisco documentation for this

"4)DONT assign any switch at the core to be the root bridge. "

You do not want your "Network Core" to be the root of access VLANs, but you do want the center of your access VLAN network to be your root bridge. The term "Core" can be confused with, say, your Data Center Network Core; the core I was referring to would be the core of your local LAN/VLAN. Often, this would be your Layer 3 distribution switch, or a Layer 2 distribution switch with access switches attached. This core/center of your VLAN/LAN should be where your root bridge resides.

On a different note, that 2-3 minutes of network downtime would match up nicely with the time required for your STP to reconverge.

Take a look at this... it might be helpful.

Spanning-Tree Protocol Enhancements using Loop Guard and BPDU Skew Detection Features

http://www.cisco.com/en/US/partner/tech/tk389/tk621/technologies_tech_note09186a0080094640.shtml
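
As a rough cross-check of the reconvergence remark above, here is the timer arithmetic for classic IEEE 802.1D, assuming the default timers (max age 20 s, forward delay 15 s) are in use. Features such as UplinkFast shorten this in some failure cases and are not modelled here.

```python
def worst_case_reconvergence(max_age=20, forward_delay=15):
    """Seconds before a port forwards again after an indirect failure:
    wait for max age to expire, then spend one forward delay each in
    the listening and learning states."""
    return max_age + 2 * forward_delay


default_seconds = worst_case_reconvergence()
```

With default timers that comes to 50 seconds per cycle, so an outage of 2-3 minutes would span more than one such cycle.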

mdoldan · ‎06-19-2003

Set all your switches to VTP Transparent