Re: 2950 gigabit uplinks go to err-disable

tsteeves · 01-06-2004

Our campus had an outage and we are trying to find the cause. Something caused the fibre uplinks on all 40 of our 2950s, and most of the stacking GBICs, to go into err-disable mode and not recover. It was NOT just a local event: the outage spanned routers and distribution switches, and it ONLY affected 2950s. The 3550s were not disturbed. Our 2950s are WS-C2950G-48-EI and WS-C2950G-24-EI running the c2950-i6q4l2-mz.121-11.EA1.bin IOS image. Has anyone else experienced this? TAC case E800121 was opened. Their only suggestion was to configure "no errdisable detect cause loopback" and "no keepalive" on the interfaces. This seems a band-aid at best and does not address the cause. Our network basically follows the Cisco white-paper design: distribution routers and switches connected to edge routers, with multiple-VLAN support to the user switches, totalling approximately 200 Cisco devices. Has anyone seen or experienced this type of disruption? Is there a known cause? Is it spurious behaviour or a bug?
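For reference, here is a minimal sketch of how the TAC workaround might be entered. It assumes the fibre uplink is GigabitEthernet0/1, which is a hypothetical interface name, so substitute your own. As far as I know, "errdisable detect cause" is a global configuration command while "keepalive" is set per interface; the loopback cause may not exist on every 12.1 EA release, so check the syntax against yours.

```
! Global configuration: stop err-disabling ports for the loopback cause
configure terminal
no errdisable detect cause loopback
! Per interface: disable keepalives on the uplink (hypothetical interface name)
interface GigabitEthernet0/1
 no keepalive
end
```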

skarundi · 01-06-2004

The switch was probably reacting to a temporary STP loop on the network. The behaviour has been changed in later releases of code: instead of shutting down the link, the switch puts it into an err-disable state for a few seconds and then recovers it. This was fixed in 12.1(13)EA1 (bug ID CSCdz84835).

What caused the temporary STP loop is something you will have to figure out. Can it happen again? Yes, it can if you don't upgrade. If you don't want to upgrade, I'd go along with the TAC engineer's suggestion.

tsteeves · 01-06-2004

Thanks, skarundi. I made a typo regarding the IOS: it should have read 12.1(14)EA1. In our case, the interfaces went err-disable and stayed there. We had to either console into the switches and shut/no shut the port (time consuming) or power cycle them. We opted for the latter on most.
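For anyone who hasn't done it, the console shut/no shut recovery looks roughly like this. GigabitEthernet0/1 is again a hypothetical interface name; substitute the err-disabled port.

```
configure terminal
interface GigabitEthernet0/1
 shutdown
 no shutdown
end
```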

DWAM_2 · 01-07-2004

Hello,

I've read your post with interest.

I agree with skarundi about the temporary loop on your network.

I saw similar behaviour a month ago at two different customers.

The 2950s were affected.

We investigated with TAC as well and didn't find anything. Memory dump? No. Fibre fault? No. IOS deferred? No. UDLD behaviour? No. (We use version 12.1(14)EA1 on the 2950s.)

About the recovery behaviour skarundi describes:

the link goes into an err-disable state for a few seconds, and then recovers.

You can enable it automatically:

errdisable recovery cause udld
errdisable recovery cause loopback
errdisable recovery interval 120

which re-enables the port after 120 seconds (in this example).

But it's not really a good solution: you still need to find out what is happening on the network.
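To confirm the recovery settings took effect, the commands below should help: "show errdisable recovery" lists which causes have recovery enabled, the timer interval, and any interfaces waiting on the timer, and "show interfaces status" shows which ports are currently err-disabled. The exact output varies by release.

```
show errdisable recovery
show interfaces status
```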

Good luck.

rop · 01-09-2004

Hello,

We had the same problem in our network, but with 3550s, and were told to disable keepalive on our gigabit interfaces. That helped.

ibatterbe · 02-01-2004

I've been seeing the same problem here (Auckland University of Technology) on three 2950 switches so far since we upgraded to 12.1(14)EA1a. It seems to occur when I change the link between switches from access mode to forced trunk and an STP loop briefly exists.

In previous versions of IOS, spanning tree would just block the port for 30 seconds, then realise everything was fine and carry on. This new code, however, seems to prefer shutting the port down entirely, and since the default switch configuration has no errdisable recovery set, this can be a real problem.
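The difference between "block and carry on" and "shut down and stay down" can be illustrated with a toy model. This is a simplified Python sketch, not Cisco's implementation: it assumes, as the keepalive workaround suggests, that loopback detection works by a port seeing its own keepalive frame come back. The class and field names are hypothetical, and the real recovery timer may be scheduled differently from the "interval after err-disable" rule used here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical, simplified model; NOT Cisco's actual implementation.
# A keepalive is identified by (switch_mac, port_name). If a port receives
# its own keepalive back, the frame has looped, so the port is err-disabled.


@dataclass
class Port:
    switch_mac: str
    name: str
    # Seconds; None models "no errdisable recovery cause loopback" (the default).
    recovery_interval: Optional[int] = None
    # Time (s) at which the port was err-disabled; None means the port is up.
    err_disabled_at: Optional[int] = None

    def keepalive_id(self) -> Tuple[str, str]:
        return (self.switch_mac, self.name)

    def receive_keepalive(self, frame_id: Tuple[str, str], now: int) -> None:
        # Only our own keepalive returning indicates a loop.
        if frame_id == self.keepalive_id() and self.err_disabled_at is None:
            self.err_disabled_at = now

    def tick(self, now: int) -> None:
        """Re-enable the port once the recovery interval has elapsed, if configured."""
        if (self.err_disabled_at is not None
                and self.recovery_interval is not None
                and now - self.err_disabled_at >= self.recovery_interval):
            self.err_disabled_at = None

    def is_up(self) -> bool:
        return self.err_disabled_at is None


# A brief loop: the uplink sees its own keepalive at t=10 s.
no_recovery = Port("0011.2233.4455", "Gi0/1")
with_recovery = Port("0011.2233.4455", "Gi0/1", recovery_interval=120)
for port in (no_recovery, with_recovery):
    port.receive_keepalive(port.keepalive_id(), now=10)
    port.tick(now=130)
print(no_recovery.is_up(), with_recovery.is_up())  # False True
```

Without a recovery cause configured, the model's port stays down until someone intervenes, which matches the shut/no shut or power-cycle experience described earlier in the thread.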

I think I'm going to open a TAC case and suggest either that errdisable recovery cause loopback xxx be enabled by default, or that a 60-second holdoff be added after an STP topology change, to prevent the switch from shutting down the port when all that is needed is an STP block.

I also note that 'cause loopback' doesn't exist in 12.1(12c)EA1.

tsteeves · 02-02-2004

Excellent, thanks very much. This is the first answer that actually makes sense. Cisco had the right answer after we opened the TAC case, but they didn't really seem to care about the cause. I would be curious to hear their response to your suggestion. If you would be so kind, please e-mail me the details. Thanks. *tsteeves@uvic.ca (remove * for anti-spam)

ibatterbe · 02-11-2004

Well...

This is what the TAC engineer said, and I must say I'm rather disappointed with his attitude.

"Currently the DE/software engineer do not have plan to change the code, with the following

Reason for this action: This will not be changed, since loopback detection is a useful thing, and the workaround is very simple."