Strange problem with an ether-channel and err-disable status

Unanswered Question
Aug 29th, 2007

Hello all,


I've run into a strange problem at one of our customers. The setup uses two core 3750 L2/L3 switches, with several VLANs and HSRP groups (on SVI interfaces) configured on both 3750s. Between the two 3750s there is a 4 Gb ether-channel. Everything was working as designed.


The problem started yesterday, when a faulty fibre patch cable caused a link-flap on one link at one side of the ether-channel. The port got err-disabled. That was not a major priority, because the other 3 links were still operating normally. However, around the same time our customer started complaining about inter-VLAN communication problems.


During troubleshooting I noticed that one of the VLANs had split (the HSRP status for that VLAN was active on both 3750s). So one HSRP group wasn't able to communicate across the ether-channel. All the other HSRP groups were operating normally.


At this point I started to suspect it had something to do with the err-disabled link. So after swapping the cable I re-enabled the link. As soon as it became operational again the communication problems were gone, and the HSRP communication started working again as well.


As far as I can tell, this behaviour isn't normal. One err-disabled link within a multilink ether-channel should cause communication problems for 15 to 20 min. It almost seems as if the switch with the err-disabled link was still trying to use this link within the ether-channel. I have already consulted the bug database and the release notes but couldn't find anything related to this problem.


Has anyone seen this kind of problem, or does anyone have an explanation for why this happened?


Setup details: 2 3750-24TS, IOS 122-35SE1

Ether-channel: 4 Gb dot1q trunk (mode: on / load-balance: src-mac), links through CWDM SFPs. All VLANs are allowed on all links.
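
Editor's illustration of the suspicion above (that one switch kept sending on a member the other end had already taken down): with src-mac load balancing, every frame from a given source MAC uses the same member link, so only flows that hash onto the bad member would be lost. The Python sketch below is a toy model, not the 3750's real algorithm. The modulo hash, the member names and the choice of dead member are assumptions made for illustration. It also assumes HSRP version 1, where the active router sources its hellos from the virtual MAC 0000.0c07.acXX (XX = group number); the thread does not state the HSRP version or the group numbers.

```python
# Toy model only: the real Catalyst hash is platform-specific and may differ.

LINKS = ["link1", "link2", "link3", "link4"]  # hypothetical member names
DEAD = "link3"  # hypothetical: the member that is down at the far end


def mac_to_int(mac):
    """Parse a Cisco dotted MAC such as '0000.0c07.ac0a' into an integer."""
    return int(mac.replace(".", ""), 16)


def hsrp_v1_mac(group):
    """HSRP version 1 virtual MAC: 0000.0c07.acXX, XX = group number in hex."""
    return "0000.0c07.ac{:02x}".format(group)


def pick_link(src_mac, links=LINKS):
    """Assumed hash: choose a member from the low-order bits of the source MAC."""
    return links[mac_to_int(src_mac) % len(links)]


def blackholed_groups(groups, links=LINKS, dead=DEAD):
    """HSRP groups whose hellos would land on the dead member
    if the sender kept hashing onto it."""
    return [g for g in groups if pick_link(hsrp_v1_mac(g), links) == dead]


if __name__ == "__main__":
    for group in range(1, 9):
        mac = hsrp_v1_mac(group)
        print(group, mac, pick_link(mac))
    print("groups losing hellos:", blackholed_groups(range(1, 9)))
```

In this model only the groups whose virtual MAC maps to the dead member lose their hellos, while every other group keeps working. That would be consistent with a single HSRP group going active/active, but it does not show that the switch actually behaved this way.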


Many thanks,


Dennis


nambi_gct Wed, 08/29/2007 - 00:52
User Badges:
  • Bronze, 100 points or more

Hi Dennis,


Is this a unidirectional link failure? In that condition this kind of problem can happen, because the port channel load-shares traffic across its member links, and traffic sent onto a link that has failed in one direction will be lost.


Regards

Nambi

dkrijgsman Wed, 08/29/2007 - 00:59

Hi Nambi,


No, I don't think it's a unidirectional link failure. When one side got err-disabled, the other side went to a down status, so this link normally wouldn't be used by the ether-channel anymore.


Regards

Dennis

dkrijgsman Wed, 08/29/2007 - 01:29

I made a typo, obviously it should be


"err-disabled link within a multilink ether-channel shouldN'T cause communication problems for 15 to 20 min."

glen.grant Wed, 08/29/2007 - 04:24
User Badges:
  • Purple, 4500 points or more

It would have been interesting to see whether it recovered if you had admin-downed the other side. That would have told you whether you had a unidirectional link, which is what it almost sounds like. If you aren't using UDLD you should be; it would eliminate anything like this.

nambi_gct Wed, 08/29/2007 - 04:31
User Badges:
  • Bronze, 100 points or more

I agree with Glen; UDLD aggressive mode is the right choice to avoid this kind of problem.

dkrijgsman Wed, 08/29/2007 - 04:41

Glen/nambi,


Thanks for your responses so far. Let me do some checking on UDLD aggressive mode to see whether it can prevent similar problems in the future.


Dennis

dkrijgsman Wed, 08/29/2007 - 04:33

Guys,


I'm not convinced that UDLD aggressive mode will prevent this from happening in the future.

I was already using UDLD on all the links. The link got err-disabled because of a link-flap, not because of a UDLD detection event. The other side of the link did go down when the err-disable event happened.


Even if that side hadn't gone down immediately, UDLD was enabled, and once it detected a unidirectional link it would have shut down the port after several seconds.


Dennis
