Spanning tree: why did it bring the network down?

Answered Question
Feb 1st, 2010

Can someone please share their experience?

We have two core switches, 6509 A and 6509 B, connected to each other over a port-channel (80) made up of two ports. All VLANs are allowed on this EtherChannel.

More than 40 access switches are connected to these core switches, each with one gigabit port to 6509 A and one to 6509 B.

We recently added 6 new VLANs and created them on both core 6509 switches (we did not set a root bridge priority on either 6509). 6509 A and 6509 B are the default gateways for all the new VLANs. VLAN interfaces for all 6 VLANs were created on both switches, with 6509 A holding the higher HSRP priority for the first 3 VLANs and 6509 B for the remaining 3.

Two new HP access switches were configured for these VLANs.

This is how it looks:

6509 A --- HP1 (STP off on HP1)

6509 B --- HP2 (STP off on HP2)

6509 A --- 6509 B

6509 A is configured with a trunk port-channel (50) that allows these new VLANs and connects to HP1.

6509 B is configured with a trunk port-channel (50) that allows these new VLANs and connects to HP2.

When we enabled port-channel 50 on 6509 A, there were no issues.

As soon as we enabled port-channel 50 on 6509 B, the network hung.

Below are the logs from the 6509. HSRP failover happened, and one of the ports in port-channel 80 went into err-disable.

000762: .Jan 29 18:42:07: %PIM-5-DRCHG: DR change from neighbor 0.0.0.0 to 10.1.60.2 on interface Vlan60 (vrf default)
000763: .Jan 29 18:42:12: %SYS-5-CONFIG_I: Configured from console by m onvty0 (10.1.102.232)
000764: .Jan 29 18:42:28: %STANDBY-6-STATECHANGE: Vlan60 Group 100 state Standby -> Active
000765: .Jan 29 18:44:03: %PIM-5-DRCHG: DR change from neighbor 0.0.0.0 to 10.1.70.2 on interface Vlan70 (vrf default)
000766: .Jan 29 18:45:52: %PIM-5-DRCHG: DR change from neighbor 0.0.0.0 to 10.1.80.2 on interface Vlan80 (vrf default)
000767: .Jan 29 18:45:52: %SYS-5-CONFIG_I: Configured from console by m onvty0 (10.1.102.232)
000768: .Jan 29 18:48:15: %SYS-5-CONFIG_I: Configured from console by m onvty0 (10.1.102.232)
000769: .Jan 29 18:52:23: %SYS-5-CONFIG_I: Configured from console by m onvty0 (10.1.102.232)
000770: .Jan 29 20:34:04: %SYS-5-CONFIG_I: Configured from console by m onvty0 (10.1.102.232)
000771: .Jan 29 20:34:22: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Standby -> Active
000772: .Jan 29 20:34:23: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Active -> Speak
000773: .Jan 29 20:34:31: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Standby -> Active
000774: .Jan 29 20:34:31: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Active -> Speak
000775: .Jan 29 20:35:00: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Standby -> Active
000776: .Jan 29 20:35:03: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Active -> Speak
000777: .Jan 29 20:35:04: %STANDBY-6-STATECHANGE: Vlan80 Group 100 state Standby -> Active
000778: .Jan 29 20:35:05: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Standby -> Active
000779: .Jan 29 20:35:06: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Active -> Speak
000780: .Jan 29 20:35:07: %STANDBY-6-STATECHANGE: Vlan80 Group 100 state Active -> Speak
000781: .Jan 29 20:35:35: %STANDBY-3-DUPADDR: Duplicate address 191.1.20.71 on Vlan2, sourced by 0000.0c07.ac64
000782: .Jan 29 20:36:22: %STANDBY-3-DUPADDR: Duplicate address 10.1.107.2 on Vlan107, sourced by 0000.0c07.ac64
000783: .Jan 29 20:36:28: %STANDBY-6-STATECHANGE: Vlan80 Group 100 state Standby -> Active
000784: .Jan 29 20:36:29: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Standby -> Active
000785: .Jan 29 20:36:29: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Standby -> Active
000786: .Jan 29 20:36:30: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Active -> Speak
000787: .Jan 29 20:36:31: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Active -> Speak
000788: .Jan 29 20:36:31: %STANDBY-6-STATECHANGE: Vlan80 Group 100 state Active -> Speak
Jan 29 20:36:32: %UDLD-SP-4-UDLD_PORT_DISABLED: UDLD disabled interface Gi1/1, unidirectional link detected
Jan 29 20:36:32: %PM-SP-4-ERR_DISABLE: udld error detected on Gi1/1, putting Gi1/1 in err-disable state
000789: .Jan 29 20:36:40: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Standby -> Active
000790: .Jan 29 20:36:43: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Standby -> Active
000791: .Jan 29 20:36:44: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Standby -> Active
000792: .Jan 29 20:36:46: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Active -> Speak
000793: .Jan 29 20:36:47: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Active -> Speak
000794: .Jan 29 20:36:48: %STANDBY-6-STATECHANGE: Vlan80 Group 100 state Standby -> Active
000795: .Jan 29 20:36:53: %STANDBY-6-STATECHANGE: Vlan80 Group 100 state Active -> Speak
000796: .Jan 29 20:36:53: %STANDBY-3-DUPADDR: Duplicate address 10.1.103.2 on Vlan103, sourced by 0000.0c07.ac64
000797: .Jan 29 20:36:56: %STANDBY-6-STATECHANGE: Vlan1000 Group 251 state Active -> Speak
000798: .Jan 29 20:36:57: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Standby -> Active
000799: .Jan 29 20:37:01: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Active -> Speak
000800: .Jan 29 20:37:19: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Standby -> Active
000801: .Jan 29 20:37:20: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Active -> Speak
000802: .Jan 29 20:38:15: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Standby -> Active
000803: .Jan 29 20:38:18: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Active -> Speak
000804: .Jan 29 20:59:52: %SYS-5-CONFIG_I: Configured from console by m onvty2 (10.1.102.232)
000805: .Jan 29 22:03:59: %SYS-5-CONFIG_I: Configured from console by m onvty2 (10.1.102.232)
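
A log like this is easier to read once the HSRP flaps are counted per VLAN. The following is only a rough Python sketch, not a tool used in this thread: it assumes each syslog message sits on a single line (terminal line wrapping already undone), and the function name `count_hsrp_flaps` and the `SAMPLE` list are made up for illustration. The sample messages are taken from the log above.

```python
import re
from collections import Counter

# Matches an IOS HSRP state-change message that sits on one line.
HSRP_RE = re.compile(
    r"%STANDBY-6-STATECHANGE: (?P<intf>Vlan\d+) Group (?P<group>\d+) "
    r"state (?P<old>\w+) -> (?P<new>\w+)"
)


def count_hsrp_flaps(lines):
    """Count Standby -> Active transitions per VLAN interface."""
    flaps = Counter()
    for line in lines:
        m = HSRP_RE.search(line)
        if m and m["old"] == "Standby" and m["new"] == "Active":
            flaps[m["intf"]] += 1
    return flaps


# A few messages from the log above (hypothetical variable name).
SAMPLE = [
    "000773: .Jan 29 20:34:31: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Standby -> Active",
    "000774: .Jan 29 20:34:31: %STANDBY-6-STATECHANGE: Vlan70 Group 100 state Active -> Speak",
    "000775: .Jan 29 20:35:00: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Standby -> Active",
    "000776: .Jan 29 20:35:03: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Active -> Speak",
    "000781: .Jan 29 20:35:35: %STANDBY-3-DUPADDR: Duplicate address 191.1.20.71 on Vlan2, sourced by 0000.0c07.ac64",
    "000785: .Jan 29 20:36:29: %STANDBY-6-STATECHANGE: Vlan113 Group 100 state Standby -> Active",
]

flaps = count_hsrp_flaps(SAMPLE)
for intf, count in flaps.most_common():
    print(intf, count)
```

A VLAN that repeatedly goes Standby -> Active and then Active -> Speak within seconds suggests the two cores are intermittently losing and regaining each other's hellos, which fits a flooded or looping Layer 2 domain rather than a single failed gateway.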

followurself Mon, 02/01/2010 - 09:09

What we did was shut down the interfaces on HP2 to stabilize the network. Once it was stable, we enabled STP on the HP switches and enabled the HP2 interfaces again. HP1 has become the root switch. With the HSRP gateways above, we need to adjust the priorities so that 6509 A and 6509 B become the root bridges for their respective VLANs.
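
Why HP1 wins the election can be illustrated with a small Python sketch of the standard rule that the lowest bridge ID (priority first, then MAC address) becomes root. Everything specific here is an assumption for illustration only: the MAC addresses are invented, the HP switch is assumed to use a default priority of 32768, the 6509s are assumed to run with the extended system ID (default priority 32768 plus the VLAN ID), and `VLAN = 1` stands in for whichever VLAN the switches exchange IEEE BPDUs on.

```python
def bridge_id(priority, mac):
    """Return a sortable bridge ID: priority first, MAC as tie-breaker."""
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    return (priority, int(digits, 16))


def elect_root(bridges):
    """bridges maps name -> (priority, mac); the lowest bridge ID wins."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


VLAN = 1  # hypothetical VLAN ID, for illustration only

# Hypothetical MACs; default priorities as assumed in the text above.
before = {
    "6509A": (32768 + VLAN, "0011.2233.4401"),
    "6509B": (32768 + VLAN, "0011.2233.4402"),
    "HP1": (32768, "0022.3344.5501"),
}

# Same bridges after lowering the priority on 6509 A for this VLAN.
after = {**before, "6509A": (4096 + VLAN, "0011.2233.4401")}

print(elect_root(before))
print(elect_root(after))
```

Under these assumptions the HP switch wins on priority alone, regardless of MAC address, until one of the 6509s is given a lower priority. Matching the root bridge to the HSRP active gateway per VLAN keeps the Layer 2 and Layer 3 paths aligned.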

But where was the loop? With no STP on the HP switches, why did the 6509s go crazy? Could a loop on one of those 40 access switches have caused this?

Why did one of the ports in port-channel 80 on the 6509 go into err-disable?

We are looking to connect more HP switches with new VLANs in some time, and we would not like this to happen again. With the HP switches not running STP, and hence not participating in the STP calculation, what could have caused the network to hang when the EtherChannel on 6509 B was brought up?

I would appreciate suggestions on how to build a loop-free network, and on where to start looking for why this happened.

Jon Marshall Mon, 02/01/2010 - 09:11

6509 A --- HP1 (STP off on HP1)

6509 B --- HP2 (STP off on HP2)

6509 A --- 6509 B

Do you actually mean you turned off STP on the HP switches? Can you clarify how each was connected back to the 6500 switches and whether they were interconnected?

And why did you turn STP off, if that is indeed what you did?

Jon

lamav Mon, 02/01/2010 - 09:20

Are the HP switches connected to each other? If so, is it a trunk that allows all the vlans?

glen.grant Mon, 02/01/2010 - 09:32

Everything went down because you enabled a trunked EtherChannel down to the HPs. If those did not have STP enabled, or had it misconfigured, you basically put a loop across all the allowed VLANs when you enabled the EtherChannel down to the HPs. You basically bridged two ports together when you did that. You have to have STP running in situations where you have a built-in loop across the network for redundancy.

lamav Mon, 02/01/2010 - 09:37

Glen, given his description of how he has the HP switches connected, there is no built-in loop. The topology is a loop-free inverted U. HP 1 is uplinked to 6509A, A is trunked to 6509 B, and B is downlinked to HP 2. Where is the loop for those 6 new vlans?

This is why I asked if HP 1 and HP 2 were trunked to each other...then there would be a loop.

Victor
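
The inverted-U argument can be checked mechanically. The sketch below is only an illustration in Python, under the assumption that each port-channel behaves as one logical link; the function name `has_loop` and the link lists are hypothetical. It uses a union-find structure: a link whose two ends are already connected by earlier links closes a loop.

```python
def has_loop(links):
    """Return True if the undirected links contain a cycle (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # both ends already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# The topology as described: each port-channel counted as one logical link.
inverted_u = [("HP1", "6509A"), ("6509A", "6509B"), ("6509B", "HP2")]

# The case asked about: HP1 and HP2 also trunked to each other.
with_hp_trunk = inverted_u + [("HP1", "HP2")]

print(has_loop(inverted_u))
print(has_loop(with_hp_trunk))
```

Anything that bridges the two HP switches at Layer 2, or a port-channel whose member links are treated as separate ports by the far end, would add an extra edge to this graph and change the answer.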

followurself Mon, 02/01/2010 - 14:03

HP1 and HP2 are not interconnected. STP is disabled only on the HP switches. Since they are not connected to each other and are not running STP, they should not send any BPDUs.

When the EtherChannel from 6509 A to HP1 was enabled there was no issue; as soon as the EtherChannel between 6509 B and HP2 was enabled, the network went down.

Since there was no loop, what could have caused it?

thanks

Correct Answer
Giuseppe Larosa Mon, 02/01/2010 - 14:14

Hello Followyourself,

If the Cisco switches are using PVST+ and the new EtherChannel links are L2 802.1Q trunks, the Cisco switches use a proprietary BPDU encapsulation.

The HP switches can see this as ordinary user multicast traffic.

There is a risk that one member link of a bundle sees a BPDU that was sent on another member link and has passed through the HP switch.

But besides this I don't see a clear cause for a loop.

When you enable STP on the HP switches, they are detected on the native VLAN, where Cisco switches use IEEE standard BPDUs.

See

http://www.cisco.com/en/US/products/hw/switches/ps700/products_white_paper09186a00801b49a4.shtml#cg5

PVST+, RPVST+: SNAP protocol type 0x010b, destination MAC 01-00-0c-cc-cc-cd

The HP switch sees this as user traffic.

I see potential for bundles to go into err-disable.

Are you sure that there are no multihomed servers connected to both HP switches?

Hope to help

Giuseppe
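
To make the distinction between the two BPDU types concrete, here is a small Python sketch that classifies a frame by its destination MAC. The PVST+ address is the one quoted above; the IEEE address 01-80-c2-00-00-00 is the standard 802.1D bridge group address. The function names are hypothetical, and the sketch says nothing about how a particular HP model forwards either kind of frame.

```python
IEEE_STP_MAC = "0180c2000000"    # IEEE 802.1D bridge group address
CISCO_PVST_MAC = "01000ccccccd"  # PVST+ address quoted above


def normalize_mac(mac):
    """Strip common separators and lowercase, e.g. 01-00-0C-CC-CC-CD."""
    return mac.lower().replace("-", "").replace(":", "").replace(".", "")


def classify_bpdu_destination(mac):
    """Label a destination MAC as IEEE BPDU, PVST+ BPDU, or neither."""
    key = normalize_mac(mac)
    if key == IEEE_STP_MAC:
        return "IEEE BPDU"
    if key == CISCO_PVST_MAC:
        return "PVST+ BPDU"
    return "not a BPDU address"


for address in ("01-00-0c-cc-cc-cd", "0180.c200.0000", "00:00:0c:07:ac:64"):
    print(address, "->", classify_bpdu_destination(address))
```

A switch that only knows the IEEE address has no reason to treat frames sent to the Cisco address specially, which is why they can be flooded like any other multicast.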

glen.grant Mon, 02/01/2010 - 15:26

I agree that in a normal EtherChannel situation, spanning tree sees the EtherChannel as a single link. I do have some questions about how Cisco would handle the EtherChannel, and whether the channel would set up correctly, when the other end has no spanning tree running. If the channel did not set up correctly, then you do have a loop between the ports in the EtherChannel. I'll be honest, I'm not sure how the Cisco will react to a port-channel when the other end has no spanning tree running.

On each HP switch we have 16 servers, hence 16 ports. What I have noticed is that all 16 ports have native VLAN 4094 and are enabled for tagging. The reason for tagging is that ESX servers are installed on all 16 ports. These ESX servers will host different environments and hence different VLANs, in our case VLANs 50, 60, 70 and 80. The ESX switch doesn't participate in STP. The native VLAN for these ports, 4094, will never be seen on those ports because ESX is not configured for that VLAN.

On the EtherChannel trunk between the 6509 and the HP, VLAN 4094 is not allowed. But do remember that STP was off on the HP switches.

Are the servers multihomed? Yes, they are multihomed and configured for source-based load balancing.

So we have VLAN 4094, which is not allowed on the trunks between the HPs and the 6509s.

VLAN 4094 is allowed on the EtherChannel trunk between 6509 A and 6509 B.

There is a config difference on the port-channel member port of the secondary 6509 switch:

interface GigabitEthernet1/2
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1-1005,1024-4094
 switchport mode trunk
 no ip address
 spanning-tree portfast disable
 spanning-tree guard none
 channel-group 80 mode on

The command in bold above does not exist on the primary switch port that is part of the same EtherChannel.

followurself Tue, 02/02/2010 - 09:11

Giuseppe

Can you explain why you see potential for err-disable on the EtherChannel between the two 6509 switches?

Thanks
