3750 Stacking & LACP drops during testing

desmond.liew · 07-26-2016

Hi Cisco Community,

I am setting up high availability for a stacked pair of C3750X-24S switches and a stacked pair of C3750X-48T switches. The C3750X-24S pair will be referred to as Switch 1 & Switch 2, and the C3750X-48T pair as Switch 3 & 4.

The stacked C3750X-24S pair provides uplinks to 23 x C2960X-24T switches, shown by the blue lines in the diagram below. It also provides downlinks to the C3750X-48T switches through the 2 x 10GbE network modules, shown by the green lines.

For the uplinks from Switch 1 & 2 to the C2960X-24T switches, we initially configured LACP using the commands below:

interface GigabitEthernet1/0/1
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 1 mode on

interface GigabitEthernet2/0/1
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 1 mode on

For the downlinks from Switch 1 & 2 to Switch 3 & 4, we initially configured LACP using the commands below. When I run 'show etherchannel summary', all the Port-Channels and member ports are bundled correctly for all uplinks to the C2960X switches and downlinks to the C3750X switches.

interface TenGigabitEthernet1/1/1
description core sw1 to server sw
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 48 mode on

interface TenGigabitEthernet1/1/2
description core sw1 to server sw
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 48 mode on

interface TenGigabitEthernet2/1/1
description core sw1 to server sw
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 48 mode on

interface TenGigabitEthernet2/1/2
description core sw1 to server sw
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 48 mode on

I connected my laptop to Switch 4 and pinged the management IPs of all 23 x C2960X switches. The tests gave the following results:

When Switch 3 is powered off, I also have 1 ping drop, which is acceptable. When Switch 3 recovers, I don't see any ping drops. I also don't see the downlink ports of Switch 1 & 2 turning orange and then green, and there is no 30-second outage.
When Switch 1 is powered off, I have 1 ping drop, which is acceptable. Switch 2 takes over as expected. However, when I power on Switch 1 again and it rejoins the stack, I notice the Switch 2 ports turn orange and then green. The pings to the uplink switches drop and recover roughly 30 seconds later. Is this typical behaviour? A 30-second outage is not acceptable to me.

Initially, we used 'channel-group x mode active', and when we powered off either Switch 1/2 or Switch 3/4, we noticed the Port-Channel going down before coming up again. I had to change it to 'channel-group x mode on'. That is when I got stuck at the 30-second outage described above.
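For reference, a minimal sketch of the 'mode active' (LACP) variant mentioned here, assuming the same member ports and channel-group number as the uplink example above. The far end must also run LACP (active or passive) for the bundle to form.

```
! Sketch only: LACP variant of the uplink bundle shown earlier.
! Port and channel-group numbers are copied from that example.
interface GigabitEthernet1/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode active
!
interface GigabitEthernet2/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode active
```

Unlike 'mode on', the ports only join the bundle after LACP negotiation with the neighbour succeeds.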

I am running on Version 15.2(3)E2 for all C3750X switches.

Greg Smalley · 07-26-2016

Do you have "spanning-tree portfast" enabled on your access port?

-Greg

ahmedshoaib · 07-26-2016

Hi;

LACP is recommended whenever you are going to configure EtherChannel.

With reference to the 30-second delay whenever you power off the switch, here are some things you can do to minimize the time:

1. Run Rapid STP instead of STP.

2. Run spanning-tree uplinkfast. (Improves the convergence time of the Spanning-Tree Protocol (STP) in the event of the failure of an uplink).

3. spanning-tree portfast bpduguard default. (Enables BPDU guard on all PortFast-enabled ports; whenever a BPDU is received, the interface is put into err-disable.)
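The three suggestions above could be entered in global configuration mode roughly as follows. This is a sketch, not a tested configuration for this topology.

```
! Sketch only: the three suggestions as global configuration commands.
!
! 1. Rapid PVST+ instead of classic PVST+
spanning-tree mode rapid-pvst
!
! 2. UplinkFast. Note: the Catalyst configuration guides describe this
!    feature as inactive while the switch runs rapid-pvst, since RSTP has
!    its own fast uplink failover; verify on your release.
spanning-tree uplinkfast
!
! 3. BPDU guard on every port that already has PortFast enabled
spanning-tree portfast bpduguard default
```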

Thanks & Best regards;

desmond.liew · 07-26-2016

Hi Ahmed and Greg,

I have been reading your comments. Let me update the diagram again to illustrate that all the uplink and downlink ports to the respective switches are etherchanneled as shown.

Answering the questions:

1. LACP is recommended whenever you are going to configure EtherChannel.

Initially I was running 'channel-group x mode active' for all uplinks and downlinks as shown. During my tests of powering down (failure) and powering up (recovery), both instances showed that the port-channel of the active switch went down before coming up again about 30 seconds later.

So, I changed to 'channel-group x mode on' to speed up the bundling. During my tests, when I power down Switch 1, I get 1 ping drop (which is acceptable). But when I power up Switch 1 (recovery), I see ping drops that recover only about 30 seconds later.

Shouldn't LACP (whether in mode active or on) eliminate the 2 s (RSTP) or 45 s (STP) spanning-tree delay?

I cannot understand why recovery takes 30 seconds when I power up Switch 1, even though during the power down (failure) I had only a 1-second drop.

2. Run spanning-tree uplinkfast. (Improves the convergence time of the Spanning-Tree Protocol (STP) in the event of the failure of an uplink).

Do I need to configure this on the LACP member ports for all switches?

3. spanning-tree portfast bpduguard default. (Enables BPDU guard on all PortFast-enabled ports; whenever a BPDU is received, the interface is put into err-disable.)

All access ports are currently configured with the below:

switchport access vlan x
switchport mode access
spanning-tree portfast

Are you saying that I should use 'spanning-tree portfast bpduguard default'?
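To make the question concrete, here is a sketch of the two usual ways to add BPDU guard to access ports like the ones above. The interface number and VLAN are hypothetical placeholders, not taken from the actual switches.

```
! Option A (global): BPDU guard on every port that has PortFast enabled
spanning-tree portfast bpduguard default
!
! Option B (per port): explicit BPDU guard on a single access port
! GigabitEthernet1/0/10 and VLAN 10 are placeholders for illustration.
interface GigabitEthernet1/0/10
 switchport access vlan 10
 switchport mode access
 spanning-tree portfast
 spanning-tree bpduguard enable
```

With either option the existing 'spanning-tree portfast' line on the access ports stays in place; BPDU guard only adds the err-disable action when a BPDU arrives.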

4. Do you have "spanning-tree portfast" enabled on your access port?

Yes, I do.

desmond.liew · 07-26-2016

I forgot to mention that Switch 1 & 2 (stacked) have the spanning-tree configuration below. Previously they were not stacked, so spanning-tree was required. These are leftover commands. I am unsure whether they are what is causing the symptoms.

spanning-tree mode rapid-pvst
spanning-tree extend system-id
spanning-tree vlan 10,12,14,16,31 priority 28672
spanning-tree vlan 11,13,15,30,32 priority 24576
spanning-tree vlan 10-16,30-31 forward-time 7
spanning-tree vlan 10-16,30-31 max-age 10
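One way to check what these leftover commands actually do is to look at the stack, EtherChannel, and spanning-tree state before and after a power-cycle test. The commands below are a sketch run from privileged EXEC on the stack; VLAN 10 is simply one of the VLANs listed above.

```
! Stack membership and which member is the stack master
show switch
! Bundle state of every Port-channel and its member ports
show etherchannel summary
! Spanning-tree mode and which optional features are enabled
show spanning-tree summary
! Root bridge, timers, and per-port roles/states for one VLAN
show spanning-tree vlan 10
```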

ahmedshoaib · 07-26-2016

Hi,

You need to enable spanning-tree uplinkfast on all switches other than the STP root.

For EtherChannel you can use either on, PAgP or LACP. LACP is recommended because it is an industry standard.

desmond.liew · 07-27-2016

Hi Ahmed,

Thanks. I will implement the 'spanning-tree uplinkfast' on all switches (Switch 1 & 2 which are stacked, Switch 3 & 4 also stacked and all the edge/access switches which are uplinked via Switch 1 & 2).

I am also going to revert my 'channel-group x mode on' back to 'channel-group x mode active' to avoid the risk of loops and err-disable.

But are you able to explain the symptoms? When I power off either Switch 1 or 2, and again after I power it back on, the port-channel interfaces uplinking to the edge/access switches or the server farm switches go down and then come up again. Is this due to spanning-tree?

desmond.liew · 07-28-2016

Just to share with everyone in this forum, I believe the below is the issue I am experiencing:

I will try the following changes on my switches:
1. Change all the EtherChannels between all switches to PAgP ('channel-group x mode desirable'), except for non-Cisco switches.
2. Ensure spanning-tree is rapid-pvst.
3. Add 'spanning-tree uplinkfast'.
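
The planned changes could look roughly like the sketch below, reusing the port and channel-group numbers from the earlier uplink example. Whether PAgP is accepted on a bundle whose members sit on different stack members is an assumption to verify for this platform and release, and the far end must run PAgP as well (desirable or auto).

```
! Sketch only: planned changes, not a tested configuration.
spanning-tree mode rapid-pvst
spanning-tree uplinkfast
!
interface GigabitEthernet1/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode desirable
!
interface GigabitEthernet2/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode desirable
```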