Connecting a non-stacked switch to a router EtherChannel brought the network down

Jun 23rd, 2010

We just experienced a problem that took our production network down.

Here is the scenario:

Pre-condition:

Two 3750G/48 switches are stacked together, with a port channel that includes port 49 (fiber) on each switch. The port channel connects to core router 6509 ports 2/19 and 2/20, which are also in a port channel. This was running fine, with about 100 machines connected to the stack.

One new 3750G/48 switch was powered on and connected to port 2/14 on the same router. This new switch had not been stacked with the other two; it was a standalone switch with no machines connected. The router port was not enabled, but switch port 49 was up.

After we enabled router port 2/14 and configured it as part of the port channel with ports 2/19 and 2/20, the whole network (everything that core router supports) went down immediately. We pulled the fiber from the switch right away, disconnecting it from the network, and traffic came back.

On the existing stacked switch, we saw many MAC flapping messages in the log:

Jun 17 14:58:25.933: %SW_MATM-4-MACFLAP_NOTIF: Host 0024.e834.9723 in vlan 110 is flapping between port Po1(note: this is switch's ether channel) and port Gi1/0/27 (note: this is port connected to the machine)

Jun 17 14:58:27.024: %SW_MATM-4-MACFLAP_NOTIF: Host 0024.e841.ad31 in vlan 111 is flapping between port Gi2/0/11 and port Po1

On the other switch, I saw:

Jun 17 14:58:53.830: %SW_MATM-4-MACFLAP_NOTIF: Host 0015.2b68.bf80 in vlan 113 is flapping between port Po2(note: this is to standby router 6509) and port Po1 (note: this is to active router 6509).

Jun 17 14:58:53.830: %SW_MATM-4-MACFLAP_NOTIF: Host 0018.8b31.e7cc in vlan 49 is flapping between port Po2 and port Po1

Jun 17 14:58:53.830: %SW_MATM-4-MACFLAP_NOTIF: Host 0001.d740.4c85 in vlan 921 is flapping between port Po1 and port Po2

Note (again): here Po1 (EtherChannel) goes to the active core router and Po2 (EtherChannel) goes to the standby core router.
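To see which ports are involved across many notifications like the ones quoted above, the log lines can be tallied per port. The following is a minimal Python sketch, assuming the message format shown in this post; the function names are hypothetical, and the pattern skips any "(note: ...)" annotation added after a port name.

```python
import re
from collections import Counter

# Pattern for the MACFLAP_NOTIF lines quoted in this thread. Text between the
# first port name and " and port " (such as an inline note) is skipped.
MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>[0-9a-f.]+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>[A-Za-z]+[\d/]+).*? and port (?P<b>[A-Za-z]+[\d/]+)"
)

def parse_macflap(line):
    """Return (mac, vlan, port_a, port_b), or None if the line does not match."""
    m = MACFLAP.search(line)
    if m is None:
        return None
    return m["mac"], int(m["vlan"]), m["a"], m["b"]

def ports_involved(lines):
    """Count how often each port appears in flap notifications."""
    counts = Counter()
    for line in lines:
        parsed = parse_macflap(line)
        if parsed:
            counts.update(parsed[2:])
    return counts

sample = [
    "Jun 17 14:58:25.933: %SW_MATM-4-MACFLAP_NOTIF: Host 0024.e834.9723 in vlan 110 "
    "is flapping between port Po1 and port Gi1/0/27",
    "Jun 17 14:58:27.024: %SW_MATM-4-MACFLAP_NOTIF: Host 0024.e841.ad31 in vlan 111 "
    "is flapping between port Gi2/0/11 and port Po1",
]
print(ports_involved(sample))
```

A port that shows up in nearly every notification (Po1 in the two sample lines) is a reasonable first place to look for where the duplicate frames are arriving.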

It looks like a loop formed somewhere during this procedure.

Here are my Questions:

a.   What caused the core router to jam so that it could not handle traffic?

b.   What is the correct procedure for adding a new 3750G to an existing stack?

c.   Is the StackWise cable hot-swappable? What issues do we need to watch for when adding a new 3750G to the existing stack? Some coworkers suggested configuring the EtherChannel first and then connecting the physical fiber. Is that correct?

d.   Last question: if we have three stacked 3750Gs and for some reason (for example, a broken StackWise cable) one switch is disconnected from the others in the stack, can this scenario cause the kind of network jam I experienced?

Thank you very much for your time and help.

6509 IOS ver: 12.2(33)SXH5

3750G IOS ver: 12.2.50-SE3

Dipesh Patel Wed, 06/23/2010 - 20:32

Dear Gongyuan Yao,

Can you please post the topology and configuration? That will make it easier to understand your problem.

Regards,

Gongyuan Yao Fri, 06/25/2010 - 08:10

Hi Dipesh,

Thank you so much for your response and help.

The diagram looks like the following:

Initial setup:
 
Core2------Core1=2/19-20====1/0/49  3750 Stack
  |                         2/0/49  3750 Stack
  |
  |-------------------------====1/0/50  3750 Stack
                                2/0/59  3750 Stack

     Core1  2/19-20 in port-channel 1

     3750 switch 1/0/49 and 2/0/49 in Po1 (port-channel 1)

Problem Scenario:

               |--------1/0/49  New Standalone 3750
               | 2/14 (Po1)
Core2------Core1=2/19-20====1/0/49  3750 Stack
  |                         2/0/49  3750 Stack ====Gi1/0/27  a PC
  |
  |-------------------------====1/0/50  3750 Stack
                                2/0/59  3750 Stack

There was no machine connected to the new standalone 3750. After configuring core port 2/14 as part of the core port channel with 2/19-20, the loop occurred.

From the stacked switch log, I can see packets looping from the core back to the stacked switch: the learned MAC address flaps between a physical port and the port channel (log message: Host 0024.e834.9723 in vlan 110 is flapping between port Po1 and port Gi1/0/27).

From discussion with other engineers: they believe that Gi2/14 on the core never sent BPDUs to the new standalone switch, so port 1/0/49 on the standalone switch was in the forwarding (FWD) state. Gi2/14 on Core1 also believed it was connected to an end machine (not a switch), so it was in FWD too. When a PC sent a frame to the stacked switch and on to Core1, Core1 flooded it to all ports in FWD state, so the new standalone switch received it. Having no machines connected, the standalone switch forwarded it to all ports in FWD state, which means it sent the frame back to Core1. Core1 then forwarded it to the stacked switch through Po1 (EtherChannel 1), which caused the stacked switch to learn the PC's MAC address on Po1 (the flap). The next frame destined for the PC that reached the stacked switch would be sent back out Po1 to Core1, which would send it back again, and the loop started from there.

Here is my confusion:

The standalone switch has only one physical port connected, and only to Core1. If it receives a frame, will it drop it or send it back?
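As a point of reference for this question, standard transparent bridging never forwards a frame out the port it arrived on. The following is a minimal Python sketch of that rule, assuming an idealized learning switch; the class and method names are illustrative only and do not model Cisco's implementation, spanning tree, or EtherChannel.

```python
class LearningSwitch:
    """Toy transparent bridge: learns source MACs, floods unknown destinations."""

    def __init__(self, up_ports):
        self.up_ports = set(up_ports)   # ports that are up and forwarding
        self.mac_table = {}             # MAC address -> port it was learned on

    def receive(self, ingress, src_mac, dst_mac):
        """Return the set of ports the frame is sent out of."""
        self.mac_table[src_mac] = ingress
        egress = self.mac_table.get(dst_mac)
        if egress is None:
            # Unknown unicast or broadcast: flood, but never out the ingress port.
            return self.up_ports - {ingress}
        if egress == ingress:
            # Destination lives behind the ingress port: filter (drop) the frame.
            return set()
        return {egress}

# A switch whose only active port is the uplink, as in the scenario described.
standalone = LearningSwitch(up_ports=["Gi1/0/49"])
out = standalone.receive("Gi1/0/49", "0024.e834.9723", "ffff.ffff.ffff")
print(out)  # set(): with a single active port there is nowhere to flood
```

Under this idealized model a single-port switch cannot reflect traffic by itself, so the sketch does not explain the outage on its own; it only narrows down which parts of the theory above need checking against the real configuration.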

Also, we use load-balance src-dst-ip for the port channel. Will that cause Core1 to send frames out 2/14 to the standalone switch?
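With mode "on", a port that is up joins the bundle without any negotiation, so the load-balancing hash treats it like any other member. The Python sketch below illustrates the idea only. It assumes a simple XOR of the two addresses taken modulo the number of members, which is not the actual hash used on the 6509, and the IP addresses are made up for the example.

```python
import ipaddress

def pick_member(src_ip, dst_ip, members):
    """Choose one member link for a flow from its source and destination IP.

    Assumption: XOR the two addresses and take the result modulo the number
    of members. Real platforms use their own hash; only the idea is shown.
    """
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return members[key % len(members)]

# Core-side port channel after 2/14 was added alongside 2/19 and 2/20.
members = ["Gi2/14", "Gi2/19", "Gi2/20"]
for last_octet in range(1, 8):
    dst = f"10.0.0.{last_octet}"  # hypothetical destination addresses
    print(dst, "->", pick_member("10.0.1.10", dst, members))
```

Each flow always maps to the same member, and across enough flows every member gets used. In this model, any flow that lands on Gi2/14 is delivered to the standalone switch rather than to the stack where the hosts actually are.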

All of the ports in our port channels are in mode “on” (not LACP or PAgP). That is not good. If we want to change the mode to “desirable”, what is the process? Change the switch side first? Or the core side first? Or do we have to remove the mode from all ports in the port channel and then change them together?
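Whatever order is chosen, it helps to know which mode pairs actually form a channel, because the two ends will disagree for a while if they are changed one at a time. The Python sketch below encodes the commonly documented compatibility rules for the on, PAgP (desirable/auto) and LACP (active/passive) modes; the function name is hypothetical.

```python
# Channel-group modes grouped by negotiation protocol.
PAGP = {"desirable", "auto"}
LACP = {"active", "passive"}

def will_bundle(local, remote):
    """Return True if two ends configured with these modes form a channel."""
    pair = {local, remote}
    if pair == {"on"}:
        return True                      # static: bundles with no negotiation
    if pair <= PAGP:
        return "desirable" in pair       # at least one side must initiate
    if pair <= LACP:
        return "active" in pair          # at least one side must initiate
    return False                         # mixed protocols, or "on" vs negotiating

for local, remote in [("on", "on"), ("desirable", "auto"),
                      ("on", "desirable"), ("auto", "auto")]:
    print(local, remote, will_bundle(local, remote))
```

The ("on", "desirable") case returns False, which is the state a link passes through when only one end has been changed, so the change is probably best planned for a maintenance window.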

Last question: if these three switches are already stacked together and the port channel is running fine, and one day, due to some accident, one of the switches is disconnected from the other two (StackWise cable disconnected), will this cause the problem I experienced? Why or why not?

Regarding adding one more 3750 to the existing stack and making its fiber port part of the EtherChannel group (on both the router and the 3750 stack), I have drafted the following step-by-step procedure. Please let me know if I need to change anything:

Pre-condition:

Two 3750G/48 switches are connected with StackWise cables and already running in the production network.

SW1’s StackA is connected to SW2’s StackB and

SW1’s StackB is connected to SW2’s StackA.

Steps:

1. Power down the new switch SW3.

2. Disconnect SW1’s StackA from SW2’s StackB.

3. Connect SW2’s StackB to SW3’s (new switch) StackA.

4. Connect SW1’s StackA to SW3’s StackB.

5. Do not configure SW3’s fiber port as part of the EtherChannel.

6. Do not configure the core router port that will connect to SW3’s fiber port as part of the EtherChannel on the core side.

7. Disconnect the fiber from the core to SW3.

8. Power on SW3.

9. Connect the fiber between SW3 and the core (without EtherChannel configured) to make sure the connection comes up.

10. Then disconnect it.

11. Make sure SW3 is part of the stack and is not the master.

12. Configure SW3’s fiber port as part of the EtherChannel.

13. Configure the router’s fiber port to SW3 as a member of the EtherChannel.

14. Connect the fiber.

Thanks again for your time and great help.

gy

glen.grant Fri, 06/25/2010 - 09:47

Your answer seems to indicate you had a standalone 3750 that connects back to a core switch. You then attempted to put that connection into the same port channel that was already going to the two stacked 3750s. Is this correct? If so, you cannot do this: the standalone 3750 is not part of the stack, and I can see how that would cause all sorts of issues. Why would you try to put the standalone switch into the same port channel? If I have misunderstood, please clarify. If you want a port channel for the standalone 3750, you would have to create a whole new port channel on both sides...

Gongyuan Yao Fri, 06/25/2010 - 10:23

Hi Glen,

Thank you so much for reading through the post.

Your understanding is correct. My intention was to add this standalone switch to the stack, but the sequence was WRONG.

That is why I posted the possible correct sequence.

The issue is that even with this wrong sequence, we thought it could only jam the switch, but it took the whole network down.

I would like to know what switch behavior caused the jam so that we can avoid similar problems in the future.

Also, my other question is:

if three or more 3750s are already stacked together and have a port channel to the core, and ONE of the stacked switches is disconnected from the stack, would that cause the same problem?

Thanks again for your time and help.

gy
