etherchannel

Unanswered Question
Mar 14th, 2008

I have made two runs up this hill:

Our company has a 6513 at the core and 3750 stacks in the closets and server farms. We had a single fiber run to each stack from the 6513. I wanted to run additional fiber to each stack and port-channel the links.

It worked well from the 6513 to the wiring closets on the floors. But the same setup didn't work so well for the stacks that connect the server farms.

On the first attempt we began to see duplicate IP errors all over the place. I studied all the information I could get my hands on and concluded that one of the stacks must have had some kind of loop condition. I searched and never found anything, but then I don't control what the server admins plug in. At one point I found a hub on one switch with cables going all over the place. I pulled it out of the mix right away.

The next chance I got to try etherchannel, I hooked that switch up first. After several hours of all 4 links up and in the channel group there were no errors. I added one more stack. Still no errors.

But then I started to get calls about slow servers, and they were on the two stacks that I had set up port channeling for. Not all servers on those farms had issues, but the ones that did were on the switches where I'd turned on EtherChannel. BUT still no errors!

I had to turn it back off so that people could function, but I'm at a loss to explain it. In fact, while I was undoing the port channel I learned the admin folks were rebooting their servers and the problem went away. But I don't know for sure if it went away 'cause they rebooted or 'cause I shut off the port channel.

I do know the servers in question use MS load balancing. Is there anything about the way MS load balances that might make port channeling adversely impact performance?

I just can't explain this. If I am getting no errors on the links or in the server logs, I'm pretty sure I don't have a loop. Wonder where else to look? I would love to have port-channel working.

lamav Fri, 03/14/2008 - 14:12

Sue:

When you say you added another stack, do you mean that you added another switch to the existing stack, or did you add an entire new stack and uplink that one to the 6513, too?

Is this a routed access layer or switched?

Is the 6513 alone or does it operate as a member of an HSRP group?

You need to tell us more about your topology.

A diagram would help.

Victor

suelange Mon, 03/17/2008 - 06:40

Thank you Victor, you are correct, there wasn't enough information. Our layout is so simple that I sometimes forget it still needs explaining.

Attached is a diagram, but to answer your questions:

1) No HSRP on the 6513. There was the first time I tried this, and the funny thing was, it was obvious there was no point: it was HSRP'd to a device in another building, and there is no way they can stand in for each other. So I whacked that, and I suspect that is ultimately what solved my first round of errors a long time back.

2) We have 4 stacks of 3750's in the farm, 3 3750's per stack. We have 3 stacks of 3750's with 3 switches per stack, one to each closet. Each of these stacks has a port channel group reserved for that stack alone. When I said "I added another stack" I mean I put a second group of switches into their own port channel so that, at that point, I had two distinct port channels, each going to a distinct set of 3750's, each in their own stack. The diagram shows two links to each closet and those port channels all work fine. But the exact same config on the server farm is not working well.

3) Although the 3750 can switch, the way the prior engineer set it up they are all being used as L2 access switches. Everything is routed at the 6513. But that's almost a non-issue since they've not bothered to set up too many vlans. MOST of this network is flat-switched, something I'd like to fix. About the only real routing is an EIGRP/BGP net I set up to handle the branch offices across MPLS. All the servers are on the default vlan, I'm sorry to say.

mbroberson1 Sun, 03/16/2008 - 17:16

Make sure your EtherChannel load balancing is working properly. Load balancing over EtherChannel usually works well by default, but sometimes certain links will be preferred. I would also suggest hard-setting your EtherChannel config on both sides of the link (use mode on); don't use anything that auto-negotiates. Remember that up to 8 links are supported in an EtherChannel. You may also want to look at doing layer 3 from your 6513 to your 3750s running EIGRP. It is much more efficient, though of course you will most likely have to do some re-addressing.
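
A rough sketch of how the load-balancing check above might be done on the 3750 side. The commands are standard Catalyst IOS, but the channel number (1) and the src-dst-ip method are only assumptions for illustration, and the hash options offered differ by platform and IOS release.

```
!--- Show which fields this switch currently hashes on
Switch#show etherchannel load-balance

!--- Confirm that all member links are actually bundled
Switch#show etherchannel 1 summary

!--- Assumption: hash on source and destination IP instead
Switch#configure terminal
Switch(config)#port-channel load-balance src-dst-ip
Switch(config)#end
```

Note that port-channel load-balance is a global command on the 3750, so it changes the hashing for every channel on the stack, not just one. Each end of the channel picks its outgoing link independently, so the 6513 has its own setting to check.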

rgodden Mon, 03/17/2008 - 02:26

Are you using desirable mode?

http://www.cisco.com/en/US/products/hw/switches/ps700/products_white_paper09186a00801b49a4.shtml#cg6

Cisco Configuration Recommendation for L2 Channels

Enable PAgP and use a setting of desirable-desirable on all EtherChannel links. See this output for more information:

Switch(config)#interface type slot#/port#

Switch(config-if)#no ip address

!--- This ensures that there is no IP

!--- address that is assigned to the LAN port.

Switch(config-if)#channel-group number mode desirable

!--- Specify the channel number and the PAgP mode.

Verify the configuration in this way:

Switch#show run interface port-channel number

Switch#show running-config interface type slot#/port#

Switch#show interfaces type slot#/port# etherchannel

Switch#show etherchannel number port-channel

suelange Mon, 03/17/2008 - 06:46

You and Rgodden in the next post seem to disagree on the use of "on" vs. "desirable", I suspect based on individual experiences which probably differ.

What I've read on Cisco's site about "cross-stack" EtherChannel, where one link terminates on one switch in the stack and another link on a different switch in the same stack, says to use "ON" at both ends, so this is what I did. It works in my closets but not in my server farms. I'd love to segment the farm out further; see my reply to Victor earlier. For now, however, this is what I have and it should work, I would think....

wiesinger Mon, 03/17/2008 - 08:07

Be aware that there is a difference between l2 & l3 port-channels - mixing them will never work.

Keep the LAN port config simple and do the rest on the port-channel x interface.

You have already checked the syslogs from your devices, I suspect.

How are the servers connected? Are they also channeled, or bonded (teamed)?

Kevin Dorrell Sat, 03/22/2008 - 04:06

"Be aware that there is a difference between l2 & l3 port-channels - mixing them will never work."

Well, that got me thinking, so I decided to try it out in the lab. It seems that you can mix l2 and l3 channels to a limited extent.

I have two interfaces F0/13 and F0/14 on each of my switches, and I have joined these into channel-group 1. On CAT1 I have made Po1 an access port on VLAN 42. On CAT2 I have done no switchport on Po1.

CAT1#show run int Po1
Building configuration...

Current configuration : 93 bytes
!
interface Port-channel1
 switchport access vlan 42
 switchport mode dynamic desirable
end

CAT1#show run int vlan 42
Building configuration...

Current configuration : 62 bytes
!
interface Vlan42
 ip address 151.10.1.10 255.255.255.0
end

CAT2#show run int Po1
Building configuration...

Current configuration : 84 bytes
!
interface Port-channel1
 no switchport
 ip address 151.10.1.20 255.255.255.0
end

CAT2#ping 151.10.1.10

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 151.10.1.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms

What I couldn't do was to run a trunk through that arrangement. CAT2 would not let me create dot1q encapsulated subinterfaces on the Po. (At least, I could create the subinterfaces, but not encapsulate them. I wonder what use they are?)

It's a corner-case, I know, but it can be done. But I guess I could use it if I was doing a layer-3 distribution - layer-3 in the distribution switch and a VLAN in the access switch - and I wanted the fast convergence and bandwidth of an EtherChannel. Hey, I could even run HSRP between two layer-3 distribution switches for the VLAN, and have only a layer-2 switch in the access layer, and still have EtherChannel on the uplinks.

Kevin Dorrell

Luxembourg

Kevin Dorrell Tue, 03/18/2008 - 01:27

Sue, it looks to me as though the problem lies in the cross-stack channels. I have a setup here that is similar except that:

1. I have a pair of 4506s in the distribution layer

2. I have a couple of 2960Gs in each rack, so no cross-stack channels.

3. I use copper instead of fiber.

What is similar is that:

1. I use EtherChannel (2 Gbps to each switch from one distribution switch, and a 2Gbps strap between the rack access switches for redundancy.)

2. On my distribution switches the channels are cross-module

3. My distribution is at layer 2

I suggest you try the Channel without the cross-stack. That is, terminate the two links on the same switch. I know that is not what you want in the long run, but at least it will tell you whether the issue is related to the EC being cross-stack.
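
A minimal sketch of that isolation test on the 3750 side. The interface names and channel number are hypothetical, and it keeps the "mode on" already in use so that only the cross-stack factor changes.

```
!--- Hypothetical: take the uplink on stack member 2 out of the group
Switch(config)#interface GigabitEthernet2/0/25
Switch(config-if)#no channel-group

!--- Hypothetical: bundle two uplinks that both sit on stack member 1
Switch(config-if)#interface range GigabitEthernet1/0/25 - 26
Switch(config-if-range)#channel-group 1 mode on
Switch(config-if-range)#end

Switch#show etherchannel 1 summary
```

The fibers have to be moved to match, and the 6513 end stays in "mode on" as before.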

Kevin Dorrell

Luxembourg

andrew.burns Tue, 03/18/2008 - 03:15

Hi,

There are arguments for and against etherchannel "mode on" but personally I'd avoid it, and Cisco don't recommend it either as the following quote illustrates:

You should use care when using the on mode. This is a manual configuration, and ports on both ends of the EtherChannel must have the same configuration. If the group is misconfigured, packet loss or spanning-tree loops can occur.

You can't use desirable (i.e. PAgP) because it isn't supported on cross-stack channels, so the recommended option is to use mode active (i.e. LACP).

The chapter in the configuration guide is recommended reading, as it has a long list of "configuration guidelines" that gives all the dos and don'ts.

http://www.cisco.com/en/US/docs/switches/lan/catalyst3750/software/release/12.2_44_se/configuration/guide/swethchl.html#wp1275881

Also, make sure the switch stack isn't partitioned, using the show switch detail and show platform stack-manager all commands.
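
As a rough illustration of the mode active suggestion, a cross-stack LACP channel on the 3750 side might look like the sketch below. The interface names and channel number are hypothetical, and it assumes the 6513 end is set to LACP active or passive on the matching ports.

```
!--- Hypothetical ports: one uplink on stack member 1, one on member 2
Switch(config)#interface GigabitEthernet1/0/25
Switch(config-if)#channel-group 1 mode active
Switch(config-if)#interface GigabitEthernet2/0/25
Switch(config-if)#channel-group 1 mode active
Switch(config-if)#end

!--- Check that both links are bundled and that an LACP partner is seen
Switch#show etherchannel 1 summary
Switch#show lacp neighbor
```

If the ports are already in the group with mode on, they will most likely have to be removed from it first, since a channel group can't mix "on" members with LACP members.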

HTH

Andrew.

suelange Tue, 03/18/2008 - 07:30

There must be more than one doc on the cross-stack config, because the one I found specifically stated you must turn both ends on, but the one you quoted says to use active. I can see why; I just wonder why the change. Maybe it's related to different versions of IOS. I don't have 12.2.44, I have 12.2.25, but I see in the post it's current enough to support this.

It's likely to be several months before they let me try again... it was a year between the last two attempts. But when they let me try next, I will attempt it this way.

andrew.burns Tue, 03/18/2008 - 07:57

Yes - you're right, cross-stack LACP is fairly recent (12.2(25)SEC) so any documentation before this version came out will say to use mode on.

From a convergence point of view it's faster to use mode on, but there are so many things that can break a channel in new and exciting ways (different QoS config, MTU values, etc.) that I've never found it to be worth the risk. If it breaks with mode on, you need to find the error and fix it before the link will recover. With LACP (or PAgP) both ends will go down and you can fail over gracefully to the standby channel (if you have one).

That said, if you really need the extra seconds then sometimes you've no choice.

HTH

Andrew.
