Bond interfaces on one VLAN causing ARP loop and loadbalancer problems

Jacek Tymoczko
Level 1

Hi,

I'm facing a weird problem in my network environment. Quick background on the environment: our core switch is a Cisco 6509-E (running VRF) with an ACE module that works as a load balancer.

Not all servers are behind the load balancer; only some are.

Recently we had to take one server out of the load balancer for re-installation work. We didn't put it back behind the load balancer; we just started it. In the image this server is named msmgmt05. It has two interfaces bonded on VLAN 153, subnet 172.16.53.0/24.

After that, weird stuff started happening.

Basically, the load balancer started randomly freaking out, but NO errors could be seen in the logs! No errors are visible on the switches involved either!

Below is a drawing of how it looks and what the configs are on the distribution-level switches and on the VSS.

ScreenShot143.png

We took a traffic dump on the bond interface of msmgmt05 and found that there is some crazy ARP loop going on.

Most of the time the traffic looks like this:

ScreenShot144.png

Have any of you faced this kind of issue? Or can I provide more information to shed some light on it?

The problem is that the load balancer basically freaked out once this machine came up, even though the machine is not configured there anymore.

BR, Jacek

4 Replies

Steve Fuller
Level 9

Is this the same issue as you describe in ARP loop? If so, can we just have one post to avoid duplication of effort?

So if I understand what you've done, then I think the problem could be related to the Catalyst 3560.

It would seem that you have the two Catalyst 3560 switches terminating a single port-channel from the VSS switches, and another single port-channel from the server. Presumably the server is using some form of NIC bonding with IEEE 802.3ad Link Aggregation.

The Catalyst 3560 does not support Multi-Chassis EtherChannel, so the port-channels you have from the VSS switches and from the server can each terminate on only one of the Catalyst 3560 switches.
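To illustrate that constraint, here is a minimal Python sketch that flags any bundle whose member links land on more than one standalone switch. The bundle names, switch names, and ports in `links` are hypothetical placeholders, not taken from the configuration in this thread.

```python
from collections import defaultdict


def split_bundles(links):
    """Return {bundle: sorted switch names} for bundles that span several switches."""
    switches = defaultdict(set)
    for bundle, switch, _port in links:
        switches[bundle].add(switch)
    return {b: sorted(s) for b, s in switches.items() if len(s) > 1}


# Hypothetical cabling records: (bundle name, terminating switch, port)
links = [
    ("server-bond0", "3560-A", "Gi0/1"),
    ("server-bond0", "3560-B", "Gi0/1"),
    ("uplink-po10", "3560-A", "Gi0/23"),
    ("uplink-po10", "3560-A", "Gi0/24"),
]

# Only the bundle split across two switches is reported.
print(split_bundles(links))
```

A bundle reported here would need either a single terminating switch or a platform that supports multi-chassis aggregation.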

Regards

OK, so let's close that topic, as the ARP loop topic describes my problem in more detail.

Hi Jacek,

I'm not sure which post you want to keep going (hence why we like to avoid duplicates) so I'm going to continue here as the other post has very little detail that's not included here anyway.

So that we have everything in one place here's the diagram from the other post that shows the high level network setup and MACs of the servers.

ScreenShot145.png

For the server on the left can you explain what you mean by "Server interface MAC" and "PortChannel interface MAC"? Also can you provide a little more detail as to how the NICs of the server really connect? As Po1 is the vPC peer link then you have two Nexus 5K and presumably you also have two FEX, with two NICs from the server connecting to both?

Also what is the relevance of the show mac address-table with MAC 0026.b931.32f9 on eth100/1/12 and MAC 0026.b931.32fb on Po 1? As they are different MACs assigned to different interfaces then it's quite possible for them to be seen on different ports. If the MAC 0026.b931.32fb is used by a NIC which is connected to a FEX that is then connected to the second Nexus 5K, then the output from the show mac address-table command is correct.

Can you provide some details of the IP address to MAC mapping on the server? What is the server OS? If it's Linux can you post the output of ifconfig and cat /proc/net/bonding/bond0 (or equivalent)?
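If the server does turn out to be Linux, the bonding status file can be summarised with a few lines of Python. This is only a sketch: the `SAMPLE` text below is a hypothetical, shortened example of the usual `/proc/net/bonding/bond0` layout (with made-up MAC addresses), and real output contains more fields.

```python
def parse_bonding(text):
    """Return (bond_mode, {slave: {"mii": ..., "mac": ...}}) from bonding status text."""
    mode, slaves, current = None, {}, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key == "MII Status":
            current["mii"] = value
        elif current is not None and key == "Permanent HW addr":
            current["mac"] = value
    return mode, slaves


# Hypothetical sample; on a real host read /proc/net/bonding/bond0 instead.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up
Permanent HW addr: 02:00:00:00:00:01

Slave Interface: eth1
MII Status: down
Permanent HW addr: 02:00:00:00:00:02
"""

mode, slaves = parse_bonding(SAMPLE)
print(mode)
for name, info in slaves.items():
    print(name, info["mii"], info["mac"])
```

A slave whose MII status is down, or a bonding mode that does not match what the switch side expects, would be the first thing to look at.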

I'm also puzzled by the fact that the packet capture is showing "duplicate use" of the IP address. Unless I'm missing something here the MAC associated with 172.16.53.16 is always Dell_31:32:f9 and MAC associated with 172.16.53.17 is always Dell_28:0b:a8.

If you can post show vpc, show fex and show port-channel summary from both Nexus 5K that would help to understand how things are connected.

The other point is what I mentioned earlier in terms of a port-channel from the server on the right seemingly connecting to two different Catalyst 3560. Can you post the server interface ifconfig and bonding configuration information for that server also, and a show etherchannel from both Catalyst switches.

Sorry there are so many questions, but as you've condensed what I believe is two Nexus 2K, two Nexus 5K, two Catalyst 6500 and two Catalyst 3560 down to the two-switch diagram shown above, it's very difficult to picture exactly what you've got set up here.

Regards

Steve hi,

Investigating these MAC addresses so far, I found that the bond interface on the server on the left somehow failed and started using the two MAC addresses of the interfaces in that bond. The server is an MS Windows 2003 Server box with Broadcom software doing the bonding. We bonded two interfaces that are connected to two FEXs, which are connected to two Nexus 5Ks.

It seems that the bond failed and the interfaces started sending ARP using two MACs! So the server on the right sometimes saw the MAC of the 1st NIC and then the MAC of the 2nd NIC.

Let's put this thread on hold for now; I need to investigate why the bond failed on that server.

But thanks a lot, because you gave me a hint by pointing me to these MACs.
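A quick way to confirm this kind of symptom from a capture is to list every IP address that is announced by more than one MAC. The sketch below assumes the ARP sender IP/MAC pairs have already been exported from the capture; the `observations` list uses hypothetical documentation addresses, not the real ones from this network.

```python
from collections import defaultdict


def ips_with_multiple_macs(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one sender MAC."""
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical (sender IP, sender MAC) pairs taken from ARP packets
observations = [
    ("192.0.2.16", "02:00:00:00:00:01"),
    ("192.0.2.17", "02:00:00:00:00:0a"),
    ("192.0.2.16", "02:00:00:00:00:02"),  # same IP, second NIC's MAC
    ("192.0.2.16", "02:00:00:00:00:01"),
]

print(ips_with_multiple_macs(observations))
```

An IP that shows up in the result is being advertised from two different NICs, which matches what a broken team that falls back to the individual adapter MACs would look like.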

BR, Jacek
