NLB cluster loses connectivity through Cisco switch

Answered Question
Mar 26th, 2009

I have two Windows Server 2008 machines in a Network Load Balancing (NLB) cluster running on VMware ESX 3.5.0, which is connected to a Cisco Catalyst 4510.

The cluster is always reachable from within its own subnet. To make it reachable from other VLANs, I added this static ARP entry:

arp 172.17.16.90 0100.5e7f.105a ARPA

This works at first, but after some time the cluster stops being reachable from other VLANs. When I list the ARP table on the 4510, the static entry is still there. If I add it again, it works again for a while.

Yudong Wu Thu, 03/26/2009 - 21:29

What NLB mode are you running?

In your static ARP entry, the IP is a unicast address but the MAC is a multicast MAC. If I remember correctly, that means NLB is running in multicast mode.

I don't remember how MS implements this NLB mode. If the servers never send a packet with that MAC address as the source MAC, the switch will have no entry for it in its MAC address table, so all packets to that IP will be flooded in the server VLAN. You might therefore need to configure a static MAC entry, if possible.

As for communication with other VLANs, did you notice how long it takes to stop working after you add the static ARP entry?

andrewswanson Fri, 03/27/2009 - 06:12

Yes, you need a static MAC entry. We had a similar problem with our MS NLB clusters: our server admins were migrating the VMs between different enclosures, so we had to modify the mac-address-table static command to reflect all the interfaces where the MAC might appear, e.g.

mac-address-table static xxxx.xxxx.xxxx vlan xx interface Po1 Po2 Po5 Po6 Po9 Po10 Po12 Po13

zirkelad Fri, 03/27/2009 - 19:08

I did add a static ARP entry and a mac-address-table entry for the ports the VM servers are connected to. IGMP was enabled on those ports, so the command was different from the one above. The cluster is in multicast (IGMP) mode.
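For reference, the MAC in the static ARP entry is consistent with IGMP multicast mode: as far as I know, NLB derives the cluster MAC from the cluster IP, using 02-BF plus all four IP octets in unicast mode, 03-BF plus all four octets in multicast mode, and 01-00-5E-7F plus the last two octets in IGMP multicast mode (with group 239.255.y.z). Here is a rough sketch of that mapping so the value can be double-checked before typing it into the switch. The function names are my own, not from any Microsoft or Cisco tool, and the prefixes are the commonly documented convention, so verify against what the cluster actually reports.

```python
import ipaddress

def nlb_cluster_mac(cluster_ip: str, mode: str) -> str:
    """Return the expected NLB cluster MAC in Cisco dotted notation.

    mode is one of "unicast", "multicast", "igmp" (hypothetical labels
    chosen for this sketch).
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # four raw bytes
    if mode == "unicast":
        raw = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        raw = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # Only the last two octets of the cluster IP are used here.
        raw = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    hexstr = raw.hex()
    # Group the 12 hex digits as xxxx.xxxx.xxxx, the way IOS prints MACs.
    return ".".join(hexstr[i:i + 4] for i in range(0, 12, 4))

def nlb_igmp_group(cluster_ip: str) -> str:
    """Return the multicast group the nodes join in IGMP mode (assumed 239.255.y.z)."""
    octets = ipaddress.IPv4Address(cluster_ip).packed
    return f"239.255.{octets[2]}.{octets[3]}"

if __name__ == "__main__":
    print(nlb_cluster_mac("172.17.16.90", "igmp"))
    print(nlb_igmp_group("172.17.16.90"))
```

For the cluster IP 172.17.16.90, the IGMP-mode result matches the 0100.5e7f.105a used in the arp command, so the entry itself looks right for this mode.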

The ARP entry definitely made it work for a while; the MAC table entries didn't seem to have any effect. I can always get to the cluster from machines on the same subnet (and on different ports). Nodes on other subnets work for a while and then can no longer connect to or ping the cluster IP. Strangely, when I restart a network service (IMAP), connectivity is fine again for a while.

Yudong Wu Fri, 03/27/2009 - 21:44

I'm not sure how NLB IGMP mode is implemented.

From a multicast point of view, IGMP snooping needs to be enabled on your 4500 switch.

You also need an IGMP querier on the server VLAN to maintain the IGMP state. Therefore, you might need to configure "ip pim" under the server VLAN interface.

zirkelad Mon, 03/30/2009 - 08:12

PIM is enabled in sparse-dense mode.

Right now I can get to the cluster from a machine on the same subnet, but not from other subnets. It seems that if I connect to one of the cluster machines using its own IP (not the cluster IP), the cluster IP becomes reachable again.

That indicates a switching problem to me. I'm using IOS 12.2; should I upgrade to 12.4?
