I have two datacenters that I am trying to run GLBP across using an L2 span. There are 4 router interfaces participating in the same group. For some reason the hosts on that VLAN will stop responding to ping and data traffic. HSRP seems to run fine, but we have standardized on GLBP in our network. Is anyone out there doing something similar to this? Thanks!
Can you be more specific about the problem? Where are you pinging those hosts from? Do your pings reach the hosts? How does the "show glbp summary" output look at the moment of the problem? Is only the incoming traffic affected, or the outgoing traffic as well? Please try to put us more into the picture.
The hosts are pinging each other on the same VLAN. In effect I have 4 routers in two datacenters. Each router pair provides redundancy within its respective datacenter. I threw a basic picture together and attached it. Computer A isn't able to ping Computer B consistently. Thanks!
Thank you for the exhibit, it made things much easier to understand. This is a most interesting issue. Are you suggesting that if the four routers run GLBP, the connectivity between computers A and B intermittently breaks, even though both computers are in the same VLAN and do not use the GLBP virtual IP to reach each other?
I wonder - how is the L2 tunnel created between your two data centers? Is it also provided by the four routers? Also, what exact types are those four routers?
My general idea here is that whatever the GLBP settings for a VLAN may be, the connectivity inside that VLAN should not be impaired in any way, as the intra-VLAN communication is not routed and should not be influenced at all by the GLBP settings. The only thing I can see right now is that the GLBP somehow influences the L2 tunnel between the data centers.
We are using four 7600 series routers tunneling L2 into L3 using VPLS. They are inline with the green line in the drawing. The GLBP routers are 6500 series switches doing L3; however, the 6500s see this just as an L2 connection. If I run HSRP over the same connection without any extra HSRP features (timers, etc.), it works just fine.
I am going to post a stupid question but I want to make sure: do the 7600 routers in any way refer to the IP address of the GLBP group (the 10.10.10.1)?
If we can rule out, with absolute certainty, any possibility of the L2 link between your two data centers being influenced by GLBP, then we have to focus back on stations A and B. I would be interested in knowing these details:
1.) What exactly the ARP caches of stations A and B look like when they have connectivity problems
2.) What the MAC address table looks like on the 7600 routers that provide the VPLS service, and whether the MAC addresses of stations A and B are correctly learnt
3.) Whether we can trace the ping going between stations A and B and see where it gets lost
Sorry I wasn't more clear on my last response. All four routers are in the same GLBP group. All the routers can ping and communicate with each other just fine. It's the hosts on that segment having the problems. I have since stripped out the GLBP config and put just a standby IP on and put those 4 routers in the same HSRP group and it works, so I don't have any real output to show. I was just checking the waters to see if anyone was doing something similar.
Is there a firewall or other device on that segment running proxy-arp?
Do an arp -a on the host and compare the MAC address of the default gateway to the MAC address of your GLBP group.
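To make that comparison easier, here is a rough Python sketch that checks whether a MAC address copied from the arp -a output looks like a GLBP virtual MAC. It assumes the commonly documented layout 0007.b4xx.xxyy, where xxxx carries the group number and yy the forwarder (AVF) number; the helper names are made up for illustration, and you should verify the layout against your own "show glbp" output before relying on it.

```python
import re

# Assumption: GLBP virtual MACs start with 0007.b4, followed by the
# group number and a one-byte forwarder (AVF) number.
GLBP_PREFIX = "0007b4"

def normalize_mac(mac: str) -> str:
    """Strip separators and return 12 lowercase hex digits.

    Accepts zero-padded forms such as 00-07-b4-00-01-02 or 0007.b400.0102.
    """
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a zero-padded 48-bit MAC address: {mac!r}")
    return digits

def glbp_forwarder(mac: str):
    """Return (group, forwarder) if mac looks like a GLBP virtual MAC, else None."""
    digits = normalize_mac(mac)
    if not digits.startswith(GLBP_PREFIX):
        return None
    group = int(digits[6:10], 16)       # group number field
    forwarder = int(digits[10:12], 16)  # AVF number field
    return group, forwarder

if __name__ == "__main__":
    # Example gateway entries as they might appear on two different hosts.
    for entry in ("00-07-b4-00-01-02", "0007.b400.0103", "0000.0c07.ac01"):
        print(entry, "->", glbp_forwarder(entry))
```

If hosts in the same data center report different forwarder numbers for the same gateway IP, they have been handed different AVFs.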
Out of curiosity, why are you spanning L2 between data centers rather than L3? Hopefully, there are fully redundant links and hardware.
NonStop Networks, LLC
There are no firewalls on this segment. We are having to span L2 between datacenters for the supposed reason that VMware cannot do L3 (I am not quite convinced of this yet). Anyway, we are using VPLS to accomplish this L2 span between datacenters. I have attached a basic picture of what this looks like. Thanks for the response.
The only thing the 7600s are doing is L2 extension. They are not participating in the L3 portion at all. What I need to do is get some folks together to help me test this. I simply need to reconfigure this network to run GLBP. This may be something I can get done tomorrow. I apologize for not having everything set up and ready to go, but this is a production network.
Thanks for your help!
No problem. I understand that this is a production network and experiments like this are disruptive. We have to take a very close look at what exactly happens to the PING packets when they get lost: where and why.
I will get a chance to turn GLBP back on and run some tests. I will even try to get some packet captures as well.
Thanks again for the help!
We were able to put GLBP back on this L2 VLAN and get it to work! Basically, since we have 4 members in the GLBP group, we had to weight these connections. What we were seeing (once I got everyone together, FW admins and Server Support) was traffic leaving one side of the VLAN and trying to exit out the other side of that VLAN. The document I found explains how this is done.
There was also another NetPro conversation about GLBP load balancing by default without anything set. You can see that here.
Basically, the hosts in each data center were seeing different MACs for the gateway and returning traffic to the wrong AVF.
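To illustrate why that happens, here is a small Python model of round-robin load balancing, where the AVG answers each ARP request for the gateway with the next AVF's virtual MAC in turn. The site layout (AVFs 1 and 2 in DC1, AVFs 3 and 4 in DC2) and the host names are assumptions for the sake of the example, not taken from the actual network.

```python
from itertools import cycle

# Hypothetical layout: forwarder number -> data center.
AVF_SITE = {1: "DC1", 2: "DC1", 3: "DC2", 4: "DC2"}

def assign_round_robin(hosts, avf_site=AVF_SITE):
    """Give each ARPing host the next AVF in turn, regardless of its site."""
    avfs = cycle(sorted(avf_site))
    return {name: next(avfs) for name, _site in hosts}

def misdirected(hosts, assignment, avf_site=AVF_SITE):
    """Hosts whose assigned AVF sits in the other data center."""
    return [name for name, site in hosts if avf_site[assignment[name]] != site]

# Illustrative hosts, listed in the order they ARP for the gateway.
HOSTS = [("A1", "DC1"), ("A2", "DC1"), ("A3", "DC1"), ("A4", "DC1"),
         ("B1", "DC2"), ("B2", "DC2"), ("B3", "DC2"), ("B4", "DC2")]

if __name__ == "__main__":
    assignment = assign_round_robin(HOSTS)
    print("assignment:", assignment)
    print("sent across the L2 span:", misdirected(HOSTS, assignment))
```

In this toy model half of the hosts end up with a gateway MAC that lives in the other data center, so their off-net traffic crosses the L2 span before it is routed.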
You had described problems in reaching hosts in the same VLAN/IP subnet that is extended across the two locations.
When trying to talk to another host in the same subnet, a host shouldn't use its default gateway; it should just send an ARP request to resolve the other host's IP address.
As noted by Peter, when trying to access hosts in other IP subnets, the default gateway is contacted and an AVF MAC address is received from the AVG.
Also, it would help to know how you have configured the L2 transport and the L3 services: usually when doing an xconnect, the L2 PE is not providing L3 services to the VLAN that is the object of the L2 transport.
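The forwarding decision a host makes can be sketched in a few lines of Python. This assumes a /24 mask on the 10.10.10.0 subnet, which was not stated in the thread; the function name and the host addresses are only illustrative.

```python
import ipaddress

def arp_target(src_iface: str, dst: str, gateway: str) -> str:
    """Return the IP address the host must ARP for in order to reach dst."""
    iface = ipaddress.ip_interface(src_iface)
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip in iface.network:
        # On-link destination: ARP for the host itself, GLBP is not involved.
        return str(dst_ip)
    # Off-net destination: ARP for the gateway; the AVG replies with an AVF MAC.
    return gateway

if __name__ == "__main__":
    # Assumed addressing: host 10.10.10.20/24, GLBP virtual IP 10.10.10.1.
    print(arp_target("10.10.10.20/24", "10.10.10.30", "10.10.10.1"))
    print(arp_target("10.10.10.20/24", "192.0.2.5", "10.10.10.1"))
```

Only the second case ever touches the GLBP virtual IP, which is why intra-VLAN traffic should not depend on the GLBP settings.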
Hope to help
Sorry I wasn't clear that only off-net traffic was being affected. Everything within the same subnet could talk just fine. Whenever a packet came across one of the GLBP gateway interfaces, it would try to forward it out one of the other GLBP members. We found this by looking at firewall logs: we would see traffic leave an interface and try to be sourced on another interface at another firewall in another data center. Sorry again for the confusion!