I'm having big trouble trying to get this topology working. The image is not the actual topology I'm working on; I removed some stuff to make it easier for you to focus on the problem I'm having.
My goal is to remove any single point of failure (SPOF), so as you can see, we have two of everything: if a router/switch/server fails, there's another one. So far, I've got GLBP and Pacemaker working just fine.
Switches have the default (blank) config. Routers have their own IP and a GLBP IP (on BVI1) with no additional options, and servers have their own IP and a cluster IP (on bond0) with no additional options either.
Green addresses are unique for each device.
Red (virtual) addresses are shared by the devices on their sides in order to provide fault-tolerance and/or load-balancing. Servers have .1 as gateway.
Purple addresses are used by servers to communicate/monitor each other and synchronize databases.
Packets get duplicated and/or arrive on both physical server interfaces.
Pinging SRV1 10 times:
10 ping requests are sent.
13 ping requests are received on bond0 (6 on eth1 and 7 on eth2).
10 ping replies are sent.
Yes, all 10 pings were successful despite the duplicated packets (some of you might think that's good enough), but when I use an upper-layer protocol such as SSH, and packets arrive on both physical interfaces (eth1 and eth2), it just doesn't work. Sometimes even ping doesn't work properly. I don't know whether packets are being dropped or never received at all (I didn't have time to capture network traffic on that issue today).
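For reference, the duplicates described above show up in ping's output as DUP! lines; with 10 requests and 13 arrivals it would look something like this (the address 10.0.0.5 is just a placeholder for SRV1's cluster IP):

```
64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.412 ms
64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.498 ms (DUP!)
64 bytes from 10.0.0.5: icmp_seq=2 ttl=64 time=0.403 ms
...
--- 10.0.0.5 ping statistics ---
10 packets transmitted, 10 received, +3 duplicates, 0% packet loss
```

ICMP tolerates duplicates, which is why ping "succeeds"; TCP-based protocols like SSH can get confused by duplicate and out-of-order segments, which matches the symptoms above.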
This is my first time working with a high-availability network design, and I think this may be MAC-related.
Any help would be much appreciated.
[EDITED] Solution (December 5th):
According to the Linux kernel bonding documentation (Chapter 11, "Configuring Bonding for High Availability"), in this topology and with the equipment provided, it isn't possible to get both fault tolerance and load balancing on the servers' physical interfaces using the default bonding mode (balance-rr, a round-robin mode). The solution was to switch to active-backup mode, which keeps only one interface active at a time and provides fault tolerance only.
So now I have primary and backup links, which means there's a primary switch and a backup one. If one server's primary link goes down, the two servers end up connected to different switches, so I connected the switches to each other to keep that traffic from going through the routers.
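For anyone who wants a concrete starting point, here is a sketch of what the active-backup setup could look like in a Debian-style /etc/network/interfaces stanza (using ifenslave). The address, netmask, and the choice of eth1 as primary are placeholders, not values from my actual config; the gateway follows the ".1 as gateway" convention from the topology:

```
auto bond0
iface bond0 inet static
    address 192.168.1.10       # placeholder: the server's own (green) IP
    netmask 255.255.255.0
    gateway 192.168.1.1        # servers use .1 (the GLBP virtual IP) as gateway
    bond-slaves eth1 eth2
    bond-mode active-backup    # one active link; fault tolerance only
    bond-miimon 100            # MII link monitoring every 100 ms
    bond-primary eth1          # prefer the link to the primary switch
```

On other distributions the same parameters can be passed as module options or via sysfs; the mode and miimon settings are what matter.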
I hope this saves some time for anyone having the same issue.
Re: High-availability network connectivity problem
Switches are 2950s and routers are 2801s. I've been reading about bonding for the past few hours and found that in a multiple-switch topology, only the active-backup and broadcast modes are valid. The default bonding mode is balance-rr, and I didn't specify a mode, so that might be the issue. Problem is, I have to wait until Monday to test it.
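In the meantime, the currently running mode can be confirmed without waiting for the switch change, using the standard bonding procfs/sysfs entries (bond0 is the interface name from my setup):

```
cat /sys/class/net/bond0/bonding/mode    # e.g. "balance-rr 0" or "active-backup 1"
cat /proc/net/bonding/bond0              # shows the active slave and per-slave link status
```

If the first command reports balance-rr, that would confirm the default mode is in effect.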
Anyhow, I'd like to know more about the stacked 2960S or 3750 suggestion, because I'm using these 2950s to build a prototype, but we're actually supposed to buy two 2960S switches later.