BGP multiple site failover

Oct 9th, 2009


I have one site with two different ISP links, using iBGP to provide both failover and load distribution between the two. We have five Class C networks, advertised as five /24 networks to each ISP. One of those Class C's is used for network equipment in the border network.

Now we are setting up a disaster recovery site. What the system architect wants to do is split one of the Class C's (the one currently used for border network equipment), using one half for border network equipment at the primary site and the other half for border network equipment at the recovery site. The architect also wants to advertise the other Class C's at the recovery site, but at a cost high enough that little or no traffic will go there. For cost-savings purposes, we already have a similar setup at our current primary site.
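To make the proposed split concrete, here is a minimal Python sketch using the standard `ipaddress` module. The post does not give the real prefix, so 192.0.2.0/24 (a documentation range) is a hypothetical stand-in. One thing worth confirming with both ISPs: many providers filter announcements longer than /24, so the two /25 halves may not be accepted as separate Internet announcements.

```python
import ipaddress

# Hypothetical stand-in for the border-network Class C (documentation prefix).
border = ipaddress.ip_network("192.0.2.0/24")

# Split the /24 into two /25 halves: one for each site's border equipment.
primary_half, recovery_half = border.subnets(new_prefix=25)

print(primary_half)                # 192.0.2.0/25   -> primary site
print(recovery_half)               # 192.0.2.128/25 -> recovery site
print(primary_half.num_addresses)  # 128 addresses per half (126 usable hosts)
```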

At our primary site, when one ISP link goes down, traffic fails over to the other ISP right away. As I see it, that's because the two routers, each connected to one ISP, are iBGP peers: when one border router loses connectivity, the other picks up right away. Also, the router connected to the 'chief' ISP (the ISP carrying the most traffic) is the active router in the HSRP group.

Currently, the lesser ISP router has nearly all of its routes pointing to the greater ISP router, because the route advertisements coming in from the lesser ISP have additional AS hops prepended to their AS path.
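The effect of those prepended AS hops can be sketched in Python. This is a deliberately simplified model and an assumption for illustration: it compares only AS-path length, whereas real BGP best-path selection checks weight, local preference and other attributes first. The AS numbers, the prefix and the mapping of RTR1/RTR2 to the greater/lesser ISP are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Path:
    prefix: str
    via: str            # border router the path was learned through (hypothetical labels)
    as_path: list[int]  # nearest AS first, origin AS last


def best_path(paths: list[Path]) -> Path:
    # Simplified: shortest AS path wins. Real BGP compares weight, local
    # preference, etc. before AS-path length. min() keeps the first of equals.
    return min(paths, key=lambda p: len(p.as_path))


# Hypothetical: 64500 = greater ISP, 64501 = lesser ISP, 64510 = remote origin.
paths = [
    Path("198.51.100.0/24", "RTR2 (lesser ISP)", [64501, 64501, 64501, 64510]),  # prepended
    Path("198.51.100.0/24", "RTR1 (greater ISP)", [64500, 64510]),
]

best = best_path(paths)
print(best.via)  # RTR1 (greater ISP)
```

Because the lesser ISP's path is two AS hops longer, both routers prefer the path through the greater ISP, which matches the routing tables described above.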

My understanding is that when the greater ISP router loses connectivity, iBGP takes care of updating the routing tables in both routers so that traffic flows out the ISP link that is still up, and that HSRP has nothing to do with updating any routing tables. Is that right?
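The failover described in the question can be modelled the same simplified way: when one border router's ISP session drops, every path learned through it is withdrawn (the other router hears the withdrawal over iBGP), and the best path is recomputed from what remains. Note that HSRP appears nowhere in this logic. A minimal Python sketch, assuming hypothetical router names, prefix and AS-path lengths, and comparing AS-path length only:

```python
def best_by_prefix(table):
    """Return {prefix: router}, choosing the shortest AS path for each prefix."""
    best = {}
    for prefix, router, path_len in table:
        if prefix not in best or path_len < best[prefix][1]:
            best[prefix] = (router, path_len)
    return {prefix: router for prefix, (router, _) in best.items()}


def withdraw(table, lost_router):
    """Drop every path learned through a border router whose ISP session went down."""
    return [entry for entry in table if entry[1] != lost_router]


# Each entry: (prefix, border router the path was learned through, AS-path length).
table = [
    ("203.0.113.0/24", "RTR1", 2),  # hypothetical: via the greater ISP
    ("203.0.113.0/24", "RTR2", 4),  # hypothetical: via the lesser ISP, prepended
]

print(best_by_prefix(table))                    # {'203.0.113.0/24': 'RTR1'}
print(best_by_prefix(withdraw(table, "RTR1")))  # {'203.0.113.0/24': 'RTR2'}
```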

The reason I am posting is that I need to explain the following: if a recovery site advertises the same Class C networks, but at a cost high enough that it acts only as a failover site, then when failover does happen it will not be as fast as between two routers directly connected via iBGP. As far as I have been able to tell, it could take 15 to 20 minutes for the recovery site to show up as the "best" BGP route throughout the Internet. Is that accurate? Any thoughts are much appreciated.



jbiederstedt Fri, 10/09/2009 - 13:44

Posting diagrams of current and with recovery site to clarify things.

Again, thanks for any and all input.


Giuseppe Larosa Sat, 10/10/2009 - 01:58

Hello John,

After looking at the network diagrams you provided, here are some notes:

If the new remote disaster recovery site is not connected in any way to the central site, it may be wise to use the following measures.

The BGP router at the remote site can use a combination of:

neighbor ispb allowas-in, to learn the BGP routes originated at the central site (since both sites presumably use the same AS, eBGP loop prevention would otherwise discard those routes);

BGP conditional advertising, using that knowledge as the condition that decides when to start advertising the same prefixes to ISPB.
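The two mechanisms above can be sketched as plain Python logic. This is a simplified model for illustration, not router code: the local AS number, the prefix and the AS path are hypothetical, `accept_ebgp_route` mimics eBGP loop prevention relaxed by allowas-in (which permits the local AS to appear a limited number of times), and `should_advertise_at_dr` mimics a conditional advertisement that fires only when the watched central-site prefixes are absent from the BGP table.

```python
LOCAL_AS = 64510  # hypothetical: the same AS is used at the central and DR sites


def accept_ebgp_route(as_path, allowas_in=0):
    # eBGP loop prevention: reject a route whose AS path already contains our
    # own AS, unless allowas-in permits up to `allowas_in` occurrences of it.
    return as_path.count(LOCAL_AS) <= allowas_in


def should_advertise_at_dr(bgp_table, central_prefixes):
    # Simplified conditional advertisement: advertise the DR prefixes only
    # when none of the central-site prefixes are present in the BGP table.
    return not any(prefix in bgp_table for prefix in central_prefixes)


central = ["198.51.100.0/24"]          # hypothetical central-site prefix
path_from_ispb = [64502, 64501, 64510]  # hypothetical: ISPB -> ISPA -> central site

print(accept_ebgp_route(path_from_ispb))                # False: dropped as a loop
print(accept_ebgp_route(path_from_ispb, allowas_in=1))  # True: learned via allowas-in

print(should_advertise_at_dr({"198.51.100.0/24"}, central))  # False: central site is up
print(should_advertise_at_dr(set(), central))                # True: central routes withdrawn
```

Without allowas-in the DR router would never see the central-site routes, so the condition would always look "down" and the DR site would advertise all the time.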

About convergence time: it is roughly the time it takes ISPB to install the new routes from the remote-site router.

HSRP doesn't play any role in BGP convergence. In fact, it is not a routing protocol at all, but a first-hop redundancy protocol that provides a virtual default gateway.
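To underline that HSRP is independent of routing, here is a minimal Python sketch of how the HSRP active router is chosen: the highest priority wins, and a tie goes to the highest interface IP address. Note that the inputs contain no routing information at all. The router names, priorities and addresses are hypothetical, and the sketch ignores preemption (without preempt configured, a router that is already active keeps that role even if a higher-priority router comes up later).

```python
import ipaddress


def hsrp_active(routers):
    """routers: list of (name, priority, interface_ip).

    Highest priority wins; a tie goes to the highest interface IP address.
    Preemption and timers are deliberately left out of this sketch.
    """
    winner = max(routers, key=lambda r: (r[1], ipaddress.ip_address(r[2])))
    return winner[0]


# Hypothetical group: RTR1 (chief ISP) configured with a higher priority.
print(hsrp_active([("RTR1", 110, "192.0.2.2"), ("RTR2", 100, "192.0.2.3")]))  # RTR1

# Equal priorities (100 is the default): the higher IP address wins.
print(hsrp_active([("RTR1", 100, "192.0.2.2"), ("RTR2", 100, "192.0.2.3")]))  # RTR2
```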

Final note:

Depending on the applications and servers involved, disaster recovery may require waiting before starting work on the DR site. This is not for routing purposes but for application consistency: you may be asked to start the BGP advertisements on the DR site manually, after all servers are ready to go, and this can take time.

Hope to help


CriscoSystems Fri, 10/09/2009 - 15:00

How is external traffic routed within your Class C networks? Do RTR1 and RTR2 have any other iBGP neighbors? Are you advertising a static route from either or both of them?

HSRP shouldn't be affecting any routing tables, as long as only the virtual address is being advertised (which is kinda the whole point of HSRP).

Is HSRP active only on your internal network, or are the ISPs' edge routers also peering to a virtual address?

