HA Datacenter internal routing: BGP Timers?


I'm doing some testing in my lab for some high-availability work we are doing at my job. In accordance with best practices, I have a 3-tier design consisting of:

Core (c3550-12g)

Aggregation (2x 3550-48 (Agg01 & Agg02))

Access (2x 2950)


Everything is connected using Rapid-PVST+, and everything is linked in the traditional triangle style.


Core and Aggregation are running OSPF on loopbacks, while end-user routes are being distributed via iBGP (for scalability). Additionally, the two aggregation switches are running HSRP for a single VLAN.


All of my failovers work as expected, including STP, HSRP and OSPF route changes.


My problem is the route convergence times using iBGP. When Agg01 goes completely offline, the Core pulls the OSPF routes quickly, but the iBGP routes stick around for ~3 minutes, blackholing the traffic in the meantime as Agg01 is not available. Eventually the hold timers expire, and the Core gets the new routes from Agg02. Unfortunately, 3 minutes is a really long time.


Does anybody have any experience with iBGP hold timers in such an environment? I want to set them very low, such as 5-10 seconds, but I am concerned about very high CPU utilization.
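To put rough numbers on the timer question, here is a small Python sketch. It assumes the common convention that the keepalive interval is one third of the hold time (which matches the usual 60-second keepalive and 180-second hold defaults) and that the peer fails silently, so only hold-timer expiry tears the session down. The function name and the `peers=2` figure (Core peering with Agg01 and Agg02) are illustrative, and the sketch says nothing about actual CPU load on a 3550.

```python
def hold_timer_profile(hold_s, peers=1):
    """Rough numbers for one BGP hold-time setting.

    Assumptions: keepalive = hold / 3, and the peer dies silently,
    so the session only drops when the hold timer expires.
    """
    keepalive_s = hold_s / 3
    return {
        "keepalive_s": keepalive_s,
        # Peer died just before its next keepalive was due.
        "best_case_detect_s": hold_s - keepalive_s,
        # Peer died right after sending a keepalive.
        "worst_case_detect_s": hold_s,
        # Keepalives this router sends per minute across all peers.
        "keepalives_per_min": peers * 60 / keepalive_s,
    }


if __name__ == "__main__":
    for hold in (180, 10, 5):
        profile = hold_timer_profile(hold, peers=2)
        print(hold, profile)
```

With a 180-second hold time the failure is noticed somewhere between 2 and 3 minutes after the peer disappears, which lines up with the ~3 minutes observed above.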


Is modifying BGP hold timers the right answer? Am I missing something entirely?


Thanks for your thoughts,

Randal

sachinraja Tue, 05/22/2007 - 07:19

Hello Randal,


Can you please give us more info on your network? Why is iBGP used in the first place? Where does your eBGP terminate?


I would definitely not tweak the BGP timers; just leave them at the defaults. As you said, this could adversely affect the memory or CPU of the router, since there will be a scan or refresh every ten seconds...


Raj

Please find the attached diagram.


I am using iBGP because it scales a lot better than OSPF when it comes to large numbers of customer routes.


Our eBGP connections are not involved in this lab setup; no external routes will be propagated, only defaults for infrastructure.


The issue is that when Agg01 can no longer reach Core, it takes Core ages to pull the routes because of the BGP hold timers. With convergence that slow, two cable cuts or similar failures can blow a 99.999% uptime target.
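The arithmetic behind the 99.999% concern can be sketched in a few lines of Python. This assumes a 365-day year and that each failure blackholes traffic for the full convergence time; the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_budget_s(availability_pct):
    """Seconds of downtime per year allowed by an availability target."""
    return SECONDS_PER_YEAR * (100 - availability_pct) / 100


def outages_within_budget(availability_pct, outage_s):
    """How many outages of outage_s seconds fit in the yearly budget."""
    return int(downtime_budget_s(availability_pct) // outage_s)


if __name__ == "__main__":
    budget = downtime_budget_s(99.999)
    print(f"five-nines budget: {budget:.2f} s/year")
    for outage in (180, 10):
        fits = outages_within_budget(99.999, outage)
        print(f"{outage} s outages that fit in the budget: {fits}")
```

Five nines allows a little over five minutes of downtime per year, so one 3-minute convergence event fits in the budget but a second one does not.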


One solution is to use OSPF to carry all of the customer routes, but I am concerned about performance when it comes to 3000+ routes.


Thus my question about tuning BGP timers for iBGP connections only.


Thanks,

Randal


