BGP convergence time

Unanswered Question
Feb 8th, 2009

Hello,

Recently I have been experiencing some problems with BGP convergence taking longer than expected.

I have 3 upstream peers, and for a while I had the best one local-pref'ed down, so that things were more balanced.

I took away those local-preference configurations, so my iBGP routers (6) now learn roughly 250,000 routes from the one peer and a very small number of routes from the other 2; the one peer is just that well connected.
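For context, a policy like the one described (depreferring one upstream with local preference) usually looks something like the sketch below. The neighbor address, AS numbers, and route-map name are hypothetical, not taken from the original post:

```
! Hypothetical example of a "local pref'ed down" inbound policy.
! 192.0.2.1 and the AS numbers are illustrative only.
route-map DEPREF-PEER1 permit 10
 set local-preference 80
!
router bgp 65000
 neighbor 192.0.2.1 remote-as 65001
 neighbor 192.0.2.1 route-map DEPREF-PEER1 in
```

Removing the route-map restores the default local preference of 100, which is why the well-connected peer then wins best path for almost everything.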

If I soft shut down the peer that holds the best path for all of those routes, convergence can take 3-4 minutes and leaves some destinations unreachable during that time. Prior to the change, when my routes were balanced because of filtering (learning roughly 100,000 routes from each peer), things converged much quicker.

My question is: can Cisco routers just not quickly withdraw 250,000 routes from 5-6 downstream iBGP peers?

I am running 6500s and 7600s with Sup720s, so I would not think the cause is hardware that cannot handle it; I would just expect these routers to handle full tables a bit better when it comes to sending the withdraw messages to their downstream iBGP peers.

Is there a way to make that go smoother, maybe with route reflectors? I have not used them before, so I am not too familiar with how they work at a low level; just looking for ideas.

Thanks.

Giuseppe Larosa Mon, 02/09/2009 - 12:54

Hello Jason,

>> If I soft shut down the peer that holds the best path for all of those routes, convergence can take 3-4 minutes and leaves some destinations unreachable during that time. Prior to the change, when my routes were balanced because of filtering (learning roughly 100,000 routes from each peer), things converged much quicker.

It takes time to install the new routes: when the preferred peer for 250,000 routes fails, all 250,000 routes have to be withdrawn, and for each prefix a new BGP best path has to be chosen and propagated.

Each BGP update takes space in a BGP UPDATE packet, and a BGP message has a maximum size (4096 bytes), so withdrawing and re-advertising a full table requires many packets.

The protocol has its own dynamics and cannot converge in zero seconds: the more BGP best paths change, the more work the protocol has to do.

So what you see is the result of preferring all the routes of a single peer: this can be a reasonable choice, but it is a worst case from the redundancy point of view compared to having one third of prefixes best via peer1, one third via peer2, and one third via peer3.

BGP route reflector servers are a good tool for scalability, but they are of limited help in reducing convergence time.
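To illustrate the scalability point: with route reflectors, each client peers only with the reflectors instead of with every other router in the full iBGP mesh. A minimal sketch, with hypothetical addresses and AS number:

```
! On the route reflector (AS and addresses are illustrative).
! Each client needs only an iBGP session to the reflector(s),
! not to every other router in the AS.
router bgp 65000
 neighbor 10.0.0.2 remote-as 65000
 neighbor 10.0.0.2 route-reflector-client
 neighbor 10.0.0.3 remote-as 65000
 neighbor 10.0.0.3 route-reflector-client
```

This reduces the number of iBGP sessions and the per-router update fan-out, but the same volume of withdrawals and new best paths still has to propagate, which is why reflectors do not do much for convergence time itself.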

Hope to help

Giuseppe

Jasonch518_2 Mon, 02/09/2009 - 13:17

Giuseppe,

Thanks for the reply. I was guessing that this was the case and normal operation; it just seemed odd.

Would you agree that when a peer is soft shut down, no matter how long convergence takes, traffic should gradually switch over to the new best path without any routing loops, in normal operation?

The reason I ask is that when I switch over all 250k+ routes, I see routing loops within my iBGP mesh for some time, and I cannot figure out whether it is just because the updates cannot be sent to the whole mesh fast enough (there are so many of them), or whether it is another problem altogether. I suspect it is the volume of updates: when I turn down a peer that injects only roughly 2,000 routes and watch a visual trace to a destination that was going out that peer, traffic flips over to the new best path almost instantly, because there is less work to do and all routers in the iBGP mesh can be updated quickly.

Thanks for any ideas.

Giuseppe Larosa Mon, 02/09/2009 - 13:48

Hello Jason,

Knowing the exact topology would help. However, I think it is possible for a temporary routing loop to form during the time it takes to propagate the withdrawal of the old BGP best path and the advertisement of the new one.

The traceroute method you used makes the temporary loop evident; it is something I have never done myself.

You may focus on a specific prefix and use debug ip routing with an ACL that matches only that prefix (first verify that CPU resources are available; otherwise skip this suggestion!).
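A sketch of what that might look like; the prefix and ACL number are hypothetical, and exact debug behavior varies by IOS release, so test on a lab box first:

```
! Hypothetical prefix 203.0.113.0/24 used for illustration.
! Enabling any debug on a loaded box can spike CPU: check headroom first.
access-list 50 permit 203.0.113.0 0.0.0.255
!
debug ip routing 50
! ... observe RIB changes for the matched prefix during the soft shutdown ...
undebug all
```

Watching a single prefix this way shows when the old best path is withdrawn and the new one is installed on each router, which helps pin down where the temporary loop lives.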

In modern IOS, BGP automatically builds update groups for all neighbors that share the same outbound policy.

You can check them with:

show ip bgp update-group

If this command is accepted and shows output, your IOS already uses dynamic update groups, so manually configured peer groups (which you may already be using) are not a way to get better performance.
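For example, from enable mode (exact command availability depends on the IOS release):

```
! List the dynamic update groups and which neighbors share each one.
Router# show ip bgp update-group
! Show update replication statistics across the groups, if supported.
Router# show ip bgp replication
```

If all six iBGP neighbors land in the same update group, the router formats each withdraw once and replicates it, which is already the most efficient arrangement IOS offers for this topology.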

Hope to help

Giuseppe
