BGP timers failover time question

blackladyJR
Level 1

Hello, when we receive a BGP update from a neighbor saying that a route has been removed, for example, how long does the router take to act on that update and remove the route from its own routing table?

Does it have anything to do with the BGP hold time at all? I assume not. I assume the default 60/180 timers govern the BGP peering session itself (keeping it established or tearing it down) and have nothing to do with the actual route updates. Is this correct?

If so, then say I have this situation:

A (L3 switch) -- B (CE) -- C (PE) -- D (PE) -- E (CE).

A and B are eBGP.

B and C are eBGP.

C and D are MP-iBGP.

D and E are eBGP.

If I power down A, then B will see its neighbor go down and will lose all the routes learned from A.

Questions:

1. Does B wait for 180 seconds before deciding that neighbor A is dead?

2. Does B wait for 180 seconds before it removes A's routes from its own routing table?

3. Once B removes A's routes from its own routing table, how long does B take to advertise this change to C? Is it immediate?

4. Once C receives this update from B, how long does C wait before removing A's routes from its own routing table?

5. Once C removes A's routes from its own routing table, does C immediately advertise this update to D?

6. The same question applies to D and E. The bottom line is: when A powers down, how long does it take for E to withdraw A's routes from its own routing table, with the change passing through B, C and D? Do the 60/180 BGP timers play any role in the route updates?

thanks.

Accepted Solution

Joyce / John

If B realises that A has gone down, either through the timers or some other way, then yes, B can remove the routes from its routing table and stop advertising them to downstream neighbors.

The original question was about the BGP timers, i.e. do they affect how long a route is kept in the routing table? The answer is still no, not directly: the BGP timers are purely used to detect a dead neighbor. They are not the equivalent of the holddown timers found in some IGPs.

So the timers only affect how long B keeps A's routes, and only indirectly: once the 180-second hold time has passed without hearing from A (i.e. 3 missed keepalives at the default 60-second interval), the routes are removed because B now realises A is down. On C, D & E the timers are irrelevant in this scenario.
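As a rough illustration of where these timers are set on Cisco IOS, here is a minimal sketch. The AS numbers and the neighbor address are hypothetical placeholders, and the per-neighbor values are only an example; 60 and 180 are the default keepalive and hold time in seconds.

```
router bgp 1
 ! Process-wide keepalive and hold time in seconds (60 180 is the IOS default)
 timers bgp 60 180
 neighbor 192.168.1.1 remote-as 2
 ! Optional per-neighbor override; 10 30 is an illustrative value only
 neighbor 192.168.1.1 timers 10 30
```

The two peers negotiate the hold time down to the lower of their two values when the session is established, so a changed timer generally only takes effect after the session has been reset.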

Obviously, if there is another way for B to realise A is down, e.g. fast external fallover, then the timers are irrelevant not only on C, D & E but also on B.

Jon


Jon Marshall
Hall of Fame

Joyce

1) Yes

2) No, once B realises A is down the routes received from A are removed.

3) Immediate

4) Immediate

5) etc...

i.e. the timers are for detecting when a neighbor has gone down and not for how long to hold the routes in the table. Once the route has been removed from B it will no longer be advertised to C, so C will stop advertising it to D, and D will stop advertising it to E.

Jon

Jon,

I had this same discussion with Giuseppe and Sam yesterday. It seems as though the default action of "bgp fast-external-fallover" causes the issue, and this can be negated. I tested this in GNS, and while it helped, it didn't wait 3 minutes before going to my backup router.

I have a couple of questions with what you wrote:

1. Does B wait for 180 seconds before deciding that neighbor A is dead?

Yes

I always thought that, but from what I had seen it doesn't wait to update the routing table

2. Does B wait for 180 seconds before it removes A's routes from its own routing table?

No, once B realises A is down the routes received from A are removed.

But you said above that it does wait 180 seconds

So what I get from this is that the keepalives are still going, but the routing table is updated immediately by removing the route, and that change is then propagated to the other neighbors? That's the part that confuses me.

Here's the post if you haven't already read it from yesterday:

http://forum.cisco.com/eforum/servlet/NetProf?page=netprof&forum=Network%20Infrastructure&topic=WAN%2C%20Routing%20and%20Switching&topicID=.ee71a06&fromOutline=&CommCmd=MB%3Fcmd%3Ddisplay_location%26location%3D.2cd40176

Thanks,

John

HTH, John *** Please rate all useful posts ***

John

I'll have a read of the post when I get the chance.

"1. Does B wait for 180 seconds before deciding that neighbor A is dead?

Yes

I always thought that, but from what I had seen it doesn't wait to update the routing table"

Not sure I fully understand. Until B realises A is down it won't remove the routes from the routing table. If A could notify B that it was going down, B would be able to, but when A is simply powered off the only way B realises A has gone down is when the hold timer expires, i.e. after 3 missed keepalives.

So it has to wait to update its routing table. Note that none of the other routers do, though. That's the key point. A is powered off, but all the other routers simply update their routing tables as soon as B has told them, or to be more specific:

B waits 180 seconds, and because it hasn't heard a keepalive from A in that time it then removes all routes received from A. As soon as it removes them it stops advertising them to C, so C immediately removes them and stops advertising them to D, etc.

"2. Does B wait for 180 seconds before it removes A's routes from its own routing table?

No, once B realises A is down the routes received from A are removed.

But you said above that it does wait 180 seconds"

No I didn't, or if I did I shouldn't have :-)

B waits 180 seconds to realise A is down, but as soon as it knows A is down it removes the routes immediately as described above.

The timers allow a router to know when an EBGP peer has gone down or stopped functioning. They have nothing to do with how long routes are held in the routing table, or at least not directly. To see this in GNS3:

1) create 3 EBGP peers A -> B -> C

2) advertise a network from A

3) power down A and run "sh ip bgp summary" on B and "sh ip route" on B & C

What you will see is B still thinks A is up. After approx 180 seconds B will declare A down

"sh ip bgp summary" on B will show no neighborship

"sh ip route" on B will not show the route from A

If you run "sh ip route" on C as soon as the route disappears from B you will see it has disappeared from C as well.
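The checks in the steps above could be run roughly as follows. This is only a sketch: 192.168.1.1 is a hypothetical placeholder for A's peering address, and the commands are the unabbreviated forms of the ones quoted.

```
! On B, repeated while A is powered off
show ip bgp summary
show ip bgp neighbors 192.168.1.1
show ip route bgp
!
! On C, once the route has gone from B
show ip route bgp
```

In the neighbor output, the "Last read" value and the hold time give an idea of how long B has gone without hearing from A.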

Jon

I wrote a reply and not sure why it didn't show up.

Anyway, after more discussion, the conclusion is that if B also sees the interface facing A go down, then fast-external-fallover, which is on by default, will be used and the timer will be skipped. So B will lose the routes immediately and will also advertise the change to C immediately. When this happens, E will probably lose the routes in under 20 seconds total.

But if A powers down while B's LAN interface facing A stays up/up, then B has to rely on the timers to realise A is dead, so it will take 180 seconds. E will then take approximately 180 + 20 seconds in total to lose the routes.

I will try this in GNS3 later and post the result. For the second scenario, I will just administratively shut down the neighbor on A so that the interface stays up/up. For the first scenario, I will shut down the interface on A and check whether B's LAN interface goes down or not.

thanks,

Joyce

Joyce,

Yes, this is what I experienced the other day, and I was PULLING MY HAIR OUT! :) Fast external fallover ignores the timers, so if a point-to-point link goes down, your side will immediately see it as down/down and will update the routing table then. I believe the only way to get around this, and it will affect you globally, is "no bgp fast-external-fallover". This will take the timers into consideration.
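For reference, a minimal sketch of the setting being discussed, assuming Cisco IOS and a hypothetical AS number. The feature is on by default, so only the "no" form normally shows up in a running config.

```
router bgp 1
 ! bgp fast-external-fallover is on by default: an eBGP session to a
 ! directly connected peer is reset as soon as the interface goes down.
 ! The line below turns that off for the whole BGP process, so a link
 ! failure is only acted on once the hold timer expires.
 no bgp fast-external-fallover
```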

HTH,

John


Jon,

In GNS3

1) create 3 EBGP peers A -> B -> C

2) advertise a network from A

3) power down A and run "sh ip bgp summary" on B and "sh ip route" on B & C

What you will see is B still thinks A is up. After approx 180 seconds B will declare A down

This is not what I had seen though. What I saw, in GNS and in the real world (which is what prompted me to play with it in GNS), was the following:

A:
int fa0/0
 ip address 192.168.1.1
int fa0/1
 ip address 10.10.10.1
router bgp 2
 network 10.10.10.0 mask 255.255.255.0
 neighbor 192.168.1.2 remote-as 1

B:
int fa0/0
 ip address 192.168.1.2
int fa1/0
 ip address 192.168.3.1
int fa0/1
 ip address 192.168.2.1
router bgp 1
 network 192.168.2.0
 network 10.10.10.0 mask 255.255.255.0
 neighbor 192.168.1.1 remote-as 2
 neighbor 192.168.3.2 remote-as 2

C:
int fa0/0
 ip address 192.168.3.2
int fa0/1
 ip address 10.10.10.1
router bgp 2
 network 10.10.10.0 mask 255.255.255.0
 neighbor 192.168.3.1 remote-as 1

I had another router behind router B set up as a host pinging 10.10.10.1. Router A was the best/valid route on Router B when I shut it down. I lost one packet, and the BGP table then showed Router C as best/valid. It didn't wait 180 seconds before failing over.

The above is a very quick scheme and typos could exist =) (typed it outta my head.)

Thanks,

John


John

That's a different test.

Note i said "create 3 EBGP peers A -> B -> C"

but your test doesn't do that, i.e. A & C are in the same AS, so you have A & C using IBGP and A & B, B & C using EBGP.

B already knows about 2 paths to 10.10.10.1, i.e. via A and via C. I understand what you are saying about the timers though, so I'll need to test this scenario to see what's going on.

The way i tested in GNS3 by the way was

A -> B -> C where each router was in a different AS.

Edit - actually i should have said "Create 3 different EBGP peerings" :-)

Jon

John

Update on this after testing.

R0 & R1 are in AS 10.

R2 is in AS 11

R3 is acting as a client ie. "no ip routing"

R0
int fa0/0
 ip address 192.168.2.1 255.255.255.252
int fa0/1
 ip address 10.10.10.1 255.255.255.0
router bgp 10
 network 10.10.10.0 mask 255.255.255.0
 neighbor 192.168.2.2 remote-as 11
 timers bgp 10 30

R1
int fa0/0
 ip address 192.168.3.2 255.255.255.252
int fa0/1
 ip address 10.10.10.1 255.255.255.0
router bgp 10
 network 10.10.10.0 mask 255.255.255.0
 neighbor 192.168.3.1 remote-as 11
 timers bgp 10 30

R2
int fa0/0
 ip address 192.168.2.2 255.255.255.252
int fa0/1
 ip address 192.168.3.1 255.255.255.252
int fa1/0
 ip address 172.16.5.1 255.255.255.0
router bgp 11
 network 172.16.5.0 mask 255.255.255.0
 neighbor 192.168.2.1 remote-as 10
 neighbor 192.168.3.2 remote-as 10
 timers bgp 10 30

R3
no ip routing
int fa0/0
 ip address 172.16.5.2 255.255.255.0
ip default-gateway 172.16.5.1

So R0 was chosen by R2 as the way to get to 10.10.10.1. Note that a "sh ip bgp 10.10.10.0/24" on R2 shows both paths ie. via R0 and via R1.

I then started a continuous ping from R3.

I shut down R0 and the ping timed out. The ping continued to time out while R2 still showed R0 as up in "sh ip bgp summary", and I lost a lot of packets. A "sh ip route" on R2 still showed R0 as the chosen path.

As soon as R2 reported R0 down it used the other route via R1 and the ping continued.

Note as above i used timers of 10 & 30 as i didn't want to wait 3 mins every time i shutdown one of the routers :-).

So our tests are behaving differently. What IOS are you using?

Jon

Jon,

I retested last night as well, and you are 100% correct, but I did something a little different.

I had 3 different ASes: 10, 11 and 12. AS 10 was peering with both 11 and 12. I shut down 11 and ran debugs (debug ip bgp keepalives) on 10. I noticed that it kept trying to send keepalives until the session timed out.

The interesting thing was that you said that with ASes 10, 12 and 12, the two AS 12 routers were using iBGP, but they weren't peering with each other. It was like:

12 -> 10 -> 12

and the 12 peers were not neighbors of each other.

I set that topology back up as well and started my debugs. You are 100% correct: once it sees that the router has gone down completely, it automatically flips to the other AS 12 router for the same LAN block, but until then it kept sending keepalives to the router that went down. It sent three before the session timed out.
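A sketch of the kind of debug session described above, assuming Cisco IOS. Debug output can be heavy on a busy router, so this is probably best kept to a lab.

```
! Show BGP keepalive messages as they are sent and received
debug ip bgp keepalives
! ... shut down the peer and watch until the hold timer expires ...
! Turn all debugging off again
undebug all
```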

Thanks,

John


John

"The interesting thing was that you said that with ASes 10, 12 and 12, the two AS 12 routers were using iBGP, but they weren't peering with each other. It was like:

12 -> 10 -> 12"

Yes, sorry about that. I realised after posting that you couldn't be running IBGP between the 2 routers as you were using the same IP address ie. 10.10.10.1 on both routers.

Jon

John

On re-reading I can see the confusion.

1. Does B wait for 180 seconds before deciding that neighbor A is dead?

Yes it does.

2. Does B wait for 180 seconds before it removes A's routes from its own routing table?

I took 2. to mean does B wait a further 180 seconds after it has realised A is down. In which case the answer is no.

Jon
