I thought that the default BGP timers were 60 seconds for keepalive and 180 seconds for holddown. I may be wrong, but shouldn't a route that falls out of the table be put back into the table after at most 180 seconds (3 min.)?
We tested our failover this weekend, and we shut down our main router to watch our block roll over to our backup router. We lost two packets. I peer with the provider using the same AS on my end (both of my routers are using bgp 1 for instance, and I peer with bgp 2). I'm wondering if this is the reason the failover happened so quickly?
>> but shouldn't a route that falls out of the table be at most put back into the table after 180 seconds (3 min.)?
This is BGP not RIP, there is no holddown timer here.
Hope to help
Can you explain what the timers option is for on the neighbor statement? Also, when I do a "sh ip bgp neighbor x.x.x.x" I get holddown timers:
BGP neighbor is 220.127.116.11, remote AS 12, external link
BGP version 4, remote router ID 18.104.22.168
BGP state = Established, up for 00:01:08
Last read 00:00:08, hold time is 180, keepalive interval is 60 seconds
Route refresh: advertised and received(old & new)
Address family IPv4 Unicast: advertised and received
What is it used for, and is there another way that we can keep it from failing over so quickly?
Since you shut down the router, the BGP peering broke immediately, which triggered the convergence. The 60-second keepalive and 180-second hold time were not involved in this process, because your interface went down. To test those timers you need to keep the interface up but stop the keepalives from reaching the peer router. You can use IP event dampening to prevent flapping interfaces from causing multiple convergences, but I am not aware of any user-defined parameter that will delay the convergence itself.
>> since you shut down the router the BGP peer immediately broke causing the convergence.
How does the neighboring router know the interface went down without using the keepalives?
Interface goes down => routes via this interface are withdrawn => BGP session is torn down.
On the other hand, an ACL blocking TCP 179 would take a lot longer to be detected: about 3 minutes, as you expected.
I guess my main question is why my route failed over so quickly. If the interface goes down, how can I control the convergence time or is this impossible?
I'm really not grasping the point of having hold timers if they only come into play when there's an access-list blocking the port. I would think that if the peer missed a hello packet, whether it was blocked or the peer was down, the neighboring router should still send two more hellos before it flips to the other route, meaning 3 minutes by default.
Sam has explained the probable reason for what you see.
Have you configured bgp fast-external-fallover or its successor, neighbor x.x.x.x fall-over?
Usually people complain of the slowness of failover when it relies on default timers.
You can think of the timers as a way to detect indirect failures, such as the provider's staff shutting down the session.
Reaction to a link failure takes as long as the interface needs to detect the failure, which depends on the technology in use.
For example, if the link is a direct serial link and the provider router is also the DCE at OSI layer 1, then after the interface is shut down the other side goes down/down.
Another example is POS, where detection can take as little as 50 msec.
Hope to help
AH! Now, if I'm peering with my ISP and I try to negate it on my end, can it be done on one of the spoke routers or does it have to be done on the multihomed router?
Thank you for the compliment on "good post." =)
Removing it on your end should be enough to see the session take longer to tear down (I have never seen this being a requirement, but I can see why one would want that :-)
The other thing about timers is that they are negotiated and the lowest value wins. So if your peering router is using the defaults, you only benefit from a longer hold time if you agree with your peer that they will match or exceed yours.
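The "lowest wins" rule can be sketched in a few lines of Python. This is only an illustration, not how any router implements it: the function name `negotiate_timers` is made up, the hold-time rule (smaller of the two proposed values, which must be 0 or at least 3 seconds) follows RFC 4271, and capping the keepalive at one third of the negotiated hold time is an assumption based on the ratio the RFC suggests.

```python
def negotiate_timers(local_hold, peer_hold, local_keepalive=60):
    """Return (hold, keepalive) in seconds for one BGP session.

    Hypothetical helper for illustration only.
    """
    # Each side proposes a hold time in its OPEN message; the smaller one is used.
    hold = min(local_hold, peer_hold)
    if hold == 0:
        # A hold time of 0 means keepalives are not sent at all.
        return 0, 0
    if hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    # Assumption: the keepalive never exceeds one third of the negotiated hold time.
    keepalive = min(local_keepalive, hold // 3)
    return hold, keepalive


# A long local hold time does not help if the peer stays on defaults.
print(negotiate_timers(local_hold=14400, peer_hold=180))
# A peer proposing a short hold time pulls both timers down.
print(negotiate_timers(local_hold=180, peer_hold=30))
```

The first call shows why configuring a long hold time on only one side changes nothing: the peer's default of 180 seconds still wins.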
I am not entirely sure, but I recall seeing a new feature which stops this. It is used as a security feature to keep attackers from melting down your CPU by reducing timers and therefore increasing BGP scans.
I found the command on Cisco's site, so now I have to ask:
How does the fast-external-fallover know that the peer went down if it's not using hello packets? Does it just see the route fall from the table, perform some kind of soft reconfig, and then fallover to the other peer?
Many thanks for the rating !
It's link status related,
The bgp fast-external-fallover command is used to disable or enable fast external fallover for BGP peering sessions with directly connected external peers. The session is immediately reset if link goes down. Only directly connected peering sessions are supported.
If BGP fast external fallover is disabled, the BGP routing process will wait until the default hold timer expires (3 keepalives) to reset the peering session. BGP fast external fallover can also be configured on a per-interface basis using the ip bgp fast-external-fallover interface configuration command.
So, if I wanted the peer to wait four hours before rolling my block over, I would need to disable fast-external-fallover, then set my timers to 4800 14400 and have the provider do the same? Or should I leave my keepalive at the default of 60 and set my hold time to 14400?
With 4800 14400, you need to miss 3 hellos.
With 60 14400, you need to miss 240 hellos.
I would keep it simple and stick to the x3 ratio, i.e. the first line.
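The arithmetic behind those two options can be checked with a short Python sketch. The helper name `missed_keepalives` is made up for this example; the upper bound of 65535 seconds comes from the hold time being a 2-octet field in the BGP OPEN message.

```python
MAX_HOLD = 65535  # the hold time is carried in a 2-octet field


def missed_keepalives(keepalive, hold):
    """Number of keepalive intervals that fit into one hold time."""
    if not 0 < keepalive <= hold <= MAX_HOLD:
        raise ValueError("expected 0 < keepalive <= hold <= 65535")
    return hold // keepalive


four_hours = 4 * 60 * 60  # 14400 seconds
for keepalive, hold in [(60, 180), (4800, four_hours), (60, four_hours)]:
    print(f"timers {keepalive} {hold}: "
          f"{missed_keepalives(keepalive, hold)} keepalives must be missed")
```

Both four-hour options reach the same hold time; the difference is only how often keepalives are sent while the session is healthy.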
So, is this a layer 2 functionality, or does it track it at layer 3? It doesn't seem like it's using hello packets to detect the down peer.
Thanks Sam and Giuseppe for clearing this up for me! I'm definitely going to play with this in gns and see what I come up with. =)
Hi, apologies for posting back to this slightly old thread. After reading this thread and another one, my understanding is that the BGP timers will not take effect if fast-external-fallover is configured. Would that mean that, if fast external fallover is configured and the keepalives are missed for whatever reason, the BGP session will still remain up because the interface hasn't gone down? I have never seen that, so I just want to confirm whether that was what was being implied. Pls confirm this. Thx for your help.
>> but if the keepalives are missed the BGP session will still remain up because the interface hasn't gone down
No, it means that the faster process has the right to change the BGP session state:
If you use very aggressive timers, such as 1 second for keepalive and 3 seconds for hold, the timers are likely to expire before the link is detected as down (unless it is a SONET/SDH POS link).
Timer expiration is a sufficient reason to bring down a BGP session, and the device will send a BGP notification to the peer.
If the link is healthy, the peers can set up the BGP session again quickly; otherwise the BGP session stays down for as long as the link is failed.
With default BGP timers the order of events is the opposite:
it is highly likely that link failure detection happens before timers expiration.
In this case bgp fast-external-fallover says: the link is down, so let's tear down the eBGP session with the peer reached over this link.
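The "faster process wins" idea can be modelled as a simple minimum. The Python sketch below is a toy model, not router behaviour: the function `teardown_delay` and its parameters are invented for illustration, and the hold timer is treated as a worst case (a full hold time since the last keepalive was received).

```python
def teardown_delay(link_detect_s, hold_s, fast_fallover=True, link_goes_down=True):
    """Worst-case seconds until the local router resets the eBGP session.

    link_detect_s: time the interface needs to be declared down
    hold_s:        negotiated BGP hold time
    """
    # Hold timer expiry always applies as a fallback.
    candidates = [hold_s]
    # Link-state detection only counts if the link really goes down
    # and fast external fallover is enabled.
    if link_goes_down and fast_fallover:
        candidates.append(link_detect_s)
    return min(candidates)


# Default timers, POS-like detection: the link event wins.
print(teardown_delay(link_detect_s=0.05, hold_s=180))
# Fast fallover disabled: the session waits for the hold timer.
print(teardown_delay(link_detect_s=0.05, hold_s=180, fast_fallover=False))
# ACL blocking TCP 179: the link stays up, so only the hold timer can fire.
print(teardown_delay(link_detect_s=0.05, hold_s=180, link_goes_down=False))
# Aggressive 1/3 timers with slow link detection: the hold timer wins.
print(teardown_delay(link_detect_s=5, hold_s=3))
```

The 5-second link detection time in the last call is an arbitrary value chosen only to show the timers expiring first.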
Hope to help
Thx for confirming that. Could you please take a stab at another post I had in the LAN Switching section? It has to do with using NSF through a firewall for BGP. Thx