Hi, Attached is a diagram of my setup. I have run a few debugs and determined that the eBGP session between R1 and R3 was being reset during a Supervisor failover on R1.
The NSF restart timer expired, causing the reset. In the same debug output, the BGP neighbor appeared to re-form approximately 35 seconds after the reset, so I changed my NSF timers to 40 seconds for the restart timer and 120 seconds for the stalepath timer. Here are the debug outputs. Please note that the IP addresses have been changed; this output is from R2.
As I understand the debug output, NSF takes down the eBGP session even though the NSF restart timer has not expired; see the second line of the output below.
Should the BGP peer still go down? My understanding was that it should not go down but should wait until the restart timer expires. I see the BGP neighbor session go into the Idle/Active state when this happens.
Next, why does the route 10.1.2.1/32 get withdrawn? This is a physical loopback interface on R1, so is it withdrawn because the supervisor on R1 failed over? Shouldn't NSF hold this route until the restart timer expires? As can be seen in the output, the restart timer never expires: the BGP neighbor re-establishes within 32 seconds. I calculated that the supervisor takes approximately 30 seconds to fail over. That does seem long, but I based the figure on the log output while the sup was failing, so it could be incorrect.
Also, per the debug, 10.10.100.2, which is the loopback of R4 and the iBGP neighbor to R3, seems to be installing this route. I am not sure why: R4 gets the route via R2, R2 gets it via R1, and R2 should have lost the route when R1's sup failed and caused the iBGP session between R1 and R2 to fail.
Can someone please help determine why NSF is not holding up the peering session and, in turn, the route learned via that session? Thanks.
SSO/NSF relies on a separation between the control plane and the forwarding plane.
Your BGP session will go down because the TCP session is reset. This is how the NSF peers notice that a switchover has happened, and it is why they do not flush their BGP routes.
A new BGP session is then established and the peers send all their routes to the newly active SUP. It computes its best paths and sends back its own updates.
Throughout this process, the line cards (LCs) still hold the BGP routes from before the switchover, so no traffic is lost (nothing changes on the forwarding plane). That is why there is no rush for the control plane to converge, and why you should not tweak the timers.
What you see in the BGP NSF debugs is expected.
10.1.2.1/32 should be marked as stale in the BGP tables of R2 and R3 and should not be removed from the RIB during the GR process. R2 shouldn't have an update for this prefix from R4 because of the AS-PATH check (R2 will see its own AS in the AS-PATH).
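For what it's worth, the AS-PATH check mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea: the function name and the AS numbers are made up for the example, since the real AS numbers are not given in this thread.

```python
from typing import Sequence


def passes_as_path_check(local_as: int, as_path: Sequence[int]) -> bool:
    """Return True if an update may be accepted by the receiving router.

    An update whose AS_PATH already contains the receiver's own AS
    is treated as a loop and dropped.
    """
    return local_as not in as_path


# Hypothetical AS numbers from the private range, not taken from the thread.
R2_AS = 65001
print(passes_as_path_check(R2_AS, [65002, 65001]))  # False: path already crossed R2's AS
print(passes_as_path_check(R2_AS, [65002, 65003]))  # True: R2's AS is not in the path
```

So if the prefix had left R2's AS and come back, R2 would be expected to drop that update rather than install it.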
1. Per the debugs, 10.1.2.1/32 does get withdrawn. What could be the reason for this?
2. Should fast failover be disabled in this case, since it flaps the neighbor as soon as the interface goes down when the sup fails over?
3. Various documents on the web recommend that the NSF restart timer be less than the BGP hold timer. What, if any, is the relationship between the BGP keepalive/hold timers and the BGP graceful restart/stalepath timers?
1- I don't know what's happening with this prefix. I also don't understand how R2 can try to install a route pointing to R4.
2- During an SSO switchover, interfaces stay up to allow NSF.
3- The restart timer keeps the BGP peer from waiting too long for the recovery. It starts when the peer notices that the BGP TCP session is down; if the peer has not received a new OPEN message by the time the timer expires, it ends the GR process and flushes all the routes.
So, to be sure we give the GR process a chance, this timer must be smaller than the hold timer.
The stalepath timer starts on the peer when the BGP session is up again. BGP updates are exchanged and routes are refreshed. When this timer expires, all routes that are still in the stale state are flushed.
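To make the two timers concrete, here is a rough Python model of what the peer does with a single stale prefix, following only the description above. It is a sketch and not how any router implements it: the function name and its parameters are invented for the example, and the 10-second refresh delay is an assumed figure. The 40 s and 120 s timers and the 32 s recovery time are the values quoted earlier in this thread.

```python
from typing import Optional


def stale_route_outcome(restart_time: float,
                        stalepath_time: float,
                        open_received_after: Optional[float],
                        readvertised_after: Optional[float]) -> str:
    """Fate of one stale prefix on the peer of a restarting router.

    open_received_after: seconds from session loss until a new OPEN
        arrives (None = never).
    readvertised_after: seconds from session re-establishment until the
        prefix is advertised again (None = never).
    """
    # Restart timer runs from the moment the peer sees the session drop.
    if open_received_after is None or open_received_after > restart_time:
        return "flushed: restart timer expired"
    # Stalepath timer runs from the moment the session is up again.
    if readvertised_after is None or readvertised_after > stalepath_time:
        return "flushed: stalepath timer expired"
    return "kept: refreshed in time"


# Session back after 32 s, prefix refreshed 10 s later (assumed).
print(stale_route_outcome(40, 120, 32, 10))
# Session takes 45 s to come back, longer than the 40 s restart timer.
print(stale_route_outcome(40, 120, 45, 10))
# Session comes back but the prefix is never advertised again.
print(stale_route_outcome(40, 120, 32, None))
```

With the numbers from this thread, the first case is the one that should apply: the session returns inside the restart timer, so the stale route should survive as long as it is re-advertised before the stalepath timer runs out.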
1. I don't think R2 is installing the prefix from R4. The prefix is on R1, and there is an iBGP session between R1 and R2. I am not sure whether it was something I said or something in the debugs that I missed that alluded to this. Please confirm why you think R2 is installing the route from R4.
3. If there is an SSO switchover, does the BGP hold timer still apply? Isn't the neighbor relationship torn down anyway because of the SSO failover, with NSF merely maintaining forwarding for that prefix? So I am not sure why the hold time applies at all. The timers are exchanged during session establishment anyway, so if a new session is established, how does the hold time matter for NSF? I would appreciate it if you could clarify this.