Hi, attached is a diagram of my setup. I ran a few debugs and determined that the eBGP session between R1 and R3 was being reset during a supervisor failover on R1.
The NSF restart-timer expired, causing the reset. In the same debug output, the BGP neighbor appeared to re-form approximately 35 seconds after the reset, so I changed my NSF timers to 40 seconds for the restart-timer and 120 seconds for the stalepath timer. Here are the debug outputs. Please note that the IP addresses have been changed, but this output is from R2.
As I read the debug output, NSF still takes down the eBGP session even though the NSF restart-time has not expired; see the second line of the output below.
Should the BGP peer still go down? My understanding was that it should not go down but should wait until the restart-timer expires. When this happens, I see the BGP neighbor session go into the Idle/Active state.
My next question is why the route 10.1.2.1/32 gets withdrawn. This is a physical loopback interface on R1, so is it withdrawn because the supervisor on R1 failed over? Shouldn't NSF hold this route until the restart-time expires? As can be seen in the output, the restart-time never expires: the BGP neighbor re-establishes within 32 seconds. By my calculation, the supervisor takes approximately 30 seconds to fail over. That does seem long, but I based the figure on the log output while the sup was failing over, so it could be incorrect.
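To make my expectation concrete, here is a minimal Python sketch of how I understand the helper (receiving) side of BGP graceful restart as described in RFC 4724. The function name, parameters, and return strings are hypothetical and only model my reading of the RFC; they are not taken from any router's implementation or debug output.

```python
def expected_stale_route_fate(peer_advertised_gr: bool,
                              reestablish_after_s: float,
                              restart_time_s: float,
                              forwarding_preserved: bool = True) -> str:
    """Model what a graceful-restart helper is expected to do with routes
    learned from a peer whose session just dropped (my reading of RFC 4724)."""
    # Without the graceful-restart capability, routes go away with the session.
    if not peer_advertised_gr:
        return "withdrawn immediately"
    # The session must come back before the restart timer runs out.
    if reestablish_after_s > restart_time_s:
        return "flushed when restart timer expires"
    # The restarted peer must signal that it kept its forwarding state.
    if not forwarding_preserved:
        return "flushed on re-establishment"
    return "retained as stale until refreshed"


# Numbers from my test: session back after 32 s, restart-timer set to 40 s.
print(expected_stale_route_fate(True, 32, 40))
```

With my numbers, this model says the route should be retained as stale, which is why the withdrawal of 10.1.2.1/32 surprises me. If the model itself is wrong, that would also answer my question.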
Also, according to the debug, 10.10.100.2, which is the loopback of R4 and the iBGP neighbor of R3, seems to be installing this route. I am not sure why: R4 gets the route via R2, but R2 gets it via R1, and R2 should have lost the route when R1's sup failed, which would have caused the iBGP session between R1 and R2 to fail.
Can someone please help determine why NSF is not holding up the peering session and, in turn, the route learned via that session? Thanks.