My customer is seeing frequent BGP neighbor flaps.
ADJCHANGE: neighbor x.x.x.x Down BGP Notification received
.Jun 2 14:07:56.705 BST: %BGP-5-ADJCHANGE: neighbor x.x.x.x Up
While troubleshooting the issue, I found that the BGP timers are set to
5 (keepalive) and 10 (hold time) on one side and to
5 (keepalive) and 15 (hold time) on the other. On other routers I also see
the keepalive and hold time set to 5 and 10-15 seconds.
I have managed to reproduce the problem in GNS3, and I see the following:
router bgp 1300
 timers bgp 5 10
 neighbor 10.13.1.1 remote-as 1200
*Mar 1 00:10:02.107: %BGP-3-NOTIFICATION: received from neighbor 10.13.1.1 4/0 (hold time expired) 0 bytes
*Mar 1 00:10:02.115: %BGP-5-ADJCHANGE: neighbor 10.13.1.1 Down BGP Notification
*Mar 1 00:10:26.315: %BGP-5-ADJCHANGE: neighbor 10.13.1.1 Up
R0#sh runn | sec bgp
router bgp 1200
 timers bgp 5 15
 neighbor 10.12.1.2 remote-as 1200
 neighbor 10.13.1.2 remote-as 1300
*Mar 1 00:40:31.483: %BGP-5-ADJCHANGE: neighbor 10.13.1.2 Down BGP Notification
*Mar 1 00:40:31.483: %BGP-3-NOTIFICATION: sent to neighbor 10.13.1.2 4/0 (hold time expired) 0 bytes
*Mar 1 00:40:59.207: %BGP-5-ADJCHANGE: neighbor 10.13.1.2 Up
So can you please advise whether the BGP timers are set incorrectly and whether
that is the cause of the BGP neighbor flaps?
Note: My customer uses a 7206VXR router running IOS version 12.4(15)T8.
BGP timers do not need to be the same. They are negotiated early in session setup and settle on the lower of the two neighbors' values, so this is not the cause.
Check the interfaces for errors or resets, as this is a typical reason for a TCP session to fail or flap.
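As a rough sketch of those checks: the commands below show the timers the session actually negotiated and the error and reset counters on the link. The neighbor address is the one from the GNS3 reproduction; Serial1/0 is only an assumed interface name, so substitute the real WAN interface.

```
! Negotiated hold time and keepalive interval for the session
show ip bgp neighbors 10.13.1.1 | include hold time
! Error and reset counters (Serial1/0 is a placeholder)
show interfaces Serial1/0 | include resets|errors
```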
Thanks for the reply. I see one serial interface with about 2021 resets.
But I see the BGP adjacency changes happen because the hold timer expires. So if the BGP timers are not the cause of the problem, how can we go further?
A 3:1 ratio between hold time and keepalive is recommended, so that at least two consecutive BGP keepalives must be missed before the neighbor sends the BGP notification message.
With a 2:1 ratio, missing a single BGP keepalive can be enough to trigger the BGP notification.
Also check the platform type and link utilization: on a fully utilized link without QoS protection for BGP packets, they may be dropped when the link is saturated.
For example, some platforms (2600, 3600 and ISR routers) have a hidden system queue for routing protocol traffic, but others, such as the C7500 and GSR, do not.
I see you have a 7206VXR; I would deploy a QoS policy to protect BGP packets on the link.
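A minimal sketch of such a policy, using the standard MQC commands, could look like the following. All the names (BGP-TRAFFIC, BGP-CONTROL, WAN-EDGE-OUT), the interface Serial1/0 and the 5 percent figure are illustrative assumptions, not values taken from the customer's network.

```
! Match BGP in both directions (TCP port 179)
ip access-list extended BGP-TRAFFIC
 permit tcp any any eq bgp
 permit tcp any eq bgp any
!
class-map match-all BGP-CONTROL
 match access-group name BGP-TRAFFIC
!
! Reserve a small share of the link for BGP during congestion
policy-map WAN-EDGE-OUT
 class BGP-CONTROL
  bandwidth percent 5
!
! Placeholder interface: apply on the congested WAN link
interface Serial1/0
 service-policy output WAN-EDGE-OUT
```

An output policy only protects packets leaving this router, so the device at the other end of the link would need an equivalent policy for the reverse direction.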
Hope this helps.
Hello Giuseppe:
I do see that one of the links has 2022 resets. Could that also be a cause of the BGP adjacency changes?
Check whether the number of interface resets grows over time.
If so, it is of course a sign of OSI Layer 1 problems; however, I would use a hold time of 15 seconds on both sides, as Istvan also suggested, for all the reasons we have explained.
If the link has trouble, you may need to contact your WAN provider to perform loop tests.
Later you can fix the BGP configuration.
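One way to see whether the resets are still increasing, sketched here with Serial1/0 as an assumed interface name, is to clear the counters and look again after some time:

```
! Zero the counters on the suspect link (placeholder interface name)
clear counters Serial1/0
! Later: compare the reset count with the time the counters were cleared
show interfaces Serial1/0 | include resets|Last clearing
```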
With this configuration the following timers are negotiated:
keepalive: 5 sec
holdtime: 10 sec
This means the BGP neighbor relationship will be reset if 2 keepalives aren't received by the neighbor.
In this scenario, it can happen that one keepalive packet is lost and the next keepalive arrives a bit late, which causes the neighbor relationship to be reset.
In general, it is suggested that the hold time be at least 3 times the keepalive timer, in this case 15 sec.
I would suggest setting the hold time to 15 sec on both BGP routers, so the negotiated values will be 5 sec and 15 sec respectively.
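Applied to the GNS3 reproduction above, only the router in AS 1300 needs to change, since R0 already has "timers bgp 5 15". A sketch of the change follows. The timers are exchanged when the session is established, so they should only take effect after the session is reset, and the clear command below does drop the session briefly.

```
! On the AS 1300 router (currently "timers bgp 5 10")
router bgp 1300
 timers bgp 5 15
end
! Reset the session so the new hold time is negotiated
clear ip bgp 10.13.1.1
```

If only this one session should be changed, the per-neighbor form "neighbor 10.13.1.1 timers 5 15" under "router bgp 1300" is an alternative to the global timers.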
I noticed Giuseppe also responded. His suggestion to protect BGP traffic by configuring QoS is a great one.