Perhaps you could post the relevant parts of the config. From your description I cannot tell how much of that delay is due to slow recognition of the failure and how much is the actual time OSPF takes to converge. If we could see the configuration, we might be able to work some of that out.
In general it is certainly possible to have fast OSPF convergence, but we cannot give advice about your particular situation until we know more about your environment.
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
As Rick notes, we would need to better understand what you're actually doing, but a very common cause of slow convergence for the kind of topology you describe is that the interface stays up while the adjacency to the OSPF neighbor is lost. If you're running common defaults for neighbor dead times, OSPF may be black-holing traffic until it finally determines that the neighbor is down.
Does show ip ospf interface (sh ip os in) show values like: Hello 10, Dead 40, Wait 40, Retransmit 5?
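If the timers do turn out to be at those defaults (Hello 10, Dead 40), one option is to shorten them on the affected link. The following is only a minimal sketch, assuming classic IOS syntax; the interface name GigabitEthernet0/1 is hypothetical and the values are illustrative. The hello and dead intervals must match on both neighbors or the adjacency will not form.

```
interface GigabitEthernet0/1
 ! Hypothetical interface name; substitute the link facing the OSPF neighbor
 ! Send a hello every second instead of every 10 seconds
 ip ospf hello-interval 1
 ! Declare the neighbor down after 4 seconds without a hello (default is 40)
 ip ospf dead-interval 4
```

With these values a silent neighbor would be detected in about 4 seconds rather than 40. Whether that is fast enough depends on what you are trying to achieve.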
Kindly go through this document. It should help you calculate timer values for faster OSPF convergence.
Sample Fast Convergence Profile
Putting the above information together, let’s try to find an optimum convergence profile based on the fact that we have “show ip ospf statistics” output from the “weakest” router in the area.
show ip ospf statistics
OSPF Router with ID (10.4.1.1) (Process ID 1)
Area 10: SPF algorithm executed 18 times
Summary OSPF SPF statistic
SPF calculation time
Delta T   Intra  D-Intra  Summ  D-Summ  Ext  D-Ext  Total  Reason
1w3d          8        0     0       0    0      0      8  R, X
1w3d         12        0     0       0    4      0     16  R, X
1w3d         16        0     0       0    4      0     20  R, X
1w3d          8        0     0       0    0      0      8  R,
1w3d         20        0     0       0    0      0     20  R, X
1w2d         24        0     0       0    8      0     32  R, X
1w2d          8        4     0       0    0      0     12  R,
6d16h         4        0     0       0    0      4      8  R, X
6d16h         4        0     0       0    0      0      4  R,
6d16h        12        0     0       0    8      0     20  R, X
RIB manipulation time during SPF (in msec):
Delta T   RIB Update  RIB Delete
1w3d               4           0
1w3d               8           0
1w3d              10           0
1w3d               5           0
1w3d               8           0
1w2d              10           0
1w2d               3           0
6d16h              2           0
6d16h              1           0
6d16h              9           0
Failure Detection Delay: about 5-10 ms worst case to detect/report loss of network pulses.
Maximum SPF runtime: 32 ms; doubling for safety makes it 64 ms.
Maximum RIB update: 10 ms; doubling for safety makes it 20 ms.
OSPF interface flood pacing timer: 5 ms (does not apply to the initial LSA flooded).
LSA Generation Initial Delay: 10 ms (enough to detect multiple link failures resulting from an SRLG failure).
SPF Initial Delay: 10 ms (enough to hold SPF to allow two consecutive LSAs to be flooded).
Network geographical size: 100 miles (signal propagation is negligible).
Network physical media: 1 Gbps links (serialization delay is negligible).
Estimated network convergence time in response to the initial event: 32*2 + 10*2 + 10 + 10 = 64 + 40 = 104 ms, or roughly 100 ms. This estimate does not precisely account for FIB update time, but we assume it is approximately the same as the RIB update time. We need to make sure our maximum backoff timers exceed this convergence time, so that in the worst case processing is delayed beyond the convergence interval.
LSA Generation Hold Time: 100 ms (approximately the convergence time).
LSA Generation Maximum Time: 1 s (way above the 100 ms).
OSPF Arrival Time: 50 ms (way below the LSA Generation hold time).
SPF Hold Time: 100 ms.
SPF Maximum Hold Time: 1 s (the maximum SPF runtime is 32 ms, meaning we skip about 30 SPF runtimes in the worst condition, so SPF consumes no more than about 3% of CPU time in the worst-case scenario).
Now estimate the worst-case convergence time: LSA_Maximum_Delay (1s) + SPF_Maximum_Delay (1s) + RIB_Update (<1s) < 3 seconds. Even in a heavily congested network, CPU usage for SPF calculations will not exceed about 3%, and the network will converge within 3 seconds of a change. Here is a sample OSPF configuration template:
router ospf 10
 ! Suppress transit link prefixes
 prefix-suppression
 ! Wait at least 50ms between accepting the same LSA
 timers lsa arrival 50
 ! Throttle LSA generation (initial 10ms, hold 100ms, maximum 1000ms)
 timers throttle lsa all 10 100 1000
 ! Throttle SPF runs (initial 10ms, hold 100ms, maximum 1000ms)
 timers throttle spf 10 100 1000
 ! Pace interface-level flooding
 timers pacing flood 5
 ! Make the retransmission pacing timer greater than the LSA arrival timer
 timers pacing retransmission 60
 ! Enable incremental SPF
 ispf
It is well known that link-state IGPs can be tuned for sub-second convergence under almost any practical scenario, yet maintain network stability by virtue of adaptive backoff timers. In this post we tried to provide a practical approach to calculating the optimum throttling timer values based on your recorded network performance. It is worth noting that the three most important timers for tuning a network for sub-second convergence are the failure detection delay, the initial LSA generation delay and the initial SPF delay. All other timers, such as the hold and maximum times, serve to stabilize the network and affect convergence only in "worst-case" unstable network scenarios. Cisco's recommended values for the initial/hold/maximum timers are 10/100/5000 ms (see [ROUTED-CAMPUS]), but those may look a bit conservative, as the two 5-second maximums (LSA generation and SPF) alone add up to a worst-case convergence time of about 10 seconds. Additionally, it is important to notice that in large topologies a significant amount of time is spent updating the RIB/FIB after reconvergence. Therefore, in addition to tuning the throttling timers, you may want to implement other measures such as prefix-suppression, better summarization (e.g. totally stubby areas) and minimization of external routing information. If your platform supports the feature, you may also implement a priority-driven RIB prefix installation process.
We omitted other fast-convergence elements such as resilient network design, e.g. redundancy resulting in equal-cost multipathing and faster OSPF adjacency restoration, or the NSF feature, which is very helpful for avoiding re-convergence during planned downtime. We also skipped some other features related to OSPF stability, such as flooding reduction and LSA group pacing, that could yield performance benefits in networks with large LSDBs. It is not possible to cover all relevant technologies in a single blog post, but you may refer to the further reading documents for more information. And finally, if you are planning to tune your IGP for fast convergence, make sure you understand all the consequences. Modern routing platforms are capable of handling almost any "stormy" network condition without losing overall network stability, but pushing a network to its limits can always be dangerous. Make sure you monitor your OSPF statistics for potentially high or unusual conditions after tuning, or set the maximum timers to more conservative values (e.g. 3-5 seconds) to provide additional safety.
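As a rough illustration of the more conservative option, the throttle lines of the template could be changed as sketched below. This assumes the same IOS syntax and process ID as the sample template above, and uses the 10/100/5000 ms initial/hold/maximum values mentioned earlier as Cisco's recommendation. Only the maximum wait changes, so the response to a first, isolated failure should be unaffected.

```
router ospf 10
 ! Same initial (10 ms) and hold (100 ms) values, maximum raised from 1 s to 5 s
 timers throttle lsa all 10 100 5000
 timers throttle spf 10 100 5000
```

The trade-off is that in a persistently unstable network the backoff can grow to 5 seconds per timer, so worst-case convergence stretches accordingly.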
Hope this helps. Kindly rate if you found it useful. Best regards, Sanjay
You could use OSPF fast hellos, BFD, etc. for quick OSPF convergence, and also use the max-metric option in OSPF to avoid black-holing traffic. But kindly post your network topology or something similar so that we know what you want to achieve. The more information you provide, the better the input we can give.
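For reference, here is a minimal sketch of what those three options might look like, assuming classic IOS syntax. The interface names and process ID are hypothetical, the timer values are only examples, and platform support varies. Fast hellos and BFD are alternatives for failure detection, so you would normally pick one per link and configure it on both neighbors.

```
! Option 1: OSPF fast hellos (hypothetical interface)
interface GigabitEthernet0/1
 ! Dead interval of 1 second, with 4 hellos sent per second
 ip ospf dead-interval minimal hello-multiplier 4
!
! Option 2: BFD (hypothetical interface)
interface GigabitEthernet0/2
 ! 50 ms transmit/receive interval, neighbor declared down after 3 missed packets
 bfd interval 50 min_rx 50 multiplier 3
 ! Register OSPF as a BFD client on this interface
 ip ospf bfd
!
router ospf 10
 ! Advertise maximum metric for 300 seconds after a reload, so that neighbors
 ! avoid using this router for transit traffic until it has had time to converge
 max-metric router-lsa on-startup 300
```

With the BFD values shown, a dead neighbor would be detected in roughly 150 ms. With fast hellos it takes up to 1 second.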