ospf convergence time

Mar 15th, 2009

I have a question about improving OSPF convergence time within a test MPLS core network. The network consists of 10-Gigabit Ethernet and optical connections.

There are a few configuration options that can be combined together to achieve sub-second ospf convergence.

I am looking at these options:

1. BFD

2. OSPF Fast-hellos

3. timers throttle spf/lsa

4. FRR

5. RSVP Hellos

I am mainly focusing on the first three configuration options to achieve quick failure detection and convergence (1 s or less).

I am trying to avoid FRR because of supportability concerns and to keep admin overhead low.

I have read through several docs on BFD, Fast-hello and timers throttle spf/lsa configuration. I do believe it is possible to achieve convergence to this level, if not exactly in msec.

Can I achieve fast OSPF convergence with a combination of fast-hellos, BFD, and timers throttle?

Any ideas/suggestions would be appreciated!!!

Joseph W. Doherty Sun, 03/15/2009 - 09:41

Well, both #1 and #2 will detect a logical link failure much more quickly if you reduce the timers. With #2, although the hellos are subsecond, I believe the dead time is still one second. With #1, assuming you use the minimum interval and multiple hellos, you should get subsecond loss detection, but recall that with the practical minimum interval and the usual 3 hellos, your detection time might still be half a second or more.
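To make the fast-hello arithmetic concrete, here is a minimal Python sketch. It assumes the dead interval is pinned at 1 second and that the hello multiplier is the number of hellos sent within that second; the function name is hypothetical and only for illustration.

```python
def fast_hello_timing(hello_multiplier: int) -> tuple[float, int]:
    """Return (hello interval in ms, dead interval in ms) for OSPF fast hellos.

    Assumption: the dead interval is fixed at 1000 ms and the multiplier
    is the number of hellos sent during that one second.
    """
    if hello_multiplier <= 0:
        raise ValueError("hello_multiplier must be positive")
    dead_ms = 1000
    return dead_ms / hello_multiplier, dead_ms


if __name__ == "__main__":
    for mult in (3, 4, 5, 10):
        hello_ms, dead_ms = fast_hello_timing(mult)
        print(f"multiplier {mult}: hello every {hello_ms:.0f} ms, dead after {dead_ms} ms")
```

Under that assumption, raising the multiplier sends hellos more often but does not shorten the one-second dead interval, so worst-case neighbor-loss detection with fast hellos stays at roughly one second.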

Yet, even for physical link detection, which is even faster (200 ms or so?), you may not have subsecond network convergence. One issue, I recall(?), there's some minimum time (half second?) for LSA propagation. Whatever it is, it takes time for the LSA to propagate through the network (especially if there are multiple hops).

So it may not be possible to guarantee full network subsecond OSPF convergence.

fawad.alam Sun, 03/15/2009 - 09:54

Hi Joseph,

Thanks for the reply! I do understand that with fast-hellos the quickest failure detection can be 1 sec.

If I use BFD with fast hellos it would improve the detection time, right?? It can be down to 50 msec.

Now the convergence process starts. If I set the timers throttle spf values low, say 10 msec (wait), 10 msec (delay b/w first and second SPF), 10 msec (max wait), using this command:

timers throttle spf 10 10 10

Would that help? What about timers throttle lsa? Any recommendations for low values? They have to be safe as well.

Thanks

Joseph W. Doherty Sun, 03/15/2009 - 12:38

"If I use BFD with fast hellos it would improve the detection time, right?? "

I believe so.

"It can be down to 50 msec."

I had 150 ms in mind, but I wasn't sure. Without looking, I suspect you're correct. (I likely had the wrong "fifty" value in mind, and why I thought 3x would be about half sec.)
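A short Python sketch of the detection-time arithmetic may help reconcile the 50 ms and 150 ms figures. It assumes BFD asynchronous mode, where the detection time is the remote detect multiplier times the larger of the local required minimum receive interval and the remote desired minimum transmit interval. The function name and the sample values are illustrative only.

```python
def bfd_detection_ms(local_min_rx_ms: int, remote_min_tx_ms: int,
                     remote_multiplier: int) -> int:
    """Approximate BFD detection time in milliseconds (asynchronous mode).

    Assumption: the negotiated receive interval is the larger of what the
    local side is willing to receive and what the remote side wants to send.
    """
    negotiated_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return negotiated_ms * remote_multiplier


if __name__ == "__main__":
    # 50 ms interval with the usual multiplier of 3
    print(bfd_detection_ms(50, 50, 3))
    # 150 ms interval with a multiplier of 3, close to half a second
    print(bfd_detection_ms(150, 150, 3))
```

With a 50 ms interval and a multiplier of 3, detection comes out at 150 ms; a 150 ms interval with the same multiplier gives 450 ms, which matches the "about half a second" estimate.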

For "timers throttle spf", it appears the default (at least for some IOSs) is off. Setting this value appears to slow how quickly spf calculates in response to LSAs to increase stability.

Of more interest are the commands "timers throttle lsa all" (your last question) and "timers lsa arrival". Cisco describes:

"Benefits of OSPF LSA Throttling

Prior to the OSPF LSA Throttling feature, LSA generation was rate-limited for 5 seconds. That meant that changes in an LSA could not be propagated in milliseconds, so the OSPF network could not achieve millisecond convergence.

The OSPF LSA Throttling feature is enabled by default and allows faster OSPF convergence (in milliseconds). This feature can be customized. One command controls the generation (sending) of LSAs and another command controls the receiving interval. This feature also provides a dynamic mechanism to slow down the frequency of LSA updates in OSPF during times of network instability. "

Again, what I had in mind as the default without this feature was 0.5 seconds, but the above notes that the default without this feature is 5 seconds.

Between faster detection of lost peers (i.e. BFD) and this feature, as the above also notes, milliseconds OSPF convergence should be possible.

On the issue of "safety", pushing the LSA(s) sooner could be counterbalanced by throttling SPF calculations and/or throttling LSA acceptance.

Actual values that would work best but safely within any environment would likely also depend on the actual topology and the capabilities/performance of the devices processing OSPF.

The initial trick appears to be to use IOSs that support OSPF LSA Throttling (even as default) and OSPF with BFD. Then tune if necessary and as needed. Other enhanced OSPF features might be required to ensure stability while trying to have your network converge so quickly (e.g. OSPF Update Packet-Pacing Configurable Timers, OSPF Link-State Database Overload Protection, OSPF Incremental SPF, etc.).

Proceed carefully. One thing I recently noticed, if/when working with L3 switches, their ASICs make the data plane fast, but their control plane isn't usually nearly as potent.

fawad.alam Sun, 03/15/2009 - 16:14

I agree with Joe's comments above; it does need some careful consideration before you start playing around with OSPF timers.

I think BFD should be the best choice for failure detection, as with fast-hellos it would take at least 1 sec to detect neighbor loss.

For convergence, I think changing the "Initial SPF schedule delay" by using "timers throttle spf" would bring initial LSA generation from 5 seconds to a few milliseconds after a topology change. Other timers of "timers throttle spf" and can be reduced to few milliseconds.

The "timers throttle lsa" should also be tweaked to bring down lsa generation time.

I don't know a good combination of values for these two timers throttle commands, as I still have not been able to find a document that compares all these timers together.

If anyone knows of such a document, please let me know!!

much appreciated !!

Giuseppe Larosa Mon, 03/16/2009 - 00:06

Hello Fawad,

In the 12.2SX configuration guide there is a short chapter about OSPF throttling.

timers throttle spf controls how and when SPF is executed.

This command introduces a self-adaptive behaviour: reaction to a single topology change can be fast, but when multiple topology changes are involved the hold time between two SPF calculations is increased (doubled).

The last parameter, max-time, sets the maximum hold time between two SPF calculations.

see

http://www.cisco.com/en/US/docs/ios/iproute/configuration/guide/irp_ospf_short_path_ps6017_TSD_Products_Configuration_Guide_Chapter.html#wp1054065

and here is the command reference

http://www.cisco.com/en/US/docs/ios/iproute/command/reference/irp_osp3.html#wp1017802

timers throttle lsa all is defined as a way to rate-limit LSA generation.

The range of all timers in both commands is between 1 and 600,000 msecs.

>> Other timers of "timers throttle spf" and can be reduced to few milliseconds.

Be aware that SPF is still a CPU-intensive process, so you need to validate with lab tests which values can be used without driving the CPU to 100%.

I think that, to take advantage of this feature, the last value (max-time) should be set higher so that the self-adaptive behaviour can take effect.

In this way you get very fast convergence in case of a single topology change and you still protect the devices from excessive resource usage that would be caused by fixed aggressive timers.
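The self-adaptive behaviour can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the first SPF run waits the start value, each later run during continuous churn waits the current hold value, and the hold value doubles until it reaches max-time. It ignores the reset that happens after a quiet period, and the 10/100/5000 values are purely illustrative, not a recommendation.

```python
def spf_wait_schedule(start_ms: int, hold_ms: int, max_ms: int, runs: int) -> list[int]:
    """Wait (in ms) applied before each of `runs` back-to-back SPF calculations.

    Assumption: topology changes keep arriving, so the hold time never
    resets and simply doubles until it is capped at max_ms.
    """
    if runs <= 0:
        return []
    waits = [start_ms]
    hold = hold_ms
    for _ in range(runs - 1):
        waits.append(min(hold, max_ms))
        hold = min(hold * 2, max_ms)
    return waits


if __name__ == "__main__":
    # Fixed aggressive timers: no back-off at all under churn
    print(spf_wait_schedule(10, 10, 10, 5))
    # Fast first reaction, then exponential back-off up to max-time
    print(spf_wait_schedule(10, 100, 5000, 8))
```

In this model, "timers throttle spf 10 10 10" never backs off, while a larger max-time keeps the first reaction at 10 ms and then stretches the gap between SPF runs as changes keep arriving.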

How the two sets of timers interact also needs to be tested: if LSAs are sent out as soon as possible, this can cause SPF to be executed every max-time interval (multiple change events are detected and SPF throttling moves to the highest timer).

Hope to help

Giuseppe
