RIP holddown process

Mar 7th, 2009

This holddown process is not clear to me: the sources contradict each other, and they also contradict the tests I performed in GNS3/Dynamips:

R1<->R2<->R3<->R4<->R1 (four routers in a ring)

and configured 1.0.0.0/8 on R1-R2, 2.0.0.0/8 on R2-R3, 3.0.0.0/8 on R3-R4, 4.0.0.0/8 on R4-R1.

Then I configured a loopback on R2 with IP address 5.5.5.5/8, started RIP debugging on all routers, issued "shutdown" on the loopback interface and watched what happened until 5.0.0.0/8 had been removed from all routers and was no longer advertised by any of them.

Then I issued "no shutdown" on the loopback interface and repeated the test, except that this time I issued "no shutdown" about 10 seconds after the "shutdown", to simulate a network flapping up/down.

I observed that as soon as R2 detected that its loopback was down, it removed the 5.0.0.0/8 network from its routing table:

show ip route 5.0.0.0
% Network not in table

R2 then immediately sent triggered updates containing only network 5.0.0.0, with metric 16, to R1 and R3. Here is what was seen on R3:

RIP: received v1 update from 2.2.2.1 on FastEthernet0/0
5.0.0.0 in 16 hops (inaccessible)
RIP-DB: Remove 5.0.0.0/8, (metric 4294967295) via 2.2.2.1, FastEthernet0/0
RIP: Update contains 1 routes

Next, R1 and R3 removed network 5.0.0.0 from their routing tables and in turn sent triggered updates to their neighbors.

So in a matter of seconds the 5.0.0.0/8 network was removed from all routers' routing tables.
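The hop-by-hop spread of the poisoned route around the ring can be sketched as a breadth-first search. This is a simplified model, assuming every router sends its own triggered update as soon as it loses the route, and ignoring split horizon, update jitter and link delay; the function name is illustrative:

```python
from collections import deque

# Ring from the lab above: R1<->R2<->R3<->R4<->R1
RING = {
    "R1": ["R2", "R4"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "R4"],
    "R4": ["R3", "R1"],
}


def triggered_update_hops(topology: dict[str, list[str]], origin: str) -> dict[str, int]:
    """Triggered-update hops needed for each router to learn the route is gone,
    assuming every router re-sends the poisoned route immediately."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        for neighbour in topology[router]:
            if neighbour not in hops:  # the first poisoned update to arrive wins
                hops[neighbour] = hops[router] + 1
                queue.append(neighbour)
    return hops


print(triggered_update_hops(RING, "R2"))
# {'R2': 0, 'R1': 1, 'R3': 1, 'R4': 2}
```

R4, two hops away from R2, is the last router to hear the bad news, which fits the route disappearing from every routing table within seconds.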

After that, however, all the routers kept including 5.0.0.0, with metric 16, in their periodic updates for 60 seconds.

After those 60 seconds, the route 5.0.0.0 was no longer advertised by any router, and the following message appeared in the debug output of all routers:

RIP-DB: garbage collect 5.0.0.0/8


As mentioned, I also simulated flapping (shutdown, no shutdown, shutdown), and each time the update spread quickly to all routers, both for good news and for bad news.

So the tests showed that there is no holddown mechanism at all... and this leaves me unable to understand the holddown mechanism on my own.

The test also showed something that surprised me a lot: from the moment network 5.0.0.0 went down until it was removed from the network and no longer advertised by any router, only 60 seconds passed. Where does this timer come from? A 60-second gap would make sense between the invalid timer expiring and the route being flushed (240 s - 180 s = 60 s), but the invalid timer plays no part here, so I'm completely puzzled by this behaviour.
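The poster's arithmetic can be written out explicitly. This is only a tiny sketch, assuming (as the poster does) that both the invalid and the flush timers are counted from the last valid update; the function name is illustrative:

```python
# Default IOS RIP timers used in this lab: "timers basic 30 180 180 240"
INVALID = 180  # seconds without an update before the route is marked invalid
FLUSH = 240    # seconds (counted from the last update) before the route is removed


def gap_after_invalid(invalid: int, flush: int) -> int:
    """Seconds between invalid-timer expiry and the flush, assuming both
    timers count from the last update received."""
    if flush < invalid:
        raise ValueError("flush should not be shorter than invalid")
    return flush - invalid


print(gap_after_invalid(INVALID, FLUSH))  # 60
```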


An extra test in which the new metric was higher but still below 16 showed the same behaviour. I simulated this metric increase by having R2 add an offset to the original metric when advertising the route towards R3:

router rip
offset-list 1 out 3 f0/1
network 1.0.0.0
network 2.0.0.0
network 5.0.0.0
!
access-list 1 permit 5.0.0.0 0.255.255.255

Note that the timers were the defaults:

timers basic 30 180 180 240
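The four values of `timers basic` are, in order, the update, invalid, holddown and flush intervals in seconds. A small parser that names them, offered as an illustrative sketch (the `RipTimers` type and `parse_timers_basic` function are hypothetical, not part of IOS or any library):

```python
from typing import NamedTuple


class RipTimers(NamedTuple):
    update: int
    invalid: int
    holddown: int
    flush: int


def parse_timers_basic(line: str) -> RipTimers:
    """Parse an IOS-style 'timers basic <update> <invalid> <holddown> <flush>' line."""
    tokens = line.split()
    if tokens[:2] != ["timers", "basic"] or len(tokens) != 6:
        raise ValueError(f"not a 'timers basic' line: {line!r}")
    return RipTimers(*(int(tok) for tok in tokens[2:]))


defaults = parse_timers_basic("timers basic 30 180 180 240")
print(defaults)  # RipTimers(update=30, invalid=180, holddown=180, flush=240)
```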


So to sum up:

- the theory is contradictory; it is not clear how the holddown process should work

- the emulation tests showed that no HOLD DOWN timer was started. Why this behavior? Additionally, the route was flushed from the entire network 60 seconds after I typed the "shutdown" command.

jorge.calvo Tue, 03/10/2009 - 17:36

Hi,


That 60-second period is called the 'flush timer' or 'garbage collection timer'.


This is RIP's default behavior.

badalam_nt Thu, 03/12/2009 - 16:09

I think there is an inconsistency in the way Cisco defines the flush timer, as its default is documented as 240 s, not 60 s.

- when the route ages out (the invalid timer expires), it is flushed 240 s after the last update (180 s invalid timer + 60 s afterwards)

- when an update is received for a route with metric 16, it is flushed after only 60 s, not 240 s.

So THIS IS MISLEADING/INCONSISTENT AND CREATES CONFUSION.

It seems even the Cisco guys were confused, as Packet Tracer 5.0 implements the flush timer incorrectly: it runs 240 s after the 180 s invalid timer, 180 s + 240 s = 420 s in total, whereas on real equipment it is 180 s + 60 s = 240 s in total.


Based on the tests I've done, if I were in Cisco's shoes I would redefine the flush timer as follows:

1) the flush timer is equal to 60s (NOT to 240s)

2) the flush timer is started:

- when an update is received for a route with metric 16

OR

- when invalid timer expires

OR

- when holddown timer expires

To sum up these 3 cases:

- the flush timer is started when the router starts advertising the route to its neighbours with metric 16

3) the flush timer is stopped:

- when an update is received for that route with a metric lower than 16
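The redefinition proposed above can be expressed as a small state machine. This is a hypothetical sketch of the poster's proposal, not of Cisco's implementation; the class and method names are illustrative only:

```python
INFINITY = 16  # RIP metric meaning "unreachable"


class ProposedRoute:
    """Hypothetical route entry following the flush-timer redefinition proposed above.

    Not Cisco's implementation: the flush countdown starts when the route begins
    to be advertised with metric 16 and stops when a usable update arrives.
    """

    def __init__(self, metric: int, flush_after: int = 60) -> None:
        self.metric = metric
        self.flush_after = flush_after
        self.flush_deadline = None  # absolute time (s) of removal, or None

    def mark_unreachable(self, now: int) -> None:
        # Single entry point for the three proposed triggers: a metric-16 update,
        # invalid-timer expiry, or holddown-timer expiry.
        self.metric = INFINITY
        if self.flush_deadline is None:  # do not restart a countdown already running
            self.flush_deadline = now + self.flush_after

    def receive_update(self, metric: int, now: int) -> None:
        if metric >= INFINITY:
            self.mark_unreachable(now)
        else:
            self.metric = metric
            self.flush_deadline = None  # route usable again: stop the flush timer

    def is_flushed(self, now: int) -> bool:
        return self.flush_deadline is not None and now >= self.flush_deadline


route = ProposedRoute(metric=1)
route.receive_update(16, now=0)   # poisoned triggered update received
print(route.is_flushed(now=59))   # False
print(route.is_flushed(now=60))   # True
```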


In addition, I would mention (since it is documented nowhere, not even on Cisco's site or docs) what I observed after performing several tests: the HOLDDOWN timer process is NOT implemented anymore. If it was implemented in earlier IOS releases, it must have been removed from the latest ones, once Cisco realized that the "triggered updates" feature completely supersedes it and so makes the HOLDDOWN timer useless.


I would be really glad to see a statement from Cisco confirming my hypothesis. It would validate all the observations I've drawn from the many tests and studies I performed on the holddown process.

Edison Ortiz Thu, 03/12/2009 - 17:28

>> Then declared a loopback on R2 with 5.5.5.5/8 IP address, started rip debugging on all routers, performed a "shutdown"


Instead of shutting down the loopback, shut down the interface on R2 facing R3.


R3 will keep 5.0.0.0/8 in its routing table for 180 seconds; after those 180 seconds it will mark the route as possibly down, and 240 seconds after the last update (that is, 60 seconds later) the route is flushed.
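Edison's timeline for a route that simply stops being refreshed can be summarised as a function of the time since the last update. A simplified sketch, assuming default timers and that both the invalid and flush timers count from the last update heard (which matches the 3 minutes plus 60 seconds Edison reports); holddown is not modelled and the function name is illustrative:

```python
def route_state(seconds_since_last_update: int, invalid: int = 180, flush: int = 240) -> str:
    """Classify a RIP route that is no longer being refreshed."""
    if seconds_since_last_update >= flush:
        return "flushed"        # removed from the routing table
    if seconds_since_last_update >= invalid:
        return "possibly down"  # still listed, advertised with metric 16
    return "valid"


for t in (45, 176, 180, 239, 240):
    print(t, route_state(t))
# 45 valid
# 176 valid
# 180 possibly down
# 239 possibly down
# 240 flushed
```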


Shutting down the loopback while the link between R2 and R3 stays up makes R2 send a RIP triggered update, which overrides any holddown timers.


Here is what I did:


R3<->R1<->R2


R3 is advertising 3.3.3.3, and I shut down the interface between R3 and R1. Let's look at R1's routing table:


R1#sh ip route rip
3.0.0.0/32 is subnetted, 1 subnets
R 3.3.3.3 [120/1] via 192.168.13.3, 00:00:45, FastEthernet0/1


...


You will see the age timer increasing:


R1#sh ip route rip
3.0.0.0/32 is subnetted, 1 subnets
R 3.3.3.3 [120/1] via 192.168.13.3, 00:02:56, FastEthernet0/1


Once the age reaches the magic 3 minutes, I receive this RIP update:


00:15:04: RIP: received v2 update from 192.168.12.2 on FastEthernet0/0
00:15:04: 3.3.3.3/32 via 0.0.0.0 in 16 hops (inaccessible)


and the routing table shows:


R1#sh ip route rip
3.0.0.0/32 is subnetted, 1 subnets
R 3.3.3.3/32 is possibly down,
routing via 192.168.13.3, FastEthernet0/1


It then remains there for an additional 60 seconds until it is flushed.


HTH,


__


Edison.


badalam_nt Fri, 03/13/2009 - 11:03

Edison, this is the invalid timer of 180s followed by another 60s, after which the route is flushed.

That part is not in question, so I'm not asking about it.


I mentioned two things that nobody wants to confirm or deny:

- there is no HOLD DOWN process in the Cisco IOS anymore, most probably they removed it after realizing the "triggered update" feature superseded the holddown feature.

- defining the flush timer as 240 s is wrong; it would be better replaced by what I proposed: that it is 60 s and that it starts when the metric of a route becomes 16

That way there would be no confusion.


With the current definition, confusion arises on two fronts:

- after the invalid timer expires, questions are raised on every forum about how long it takes to flush the route: another 240 s or another 60 s?

If the definition were clear, there would be no such questions. Even the Cisco guys are confused, as in Packet Tracer 5.0 it takes an additional 240 s after the invalid timer expires.

- even if you have been warned that Packet Tracer is wrong, and so you know that the route is flushed 240 s in total rather than 240 s after the invalid timer expires, you will still fail when asked how long it takes to flush a route after the router receives a triggered update for it with metric 16.

It is 60 s, which contradicts the definition of the flush timer as 240 s, because 240 s appears nowhere here.

Edison Ortiz Fri, 03/13/2009 - 12:01

>> there is no HOLD DOWN process in the Cisco IOS anymore,


That's a bold statement (pun intended :))


Let's see the process one more time:


R0<->R1


R1 is advertising 10.1.1.1 via RIP to R0.


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 1
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:00:04 ago
Routing Descriptor Blocks:
* 192.168.12.2, from 192.168.12.2, 00:00:04 ago, via FastEthernet1/0
Route metric is 1, traffic share count is 1


I've changed all the timers to avoid confusion about which timer does what:


Routing Protocol is "rip"
Sending updates every 10 seconds, next due in 5 seconds
Invalid after 15 seconds, hold down 40, flushed after 60
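The summary above corresponds to the configuration `timers basic 10 15 40 60`. As a rough sketch, the values can be pulled out of that output with regular expressions; the patterns assume the exact wording shown here and may not match other IOS releases, and `timers_from_summary` is a hypothetical helper:

```python
import re

SUMMARY = """\
Sending updates every 10 seconds, next due in 5 seconds
Invalid after 15 seconds, hold down 40, flushed after 60
"""


def timers_from_summary(text: str) -> tuple[int, int, int, int]:
    """Return (update, invalid, holddown, flush) from 'show ip protocols'-style text."""
    update = re.search(r"Sending updates every (\d+) seconds", text)
    rest = re.search(r"Invalid after (\d+) seconds, hold down (\d+), flushed after (\d+)", text)
    if update is None or rest is None:
        raise ValueError("timer summary not found")
    return (int(update.group(1)), *(int(g) for g in rest.groups()))


timers = timers_from_summary(SUMMARY)
print("timers basic " + " ".join(map(str, timers)))  # timers basic 10 15 40 60
```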


I shut down the interface between R1 and R0.


Now please examine the timers and process:


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 1
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:00:00 ago
Routing Descriptor Blocks:
* 192.168.12.2, from 192.168.12.2, 00:00:00 ago, via FastEthernet1/0
Route metric is 1, traffic share count is 1


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 1
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:00:01 ago
Routing Descriptor Blocks:
* 192.168.12.2, from 192.168.12.2, 00:00:01 ago, via FastEthernet1/0
Route metric is 1, traffic share count is 1




R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:00:24 ago
Hold down timer expires in 39 secs




R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:00:58 ago
Hold down timer expires in 5 secs


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:00:59 ago
Hold down timer expires in 4 secs


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:01:00 ago
Hold down timer expires in 3 secs


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:01:01 ago
Hold down timer expires in 2 secs


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:01:02 ago
Hold down timer expires in 1 secs


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:01:02 ago
Hold down timer expires in 1 secs


R0#sh ip route 10.1.1.1
Routing entry for 10.1.1.1/32
Known via "rip", distance 120, metric 4294967295 (inaccessible)
Redistributing via rip
Last update from 192.168.12.2 on FastEthernet1/0, 00:01:03 ago
Hold down timer expires in 0 secs


R0#sh ip route 10.1.1.1
% Network not in table



badalam_nt Fri, 03/13/2009 - 15:16

Edison, this is NOT hold down process!!!


What you describe is the situation where no more updates are received, so the INVALID timer expires (15 seconds) and the route is marked as inaccessible, FOLLOWED by (FLUSH - INVALID) = 60 s - 15 s = 45 seconds until the route is flushed, 60 seconds in total.


Please check any reference you want and you'll clearly see that this is all about the invalid and flush timers, not about holddown.


But thanks anyway for your example, as you unintentionally revealed a very stupid bug with this message:

"Hold down timer expires in x sec"


If you still have doubts, just increase the flush timer, to 75s for instance:

timers basic 10 15 40 75

Then you'll see that after 15 s (invalid timer) the route is marked as inaccessible, and after another 75 s - 15 s = 60 s the route is flushed.

Funnily enough, you'll see the same message repeated 20 times (flush - holddown - invalid = 75 - 40 - 15 = 20):

"Hold down timer expires in 0 sec"


Not only wrong, but also a really stupid message...
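The poster's arithmetic can be checked with a small timeline model. This is a sketch under the poster's interpretation, assuming the holddown countdown starts when the invalid timer expires and the flush timer counts from the last update; it is not derived from IOS source code, and the names are illustrative. With `timers basic 10 15 40 75` it predicts the route becomes inaccessible at 15 s, the holddown reaches 0 at 55 s, and the route lingers until it is flushed at 75 s:

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    invalid_at: int        # route marked inaccessible (metric 16)
    holddown_zero_at: int  # holddown countdown reaches 0
    flushed_at: int        # route removed from the table

    @property
    def zero_window(self) -> int:
        """Seconds the route stays listed after the holddown countdown hits 0."""
        return max(0, self.flushed_at - self.holddown_zero_at)


def timeline(invalid: int, holddown: int, flush: int) -> Timeline:
    # Assumption: holddown starts at invalid expiry; flush counts from the last update.
    return Timeline(invalid_at=invalid,
                    holddown_zero_at=invalid + holddown,
                    flushed_at=flush)


t = timeline(invalid=15, holddown=40, flush=75)
print(t)              # Timeline(invalid_at=15, holddown_zero_at=55, flushed_at=75)
print(t.zero_window)  # 20
```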

Edison Ortiz Fri, 03/13/2009 - 16:03

Yup, confirmed.


__


Edison.

badalam_nt Sat, 03/14/2009 - 02:43

So let's restart the discussion, then:

Could anyone confirm or deny that:

- there is no HOLD DOWN process in the Cisco IOS anymore, most probably they removed it after realizing the "triggered update" feature superseded the holddown feature.

- defining the flush timer as 240 s is wrong; it would be better to state that it is 60 s and that it is started in the cases I mentioned, i.e. when an update with metric 16 is received for a route, OR when the invalid timer expires, OR when the holddown timer expires (if the latter still exists)


PS: Who rated my previous post?

Giuseppe Larosa Sat, 03/14/2009 - 04:26

Hello Petru,

you have done a very good job with an in-depth analysis of RIP behaviour.

And if I understood correctly, Edison confirms your findings with tests on real routers.


As for possible exam questions, I'm afraid the answers considered correct will be the ones in line with what the books say.


However, this kind of approach can make you a very good network engineer and provide added value to these forums.


From a practical point of view, RIP is not widely used in the real world, and this may be why this knowledge gap has remained open.


About rating: everyone can rate a post even if it is not the original poster.

Rating useful posts is a way to signal that useful information is present in a thread.


Best Regards

Giuseppe


badalam_nt Sat, 03/14/2009 - 05:43

Giuseppe, do you confirm my conclusions?


Yes, indeed, RIP has little practical use; Jeff Doyle even translates RIP into "Rest In Peace", which says it all...

What I'm doing is purely academic work, to understand everything about the topic I'm studying.


PS: So there's no way to know who gave me points?

Giuseppe Larosa Sat, 03/14/2009 - 06:02

Hello Petru,


>> do you confirm my conclusions?


I would need to set up a lab with 3 or more routers to repeat the tests that you and Edison have done.

I can say you have convinced me that the behaviour is different from the one expected from theory.


About knowing who rated your posts:

this has been considered in the Idea Center, but it could lead to revenge for poor ratings.


see


http://forum.cisco.com/eforum/servlet/NetProf?page=netprof&forum=Idea%20Center&topic=NetPro%20Ideas&topicID=.2cbe6274&fromOutline=&CommCmd=MB%3Fcmd%3Ddisplay_location%26location%3D.2cbfa9de


Hope to help

Giuseppe


viyuan700 Sat, 03/14/2009 - 13:37

"there is no HOLD DOWN process in the Cisco IOS anymore, most probably they removed it after realizing the "triggered update" feature superseded the holddown feature"


Hi badalam_nt,


I think the HOLDDOWN process is superseded by the Flush timer, which is confirmed by the lab results here. This is what the CCIE R&S 3rd Edition book says about the flush and holddown timers:


When the route is flushed (removed), any associated timers are also removed, including the Holddown timer.


The description of the timers in Cisco Press books is a bit confusing, but I think the Cisco documentation is a little better: it clearly states that the holddown timer starts after the invalid timer.


http://www.cisco.com/en/US/docs/ios_xr_sw/iosxr_r3.3/routing/command/reference/rr33rip.html#wp1000112



The only problem is that when the holddown timer is smaller than the flush timer, a wrong message is displayed.

According to the CCIE book, if the holddown timer is smaller than the flush timer:


Had the Holddown timer been smaller, and had it expired before the Flush timer, R1 would have been able to use the route advertised by R2 at that point in time.


Could you check whether, when the holddown timer is smaller than the flush timer and another route is available, that route gets added?

If it is added, the RIP code works fine and there is only a minor bug in the message. If it is not added, then I think the RIP implementation itself has a bug.


Cisco Press books describe several methods to prevent loops, such as:

- split horizon

- route poisoning

- holddown timer

- triggered update with holddown timer


So your initial lab, where the route is flushed after 60 s, is using Triggered Update with Holddown timer.




badalam_nt Sat, 03/14/2009 - 18:51

Thanks for highlighting this second case in which the holddown timer is started:

- after invalid timer expires


Jeff Doyle's Routing TCP/IP, 2nd edition, states that the holddown timer is started when an update with an increased metric is RECEIVED for an existing route.

To quote Jeff:

"If the distance to a destination increases (for example, the hop count increases from two to four), the router sets a holddown timer for that route. Until the timer expires, the router will not accept any new updates for the route."


So the two books (CCIE R&S 3rd Ed, Routing TCP/IP 2nd Ed) are incomplete in the way they present what starts the holddown timer, each presenting only one side of it.


But here comes the even more interesting part:

- the behaviour described by Jeff is not implemented by Cisco, as my tests didn't show the behaviour described in the Routing TCP/IP book

- the behaviour described in CCIE R&S 3rd Ed is implemented by Cisco, as my tests show (plus the message bug which, as you mentioned, shouldn't affect the protocol's functionality)

- in my opinion, the meaningful way to use holddown for preventing routing loops is the one described by Jeff, not the one in the CCIE R&S book.
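For contrast, the holddown rule quoted from Routing TCP/IP (start holddown when the distance to a destination increases, and accept no new updates for the route until it expires) could be modelled like this. It is a hypothetical sketch of the textbook rule, which, according to the tests above, Cisco IOS does not appear to implement; the class name and the 180 s default are assumptions:

```python
class DoyleHolddownRoute:
    """Route entry following the rule quoted from Routing TCP/IP: a metric
    increase starts holddown, and no updates are accepted until it expires."""

    def __init__(self, metric: int, holddown: int = 180) -> None:
        self.metric = metric
        self.holddown = holddown
        self.holddown_until = None  # absolute time (s) when holddown ends, or None

    def in_holddown(self, now: int) -> bool:
        return self.holddown_until is not None and now < self.holddown_until

    def receive_update(self, metric: int, now: int) -> bool:
        """Apply an update; return True if it was accepted."""
        if self.in_holddown(now):
            return False  # ignored while in holddown, even if better
        if metric > self.metric:
            self.holddown_until = now + self.holddown  # distance increased: start holddown
        self.metric = metric
        return True


r = DoyleHolddownRoute(metric=2)
print(r.receive_update(4, now=0))    # True  (metric rose from 2 to 4: holddown starts)
print(r.receive_update(1, now=10))   # False (better route ignored during holddown)
print(r.receive_update(1, now=180))  # True  (holddown over)
```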


And just to correct your last statement:

"So in your initial lab, where the route is flushed after 60 sec, it is using Triggered Update with Holddown timer."


As the timers were the defaults (30 180 180 240) and the route was removed after 60 s, we can't talk about holddown here, because:

- holddown would have lasted not 60 s but 180 s (holddown) plus another 60 s (= 240 - 180) until the route was flushed, 240 s in total, which was not the case

- updates received for that route were accepted during the 60 s before the route was flushed.

My conclusion (after playing with the holddown and invalid timer values) is that these 60 s represent the difference between the FLUSH timer and the INVALID timer, though it's really weird to define the flush timer that way, as it creates confusion (as already stated above).

viyuan700 Sun, 03/15/2009 - 02:35

"the meaningful way of using holddown for preventing routing loops"


"though it's really weird to define in such a way the flush timer"


After consulting a few books, we get an idea of when the holddown and flush timers start.


But unless I'm missing something about them, I don't see the point of using both timers.


1. When the Flush timer is smaller than the Holddown timer, the Flush timer removes the holddown.


2. When the Flush timer is larger than the Holddown timer, an alternate route can be added after the holddown expires.
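The two cases can be expressed as a check on when an alternate route could first be installed. A sketch assuming the CCIE-book model quoted above: holddown starts when the invalid timer expires, the flush timer counts from the last update, and flushing the route also clears its holddown. Under those assumptions the comparison is really between the flush timer and invalid + holddown, and the alternate route becomes usable at whichever comes first; the function name is illustrative:

```python
def alternate_route_usable_at(invalid: int, holddown: int, flush: int) -> tuple[int, str]:
    """Earliest time (s after the last update) at which a new route could be accepted."""
    holddown_ends = invalid + holddown  # assumption: holddown starts at invalid expiry
    if flush <= holddown_ends:
        return flush, "flush removes the route and its holddown"
    return holddown_ends, "holddown expires before the flush"


print(alternate_route_usable_at(180, 180, 240))  # (240, 'flush removes the route and its holddown')
print(alternate_route_usable_at(15, 40, 75))     # (55, 'holddown expires before the flush')
```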


I suppose one would be enough, as in IETF RFC 2453 for RIP, which has only three timers: update, timeout and garbage-collection.
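For comparison, the RFC 2453 model uses only a timeout (default 180 s) and a garbage-collection timer (default 120 s): when a route times out, or an update arrives with metric 16, its metric is set to 16 and the garbage-collection timer starts; the route is deleted when that timer expires, unless a usable route replaces it first. A simplified sketch of those rules, assuming a single neighbour and leaving out triggered updates; the class is illustrative, not taken from any implementation:

```python
TIMEOUT = 180  # RFC 2453 default timeout, seconds
GARBAGE = 120  # RFC 2453 default garbage-collection timer, seconds
INFINITY = 16


class Rfc2453Route:
    """Simplified RIP route entry using the RFC 2453 timeout and garbage-collection timers.

    Assumes every update comes from the route's current next hop.
    """

    def __init__(self, metric: int, now: int) -> None:
        self.metric = metric
        self.timeout_at = now + TIMEOUT
        self.delete_at = None  # set while the garbage-collection timer runs

    def _start_deletion(self, when: int) -> None:
        # The deletion process starts only when the metric first becomes infinity.
        if self.metric < INFINITY:
            self.metric = INFINITY
            self.delete_at = when + GARBAGE

    def receive_update(self, metric: int, now: int) -> None:
        if metric >= INFINITY:
            self._start_deletion(now)
        else:
            self.metric = metric
            self.timeout_at = now + TIMEOUT  # refresh the timeout
            self.delete_at = None            # a usable route cancels garbage collection

    def tick(self, now: int) -> None:
        # Timeout expiry starts deletion as of the moment the timeout ran out.
        if self.delete_at is None and now >= self.timeout_at:
            self._start_deletion(self.timeout_at)

    def is_deleted(self, now: int) -> bool:
        return self.delete_at is not None and now >= self.delete_at


poisoned = Rfc2453Route(metric=1, now=0)
poisoned.receive_update(16, now=10)                       # poisoned update received
print(poisoned.is_deleted(129), poisoned.is_deleted(130))  # False True

aged = Rfc2453Route(metric=1, now=0)
aged.tick(now=200)                                 # the 180 s timeout has passed
print(aged.is_deleted(299), aged.is_deleted(300))  # False True
```

Under the RFC defaults this gives deletion 120 s after a poisoned update and 300 s after the last update for an aged-out route, unlike the 60 s and 240 s observed on IOS in this thread.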


"And just to correct your last statement:"


I was thinking of writing that maybe it is a case of Triggered Update with Holddown.


The book gives an example of Triggered Update with Poisoned Route, where RIP converges in less than a minute without any holddown.


I was just guessing it could be Triggered Update with Holddown, where the triggered update starts the holddown.


I agree with you that it will stay confusing until we get documentation that clearly explains which timer starts when, in each situation.


