Multiple OSPF Processes & Router-ID Election from Loopbacks

Jan 9th, 2008

I'm having difficulty finding a firm answer on this detailed scenario, so here goes...

Scenario: One (1) router with two (2) OSPF processes running on it, and two (2) loopback interfaces with IP addresses defined.

The root question: Which OSPF process will get which Router-ID/IP address from which interface?

In conjunction with the question above, there are two possibilities, but there is also an important implied question of consistency.

In other words, will OSPF processX ALWAYS get the same Router-ID, and will OSPF processY ALWAYS get its same Router-ID, both based on looking at available (and not already in use) loopback interfaces? So essentially, what is Cisco's mechanism, and is it consistent across IOS versions? Again, there seem to be two philosophies: either "Cisco says" the process with the lower (or higher) process number goes first, which would provide consistent results; or it says whichever one comes first in the config goes first, which will most likely always remain the same but, at least in theory, could be altered and thus affect the resulting Router-ID assigned.

To Illustrate/Mock Things Up:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255
interface Loopback1
 ip address 2.2.2.2 255.255.255.255
router ospf 100
 network x x area x
router ospf 200
 network y y area y

Here ospf 100 should have Router-ID 1.1.1.1 and ospf 200 Router-ID 2.2.2.2, but is that because it came first or because it has the lower process number? Would the results be the same if the IOS version changed or the config looked like this from top-down?:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255
interface Loopback1
 ip address 2.2.2.2 255.255.255.255
router ospf 200
 network y y area y
router ospf 100
 network x x area x

i.e. What do you think ospf 200's Router-ID is?

Please Note (to nip some things in the bud): The typical selection process for determining the Router-ID is known (you don't need to post it just for the sake of posting it). Also, yes, I'm aware that explicitly specifying the Router-ID in each process resolves the consistency matter... but then there would have been no need or point in asking the question(s), and the questions are more about the mechanism (and resulting consistency) used by Cisco in a multi-process environment than about getting a "how do I get the end result" answer.

(I've also done testing in the lab, so I can essentially answer the question, but a few small tests don't necessarily make things definitive, nor would they prompt discussion.)

Thanks.

Richard Burts Wed, 01/09/2008 - 10:08

Dan

It is good to see you active in the forum.

I have some experience with this, but not enough to be definitive. But enough to contribute to the discussion. In my experience the first process to initialize will get the higher of the loopback interface addresses. In general I would expect the process listed first in the config to be the one to initialize first. And as far as I remember the IOS has been fairly consistent in listing the processes in numeric order.

So I would expect some consistency that the lower numbered process should generally get the higher loopback address as its RID. But as you comment it is not something that we can really count on - who knows what would cause a process to take longer to initialize.

My recommendation is that consistency is a very good thing and to assure real consistency you should configure the RID in each process.

HTH

Rick

marikakis Wed, 01/09/2008 - 10:24

Hello,

First of all, I am not posting for the sake of posting. I get the picture that you have done your homework. I am posting because I find this interesting and although I have never done this test myself, I would really like to know the answer you already have. :-)

I will try to put myself in the router's shoes, or rather in the shoes of the poor guys who write its software. When the administrator does not specify things by force, I have to make a decision of my own. When you start the first process, I cannot generally know whether you will start a second process later or, when you do start one, whether the second process will have a higher ospf process number than the first one (only if you chose process number 1 for the first process could I know that the second process will have a higher process number). So I follow the same logic every time you start a process. I only like up/up interfaces, I like loopbacks more than any other interface, and I try to grab the loopback with the highest id not already in use. If you have 100 loopbacks, I will begin my search from the highest, and when I find the first available one, I stop my search. If you have no loopbacks, or they are all already in use, I will try using the highest IP address from the up/up interfaces. If the administrator does not like what I do, I give him a hook to set the process's router-id by force.

Kind Regards,

M.

Richard Burts Wed, 01/09/2008 - 10:34

Maria

You are right that it is an interesting question and we welcome you as part of the discussion.

One small point: both you and Dan seem to believe that the OSPF process will prefer to take the loopback with the lowest interface number. In my experience OSPF prefers to take the address that is numerically higher. So it matters less whether it is loopback0 or loopback1 and matters more whether it is network 1.1.1.1 or 2.2.2.2. If Dan has test results that indicate that the behavior has changed since I did my testing with this (which has been a while) please jump in. Other than that I think that your summary of the router logic is right on the mark.

[edit] When I re-read your post I saw that I had misinterpreted your discussion about the search for the loopback. On my first read I thought you were saying that it stopped when it found the first available one (which I took to mean the first numerically available interface). Now that I read it again, you did specify that it starts from the highest and searches for an available address. My apologies.
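Rick's point that the comparison is on the address value, not on the interface number, is also easy to get wrong in monitoring scripts. Below is a minimal Python sketch, assuming two hypothetical loopbacks whose interface numbers and addresses run in opposite order; it also shows why comparing dotted-quad strings as plain text picks the wrong one.

```python
import ipaddress

# Hypothetical loopbacks: interface numbering and address order disagree.
loopbacks = {"Loopback0": "10.0.0.1", "Loopback1": "9.9.9.9"}

def highest_numeric(ifaces):
    """Name of the interface holding the numerically highest IPv4 address."""
    return max(ifaces, key=lambda name: ipaddress.IPv4Address(ifaces[name]))

def highest_as_text(ifaces):
    """Naive text comparison of dotted-quad strings, shown only as a pitfall."""
    return max(ifaces, key=lambda name: ifaces[name])

print(highest_numeric(loopbacks))   # Loopback0 (10.0.0.1 is the higher address)
print(highest_as_text(loopbacks))   # Loopback1, because "9" sorts after "1" as text
```

Neither the interface number nor the text form matters; only the 32-bit value of the address does.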

HTH

Rick

marikakis Wed, 01/09/2008 - 10:48

Yes, you are right, I got a bit confused on this. I edited my response when I saw my own note about non-loopbacks, but I am still confused. The process for other types of interfaces is clear to me, and I think I got it right in the first place, but the loopback part still confuses me.

I think I have read about "first configured loopback" somewhere else, but I am not absolutely sure. I will keep searching all night (it is 21:00 in Greece now) unless we have the answer :-)

I want to add one thing, however. When the router initializes, it does not matter whether the configuration of the OSPF processes is "sorted". This is because the configuration is passed to the parser, and from the parser's point of view this procedure is no different from the one the administrator follows with conf t during normal router operation.

Kind Regards,

M.

p.s. Keep editing (wrote 'interfaces' instead of 'processes'), I'd better go to sleep!

marikakis Wed, 01/09/2008 - 11:15

You were right in the first place, you read correctly, I am the one that keeps making mistakes (spelling and non-spelling). You are quick, I should edit faster :-)

I got my library down on the carpet, and the Cisco IP Routing super-book by Alex Zinin says that the "router-id" command was introduced in IOS 12.1. Before that, the router-id was selected automatically using "highest IP address" logic, trying loopbacks first and then other interfaces.

Kind Regards,

M.

p.s. Still not gone to sleep, but I can handle copying the book. :-)

Richard Burts Wed, 01/09/2008 - 11:26

Maria

I am glad that I did not mis-read the post the first time around. And I believe that you are correct in the revised way that you describe it. The general logic is:

- only consider addresses that are not already used as router ID.

- only consider addresses that are up/up.

- prefer loopback interfaces and then physical interfaces.

- prefer numerically higher addresses to numerically lower addresses.

So IOS searches among the available loopback interfaces for the numerically highest available loopback, and if it finds no available loopback, it searches among the available physical interfaces for the numerically highest available physical interface. And if it cannot find any available loopback or physical interface to serve as the RID, then the process does not initialize.
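The four rules above translate fairly directly into a selection function. This is a minimal Python sketch of the logic as summarized in this thread, not Cisco's code; the `Interface` record and the sample addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    ip: str
    up_up: bool      # line protocol and interface both up
    loopback: bool

def select_rid(interfaces, in_use):
    """Pick a router ID following the summarized rules (a sketch, not IOS)."""
    # Rules 1 and 2: only up/up interfaces whose address is not already a RID.
    candidates = [i for i in interfaces if i.up_up and i.ip not in in_use]
    # Rule 3: try loopbacks first, then fall back to the other interfaces.
    for pool in ([i for i in candidates if i.loopback],
                 [i for i in candidates if not i.loopback]):
        if pool:
            # Rule 4: the numerically highest address wins.
            return max(pool, key=lambda i: ipaddress.IPv4Address(i.ip)).ip
    # Nothing eligible: the process cannot initialize.
    raise RuntimeError("no eligible address for a router ID")

router = [
    Interface("Loopback0", "1.1.1.1", True, True),
    Interface("Loopback1", "2.2.2.2", True, True),
    Interface("FastEthernet0/0", "192.168.1.1", True, False),
]

used = set()
for pid in (100, 200, 300):          # processes started one after another
    rid = select_rid(router, used)
    used.add(rid)
    print(pid, rid)
# 100 2.2.2.2
# 200 1.1.1.1
# 300 192.168.1.1
```

A fourth process on this hypothetical router would raise the error, matching the "process does not initialize" case.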

HTH

Rick

marikakis Wed, 01/09/2008 - 11:27

Rick, I am giving you 5 for helping me clear this point with "highest IP address" logic.

Have fun in NetPro!

Edit: Good summarization man! I will give you another 5, so others (and me:-) can find this information right away when needed.

Richard Burts Wed, 01/09/2008 - 11:50

Maria

Thank you for the compliment and for the rating.

Now it is time for you to get some sleep.

HTH

Rick

dmarekatc Wed, 01/09/2008 - 13:00

Hello Rick (& All),

Nice to see you're active as ever here.

Yeah, it was an interesting pondering as I looked into router-id behavior under multiple processes and thought it might make for some discussion and/or debate.

For any who are curious, we have monitoring that gathers quite a bit of information for dissemination and alerting of problems, and the potential for differing values can confuse it, so bringing consistency while allowing things to be "semi-dynamic" (use of loopbacks versus an explicit router-id) led me down this road...

(BTW, there were comments and then edits in the posts above, so I'm not sure where things stood, but I'm not under the impression that the interface number matters [it doesn't]; rather, it's the highest IP on any available loopback that is not already a router-id... However, since re-edits seem to be fashionable, I do see that in my first example I also flipped the router-ids by mistake (*kudos to you if you caught it*); sorry if it caused confusion.)

*Let me repost the examples correctly*

To Illustrate/Mock Things Up:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255
interface Loopback1
 ip address 2.2.2.2 255.255.255.255
router ospf 100
 network x x area x
router ospf 200
 network y y area y

Here ospf 100 should have Router-ID 2.2.2.2 and ospf 200 Router-ID 1.1.1.1, but is that because it came first or because it has the lower process number? Would the results be the same if the IOS version changed or the config looked like this from top-down?:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255
interface Loopback1
 ip address 2.2.2.2 255.255.255.255
router ospf 200
 network y y area y
router ospf 100
 network x x area x

i.e. What do you think ospf 200's Router-ID is?

*The Results*

As for actual behavior, it's somewhat disappointing... It appears that upon reload the order in the config, not the process number, determines which process goes first, and the CLI does not rearrange things for you. (That means that in the 2nd example, ospf 200 gets 2.2.2.2.) So be mindful of how you order things (as with ACLs) to get the desired results.
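Dan's lab result can be modeled as "each router ospf stanza elects in the order it appears". A minimal Python sketch, assuming only the two loopbacks from the example are candidates and both are up:

```python
import ipaddress

LOOPBACKS = ["1.1.1.1", "2.2.2.2"]

def boot(process_order, loopbacks=LOOPBACKS):
    """Simulate a reload: stanzas elect a RID in the order they appear."""
    used, rids = set(), {}
    for pid in process_order:
        free = [a for a in loopbacks if a not in used]
        rid = max(free, key=ipaddress.IPv4Address)   # highest unused loopback
        used.add(rid)
        rids[pid] = rid
    return rids

print(boot([100, 200]))   # {100: '2.2.2.2', 200: '1.1.1.1'}  (1st example)
print(boot([200, 100]))   # {200: '2.2.2.2', 100: '1.1.1.1'}  (2nd example)
```

The process number never enters the calculation; only the position of the stanza does.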

Also note that clearing the ospf processes does not necessarily reset everything, and it is possible to retain what would be an existing "wrong" router-id (going by the normal election rules) until the router is actually reloaded. Think of a case where you delete the process holding the higher loopback router-id: the remaining process will retain its original value until a reload, unless it too is deleted and re-added (thereby triggering the election of a router-id).
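The "sticky" RID Dan describes can be captured with a toy model in which election happens only when a process is added. This Python sketch is an assumption-laden simplification: it models deleting and re-adding a process, not the `clear ip ospf process` command, and the `OspfRouter` class is hypothetical.

```python
import ipaddress

class OspfRouter:
    """Toy model: a process keeps the RID it elected until it is removed."""

    def __init__(self, loopbacks):
        self.loopbacks = loopbacks
        self.rids = {}   # process id -> router ID

    def _elect(self):
        free = [a for a in self.loopbacks if a not in self.rids.values()]
        return max(free, key=ipaddress.IPv4Address)

    def add_process(self, pid):
        self.rids[pid] = self._elect()

    def remove_process(self, pid):
        # Removing one process does not trigger re-election in the others.
        del self.rids[pid]

r = OspfRouter(["1.1.1.1", "2.2.2.2"])
r.add_process(100)        # elects 2.2.2.2
r.add_process(200)        # elects 1.1.1.1
r.remove_process(100)
print(r.rids)             # {200: '1.1.1.1'}: 2.2.2.2 is now free but unused

r.remove_process(200)     # delete and re-add to force a fresh election
r.add_process(200)
print(r.rids)             # {200: '2.2.2.2'}
```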

Regards.

marikakis Wed, 01/09/2008 - 13:31

Hello,

I will repeat something I have already said, because it might have been lost in the postings above.

When you start the first process, the router cannot generally know whether you will start a second process later or, when you do start the second process, whether it will have a higher ospf process number than the first one (only if you chose process number 1 for the first process could the router know that the second process will have a higher process number). So the router follows the same logic every time you start a process. It cannot wait for you to enter the next process's configuration to see what you mean. Commands take effect as they are entered, not in the future.

Now, when the router initializes, the configuration is passed to the parser, and from the parser's point of view this procedure is no different from the one the administrator follows with conf t during normal router operation. So it follows the same procedure; only in this case the ospf process ids are passed to the parser sorted (lowest first). The parser still does not care at initialization that the configuration is sorted. The configuration file contains all the commands you had entered, but not in the order in which you entered them, so some information is inevitably lost during initialization.

This does not seem that disappointing to me, especially if I have done nothing to make things work my way. Whenever a process starts, the same procedure is followed. The process number is never used as a factor in the algorithm, because it can't. It requires knowledge from the future for something like this to work. This future knowledge can only be taken advantage of at initialization, but not during normal operation, so even if something like this had been implemented, it would still not be consistent with normal operation.

Kind Regards,

M.

dmarekatc Wed, 01/09/2008 - 13:53

Hehe - Fair enough on the point in your last paragraph regarding consistent startup behavior, since (from the router's perspective) taking whichever process is listed first is being consistent.

So perhaps I should rephrase: there is consistency in the manner in which the router initializes the config, so I can't be disappointed there; it's "disappointing" (at least in my case) that the parser doesn't look at all entries at initialization and then order them by process number. (I completely understand, of course, not taking into account additional processes that get added afterwards; I'm only referring to startup.)

In other words, you're saying the router parses the ospf processes top-down (first come, first served), whereas what I was hoping for would be more like route selection (where the more specific route is selected even if it's not listed first), except that in this case the process number would act more like a priority in determining which goes first.
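The difference between what Dan observed and what he hoped for comes down to one sort at startup. A Python sketch contrasting the two; `boot_by_priority` is purely hypothetical (per Dan's testing, IOS does not behave this way), and only the two example loopbacks are modeled.

```python
import ipaddress

def elect_in_order(pids, loopbacks):
    """Each process, in turn, takes the highest loopback not yet used."""
    used, rids = set(), {}
    for pid in pids:
        rid = max((a for a in loopbacks if a not in used), key=ipaddress.IPv4Address)
        used.add(rid)
        rids[pid] = rid
    return rids

def boot_as_observed(config_order, loopbacks):
    # Observed: the first stanza in the config elects first.
    return elect_in_order(config_order, loopbacks)

def boot_by_priority(config_order, loopbacks):
    # Hypothetical alternative: lowest process number elects first.
    return elect_in_order(sorted(config_order), loopbacks)

loopbacks = ["1.1.1.1", "2.2.2.2"]
print(boot_as_observed([200, 100], loopbacks))   # {200: '2.2.2.2', 100: '1.1.1.1'}
print(boot_by_priority([200, 100], loopbacks))   # {100: '2.2.2.2', 200: '1.1.1.1'}
```

As Maria notes, such a priority could only apply at boot; a process added later with conf t would still elect against whatever is left.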

Outside of that, I appreciate your input and comments. Thanks!

marikakis Wed, 01/09/2008 - 14:03

The parser is a simple, stupid program that reads commands one line at a time. Having stupid programs in embedded systems is good. Simple and neat, fewer bugs. Bugs are still out there, of course, but could be more. :-)

Kevin Dorrell Thu, 01/10/2008 - 02:37

This is a fascinating discussion, and I cannot resist the temptation to throw a spanner in the works. I just want to take a sideways look at this question for a moment.

As I understand it, the RID intrinsically has nothing to do with any IP address. It is just a 32-bit number that identifies the router. It does not need to be a routable IP address. But for its purpose, it is pretty essential to ensure that it is unique within the OSPF domain. So, for want of any hard-coded value, the router's algorithm uses a number equal to one of its IP addresses, in the hope that it will be unique.
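Since the RID is just a 32-bit value, dotted-decimal is only a convenient way of writing it. A small Python sketch using the standard `ipaddress` module to move between the two forms; the helper names are illustrative, not from any Cisco tool.

```python
import ipaddress

def rid_to_int(rid: str) -> int:
    """Router ID in dotted-decimal -> its underlying 32-bit value."""
    return int(ipaddress.IPv4Address(rid))

def int_to_rid(value: int) -> str:
    """32-bit value -> dotted-decimal notation."""
    return str(ipaddress.IPv4Address(value))

print(rid_to_int("1.1.1.1"))   # 16843009
print(int_to_rid(5))           # 0.0.0.5, well-formed even though no interface has it
```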

Having said that, what happens when you have two OSPF processes? Why have the software writers chosen to ensure that the RID is different between the processes? What would happen if they are the same?

As far as I am aware, you cannot run the two OSPF processes on the same real interface - they would get their Hellos confused. So why do you need the RIDs to be different?

Kevin Dorrell

Luxembourg

Richard Burts Thu, 01/10/2008 - 05:06

Kevin

Welcome to the discussion. I believe that Dan as the original poster would agree that the more participants the better. I have some comments about a couple of your points.

I believe that you are correct that the RID is a 32 bit number that identifies the router. As part of the process of accurately drawing the topology map you want to uniquely identify each data source. As an identifier the RID does not need to be a routable address but should be unique within the OSPF domain.

Left to its own the OSPF process will pick an IP address from one of the interfaces on the router. This should assure that the RID will be unique because we assume that an IP address should appear at only 1 place in the network. (If the same address does appear in more than one place then your network has problems) You do not need to let the OSPF process pick the interface and you have the option to manually configure the RID. As Dan points out there are possibilities that over time the OSPF process might pick different addresses for its RID. So if you have some reason to want to have very stable choice of RID (as Dan does) then manual configuration is recommended.

If you manually configure the RID there is no requirement that the RID be matched in a network statement and no requirement that the RID be an address that is advertised to neighbors. In fact it is not a requirement that the RID actually be an address on that router (if manually configured).

You are certainly correct that you can not run 2 OSPF processes on the same physical interface. If you configure network statements in both processes to include the interface it will be active in only one of them. I assume that the RIDs must be unique in an attempt to assure that we can still accurately identify sources of information if information between the 2 processes were shared (redistributed).

HTH

Rick

Kevin Dorrell Thu, 01/10/2008 - 06:19

Hi Rick,

Thanks for the comments. The business about having the same RID on two routers because they had the same IP address is a classic CCIE lab "gotcha". (Not from a real exam, I hasten to add, but from my training provider!)

You get the candidate to configure OSPF, and it all works fine. Then in the very last section of the lab, multicast, you get him to configure an anycast scenario with MSDP. This, of course, involves loopbacks on two routers with the same IP address. Everything works fine until you reload the routers, at which point OSPF stops working.
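The gotcha follows from automatic selection: after the reload, each router independently picks its highest loopback, and if the shared anycast address happens to be the highest on both, both routers end up with the same RID. A Python sketch with hypothetical router names and addresses, chosen so that the anycast loopback is numerically highest:

```python
import ipaddress
from collections import defaultdict

def auto_rid(loopbacks):
    """Highest loopback address, as in the automatic selection discussed above."""
    return max(loopbacks, key=ipaddress.IPv4Address)

# Hypothetical: R1 and R2 each gain the same anycast loopback for MSDP.
routers = {
    "R1": ["10.0.0.1", "10.255.255.1"],
    "R2": ["10.0.0.2", "10.255.255.1"],
}

by_rid = defaultdict(list)
for name, loopbacks in routers.items():
    by_rid[auto_rid(loopbacks)].append(name)

duplicates = {rid: names for rid, names in by_rid.items() if len(names) > 1}
print(duplicates)   # {'10.255.255.1': ['R1', 'R2']}
```

A check like this over collected RIDs is one way to catch the problem before a reload does.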

Can I take you to task on "I assume that the RIDs must be unique in an attempt to assure that we can still accurately identify sources of information if information between the 2 processes were shared (redistributed)" ? Sorry, but I am not convinced yet. Each OSPF process has its own database, and the local process id should be enough to identify the information. Sorry, but I am still looking for a valid reason why the two processes should use different RIDs.

Kevin Dorrell

Luxembourg

marikakis Thu, 01/10/2008 - 06:34

Kevin,

I was thinking the same thing about the process id working as a differentiating factor for redistribution purposes (local process id's seem fine for a local redistribution procedure). Unless Rick has something particular in mind that we cannot think of. I still haven't got an answer to your painful question though. :-)

Kind Regards,

Maria

marikakis Thu, 01/10/2008 - 13:39

Kevin,

I had a few thoughts, nothing conclusive, I will just list them here.

1. The specification requires some protocol data structures (in Section 5) that include the router-id, but it does not say anything about a second process, so perhaps implementors created a similar structure for additional processes (with different RIDs) as a simple extension of this concept. That is, instead of plugging things (areas, etc.) in around the same router-id, create a new one and plug them in there (neat software architecture considerations).

2. The specification says nothing about process-ids. The formal way to do things is according to the specification, not according to process-id hooks. The process-id is just an administrator-friendly parameter.

3. Using a different router-id for each ospf process has the advantage of simplicity. Non-overlapping graphs (nodes and links) can help you sleep peacefully. I was thinking of a possible issue with virtual links, which do specify a remote router-id as a parameter, but I can't conclude and sufficiently support either that there could be problems with this or that there won't be. Sometimes when you implement something, you might choose a conservative strategy that will help you sleep well, instead of trying to consider all possible cases and prove that no issue will exist under any circumstances if you use the same RID in 2 different processes.

Kind Regards,

M.

Edit: The joke of the Specification in Section 5 (i.e. the word "smallest"):

"Router ID

A 32-bit number that uniquely identifies this router in the AS. One possible implementation strategy would be to use the smallest IP interface address belonging to the router."
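The specification only requires uniqueness; "smallest address" is offered as one possible strategy, while IOS's automatic choice, as described in this thread, takes the highest. A Python sketch applying both strategies to the same hypothetical router, assumed to have no loopbacks so IOS would fall back to its interface addresses:

```python
import ipaddress

def pick_rid(addresses, strategy):
    """Apply min (RFC 2328's example strategy) or max (IOS-style) numerically."""
    return strategy(addresses, key=ipaddress.IPv4Address)

# Hypothetical interface addresses on a router with no loopbacks.
addrs = ["172.16.0.1", "192.168.1.1", "10.1.1.1"]

print(pick_rid(addrs, min))   # 10.1.1.1
print(pick_rid(addrs, max))   # 192.168.1.1
```

Either strategy satisfies the spec as long as the result is unique in the AS.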

Kevin Dorrell Fri, 01/11/2008 - 00:46

Maria,

Thanks for those thoughts. I think you are right that it was an implementation decision rather than an architectural constraint that made them use different RIDs for the two processes. I suppose it was less of a risk as well, because that way they could be sure there were no "gotchas" - a minimal risk approach. I would probably have made the same decision.

It would be interesting to hear if Russ White has anything to say about it.

I am at work now, so I cannot check it out, but I seem to remember that if you configure OSPF with a static RID and you then introduce ipv6 router ospf, the IPv6 OSPF takes the same RID and does not let you change it. I shall have to look at that a bit closer... I usually use the same process ID for my IPv6 OSPF as for my IPv4 OSPF, so that may have something to do with it. I hadn't realised that they were so closely interdependent. So if you have two IPv4 OSPF processes and one IPv6, what RID does the IPv6 OSPF use?

As for "the smallest IP interface" ... what were they thinking of? I guess loopbacks do have the highest packing density, so it figures. You really have read that document from cover to cover! ;-)

Kevin Dorrell

Luxembourg

Kevin Dorrell Wed, 01/23/2008 - 08:13

In answer to my question about whether there was some fundamental constraint that the two OSPF processes should have different RID: yes there is. The reason is to catch the case where you have two processes running OSPF out two interfaces, but those two OSPF domains are joined up eventually on another router. In that case, the two processes should have different RIDs, and effectively our router is treated as two different routers as far as the OSPF cloud is concerned.
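The reason becomes concrete when you recall how OSPF names an LSA: by (LS type, Link State ID, Advertising Router), and for a router-LSA both the Link State ID and the Advertising Router are the originating router's RID (RFC 2328). A deliberately simplified Python sketch (plain dictionary overwrite instead of real sequence-number comparison and flooding) of two processes whose domains meet on another router:

```python
# Simplified LSDB keyed the way OSPF identifies an LSA instance.
lsdb = {}

def install_router_lsa(rid, links):
    key = (1, rid, rid)   # (LS type 1, Link State ID, Advertising Router)
    lsdb[key] = links     # simplification: same key simply overwrites

# Hypothetical: processes 100 and 200 on one router, domains joined elsewhere.
install_router_lsa("2.2.2.2", ["links in domain A"])   # process 100
install_router_lsa("2.2.2.2", ["links in domain B"])   # process 200, same RID
print(len(lsdb))   # 1: the two "routers" collapse into one node

lsdb.clear()
install_router_lsa("2.2.2.2", ["links in domain A"])
install_router_lsa("1.1.1.1", ["links in domain B"])   # distinct RIDs
print(len(lsdb))   # 2: each process appears as its own router
```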

Now I see the answer, it should have been obvious in the first place - it was just a matter of thinking "out of the box"!

Thanks to Marek Karasek, who presented the "New Developments in OSPF" session at Networkers, for pointing that out.

Kevin Dorrell

Luxembourg

(actually, Barcelona)

Richard Burts Wed, 01/23/2008 - 09:43

Kevin

Thanks for posting the clarification. I had sort of thought along those lines when you asked the question in this discussion but had not gotten it clearly articulated. This presents it nicely.

HTH

Rick

marikakis Wed, 01/09/2008 - 13:57

Addition:

The way the configuration file is crafted is very efficient (simple text; minimal information, with no timing information or cancelled configuration, which means minimal space on a generally small storage medium; fewer bugs; works the same way across product lines), and it is maintained in an order that guarantees that the commands entered at initialization will not fail for some stupid reason. All the loopbacks are entered before the routing protocol configuration. Imagine that, after you entered the config you posted as an example, you decided to add another loopback. At initialization this loopback will be passed to the parser before the routing processes, for good reasons, so you still have a problem, but the router has to make sure some things go before others for a smooth startup.
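Maria's scenario can be sketched in Python: a loopback added while the processes are running leaves their RIDs alone, but after a reload it is parsed before the router ospf stanzas and becomes a candidate. This assumes the new loopback (hypothetical 3.3.3.3) is numerically highest and that the saved config lists process 100 before 200.

```python
import ipaddress

def elect(pids, loopbacks):
    """Processes elect in order, each taking the highest unused loopback."""
    used, rids = set(), {}
    for pid in pids:
        rids[pid] = max((a for a in loopbacks if a not in used), key=ipaddress.IPv4Address)
        used.add(rids[pid])
    return rids

loopbacks = ["1.1.1.1", "2.2.2.2"]
running = elect([100, 200], loopbacks)   # {100: '2.2.2.2', 200: '1.1.1.1'}

loopbacks.append("3.3.3.3")              # new loopback added with conf t
print(running)                           # unchanged: nothing re-elects the RIDs

# After a reload, interfaces are parsed before the router ospf stanzas,
# so the new loopback is already a candidate when the elections run.
after_reload = elect([100, 200], loopbacks)
print(after_reload)                      # {100: '3.3.3.3', 200: '2.2.2.2'}
```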

Seems pretty neat to me. I wouldn't like it any other way.

dmarekatc Thu, 01/10/2008 - 06:04

To modify the discussion a little, and ultimately give some insight into what I'm considering using all this for, let me ask which option is functionally better (and it may be that there's no real discernible difference, just personal preference):

Say Loopback0 is used as the management interface and process 100 is the "main" process for routing (so there is more interest in the RID remaining consistent). Loopback1 is tied to dial-backup and the second process (200) is related to that.

Here's the Mockup:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255
interface Loopback1
 ip address 2.2.2.2 255.255.255.255
interface Async1
 ip unnumbered Loopback1
router ospf 100
 * Will want 1.1.1.1 as the "mgt" RID, but would have 2.2.2.2 at the moment
router ospf 200
 * Will want 2.2.2.2 as the "dial" RID, but would have 1.1.1.1 at the moment

** The Which One Part **

Would it be "better" to eliminate the 2nd loopback altogether by moving its IP to the Async interface? That would reduce the total interface count by 1 and would allow ospf process 100 to select the desired RID, since there is only one loopback to choose from. (The second process would either need its RID explicitly defined or, if there is less concern about its RID, could just pick from all remaining interfaces.)

The result would look like this:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255
interface Async1
 ip address 2.2.2.2 255.255.255.0
 * Can't have a /32 here, so the mask gets changed
router ospf 100
 * Will want 1.1.1.1 as the "mgt" RID, and would have 1.1.1.1
router ospf 200
 router-id 2.2.2.2
 * Or, if this is less important, let the router decide

Or would it be "better" to leave the Async with ip unnumbered (thus also keeping Loopback1) and just explicitly define the RIDs as needed? (I realize this might raise the pros and cons of using ip unnumbered or not, in conjunction with the discussion on maintaining consistent RIDs.)
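The two designs can be compared by simulating a boot with the election logic from this thread plus manual overrides. A Python sketch; the `manual` mapping stands in for `router-id` commands, only loopbacks are modeled (no fallback to the Async or other interfaces), and the stanza order 100-then-200 is assumed.

```python
import ipaddress

def auto_rid(loopbacks, in_use):
    free = [a for a in loopbacks if a not in in_use]
    return max(free, key=ipaddress.IPv4Address) if free else None

def boot(loopbacks, manual):
    """manual maps process id -> configured router-id (stands in for the CLI)."""
    rids = {}
    for pid in (100, 200):   # config order: 100 first
        rids[pid] = manual.get(pid) or auto_rid(loopbacks, set(rids.values()))
    return rids

now = boot(["1.1.1.1", "2.2.2.2"], {})                                 # current mockup
opt1 = boot(["1.1.1.1"], {200: "2.2.2.2"})                             # drop Loopback1
opt2 = boot(["1.1.1.1", "2.2.2.2"], {100: "1.1.1.1", 200: "2.2.2.2"})  # pin both
print(now)    # {100: '2.2.2.2', 200: '1.1.1.1'}
print(opt1)   # {100: '1.1.1.1', 200: '2.2.2.2'}
print(opt2)   # {100: '1.1.1.1', 200: '2.2.2.2'}
```

Both options give the same result here; the difference is that option 1 still depends on the election, so adding another loopback later could change process 100's RID, while option 2 does not.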

Thoughts?

Richard Burts Thu, 01/10/2008 - 08:13

Dan

Given a choice between these 2 alternatives I would opt for the second choice - keep both loopbacks, let the async get its address via unnumbered, and explicitly define the RID. Functionally I am not sure that there is much difference. But option 1 in which the OSPF process makes a choice leaves some possibility that at some point it might choose differently while configuring the RID ties it down. In situations where stability of the RID is a consideration I would prefer to tie it down and remove the possibility of a different choice.

Other people may have other opinions. But I believe that where consistency in some aspect (such as RID) is desired that we should avail ourselves of every opportunity to explicitly configure those variables.

HTH

Rick

mkarasek Wed, 01/23/2008 - 08:36

Hi,

To start with, there is no real guarantee.

But the algorithm simply works as follows, so results should be consistent:

The first process walks the interface list and uses the first loopback it finds.

The second process walks the interface list, finds that the first loopback is used, so it uses the next one.

If no loopback is found, the highest available IP address is used.

I'm not 100% sure about the position of interfaces on the interface list, but when the cfg is parsed after a reload, this should be consistent.

Also, which process the algorithm is called for first depends on the order of the processes in the cfg.

/marek

Richard Burts Wed, 01/23/2008 - 09:54

Marek

My experience is slightly different from what you describe. When any process looks for an address to use as RID it will choose the numerically highest loopback that is available and is not necessarily the first in the list and if no loopback is available then it will choose the numerically highest physical interface that is available.
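The two descriptions diverge exactly in Dan's example. A Python sketch modeling both; the interface-list order below is an assumption, since, as Marek says, the real ordering inside IOS isn't certain.

```python
import ipaddress

# Assumed interface list order (Loopback0 before Loopback1).
interface_list = [("Loopback0", "1.1.1.1"), ("Loopback1", "2.2.2.2")]

def first_free_loopback(ifaces, used):
    """Marek's description: walk the list, take the first unused loopback."""
    for name, ip in ifaces:
        if name.startswith("Loopback") and ip not in used:
            return ip
    return None

def highest_free_loopback(ifaces, used):
    """Rick's and Dan's experience: the numerically highest unused loopback."""
    free = [ip for name, ip in ifaces if name.startswith("Loopback") and ip not in used]
    return max(free, key=ipaddress.IPv4Address) if free else None

def boot(select, order=(100, 200)):
    used, rids = set(), {}
    for pid in order:
        rids[pid] = select(interface_list, used)
        used.add(rids[pid])
    return rids

print(boot(first_free_loopback))    # {100: '1.1.1.1', 200: '2.2.2.2'}
print(boot(highest_free_loopback))  # {100: '2.2.2.2', 200: '1.1.1.1'}
```

Dan's lab result (ospf 100 taking 2.2.2.2 when listed first) matches the second model.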

HTH

Rick

dmarekatc Wed, 01/23/2008 - 10:55

Rick,

What you describe is my experience & understanding as well on the selection process.

The general flow described in the previous post of course holds, with the aforementioned adjustments.

(I'm also just posting because I find it mildly amusing that in a single topic we have different people posting who share the same, not so common, name in one part or another, first or last.)
