EIGRP SIA question

Answered Question
Feb 13th, 2010

Hey all,

I have a question regarding SIA, as explained on page 122-124 in "Authorized Self-Study Guide for BSCI" 3rd edition. Luckily this can be found direct from ciscopress.com as a sample chapter (http://ptgmedia.pearsoncmg.com/images/9781587052231/samplechapter/2237_SampleChapter_03.pdf)

SIA.jpg

Basically, they give a scenario where RTR B has a network that goes down (10.1.8.0/24). So he queries A,C,D,E.

CDE answer that they have a good route via A because A was their FS.

A, upon receiving the query from B, query CDE.

CDE, upon receiving the query from A, purge A as the new successor for that route, and query B.

B can't reply to CDE, because it is still waiting for a reply from A.

So, B is waiting for A before it can answer CDE.

A is waiting for CDE before it can answer B.

CDE is waiting for B before they can answer A.

The book doesn't involve the SIA-Query/SIA-Reply packets in this scenario, but ultimately this will delay the resetting of the peer between A and B by a maximum of 6 minutes (assuming the default 180 second acvite timer).

I find it hard to believe that resetting the peer between A and B (which impacts all networks learned between them both) is the optimal behavior for EIGRP to resolve this deadlock situation.

Why can't RTR B, upon reaching 3 minutes (or 6 if you include the SIA-Query/SIA-Reply packet) simply purge the 10.1.8.0/24 route instead of resetting the peer?

This seems much more efficient.

This topology is not too out-of-the ordinary, where you have one subnet hanging off of only one of the distribution routers due to cost limitations/etc, where it isn't feasible that the subnet have redundant connections. I just don't think that if this prefix goes down, it should end up in a neighbor reset to clear it all away.

All other examples imply a bad connection or a faulty link that leads to a reply not being returned. This scenario, however, is a deadlock situation where there is no faulty links (other than the original network going down).

I understand summarizaiton and stub can fix this all, but I just see those as band-aid solutions for what seems to be a flaw in EIGRP. Ultimately there should be RTR-IDs included in the SIA-Reply packet (who you are waiting to get a reply from), and with this extra bit of info, RTR B would have realized A is waiting for B, so that means the network is "unreachable" from A.

Did I miss something?

Thanks for any and all advice!

I have this problem too.
0 votes
Correct Answer by Peter Paluch about 6 years 11 months ago

Hello,

You are welcome. Let's go over your individual comments.

I'm not sure why I always see 3 minutes as the limit to go to SIA. Yes, 3 minutes is the active timer, but this is reset at 90 second intervals with the SIA-Query and SIA-Reply packets (up to three times). So, assuming there are no bad links in the network, or faulty devices, 6 minutes will elapse prior to going to SIA.

Yes, you are correct - the same fact is explained in the Jeff Doyle's book Routing TCP/IP, Volume 1.

I follow Peter's logic, and it makes sense (other than not following split horizon rules ).

Yep I completely skipped the split horizon rules - they only optimize the query process somewhat but do not influence it in any significant way. And also, I forgot completely that the EIGRP uses split horizon with queries I owe you one!

The only question I had was in step 3, and the point you make of CDE answering first being a coincidence. Let's say that A reponds prior to C, D, E. What would A's reply be? B was his successor, and he had no feasible successor -- so would A have responded with infinity, or would have had to have asked C, D, E first?

A very good question. The router A cannot send a reply immediately because the B is its current successor and by its query, it indicates it has problems reaching that network. More precisely, the A will receive the query from B, indicating the current distance of B towards the network 10.1.8.0/24, which is infinity. For router A, the router B was the successor and there are no feasible successors because the C, D and E also decided to use A as their next hop towards the 10.1.8.0/24. Therefore, upon receiving the query from B, router A will lose its successor and will need to forward the query itself before it can provide a reply to B. The remaining process will follow approximately from the step 5.

Also, in step 8. B will reply with infinity to CDE rather than the new metric it learned from CDE in step 3? Is this because those new metrics were never used due to the network never transitioning to passive, or because the queries from CDE was changed to infinity after A queried CDE?

In step 8, the router B must reply with infinity to C, D and E. The reason is that in DUAL algorithm, once a router transitions to the active state, it must not change the current successor, feasible distance or reported distance until it receives all replies and selects a new successor. This is one of the most important principles with DUAL - a router can change its idea about distances and next hops only after it receives all replies because only then it can be sure that the remaining part of the network has already updated its routing tables and distances correctly (by the virtue of repeating the same principle). So, in this case, while the router B has received the replies from C, D and E, it did not receive all replies it needs to make a qualified decision and cannot be sure that the neighbors are already aware of the most recent correct route to the destination.

I suggest strongly reading the following document:

http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml

Regarding the behavior of receiving and sending updates, see this section:

http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml#queryrange

You are welcome to ask further!

Best regards,

Peter

Correct Answer by Peter Paluch about 6 years 11 months ago

Hello tdistlists,

Is the same explanation to the exhibit, as you have quoted it here, also present in the Cisco Press book? If yes then the book is patently wrong on this subject.

The explanation in your post is not correct because it is based on a false assumption: It wrongly assumes that if a router is in active state for a destination, it cannot send a reply to another query that it receives during the active state.

In reality, if a router is in active state for a destination and subsequently receives yet another query, it immediately responds with its current metric that was set in the moment of transition to the active state (note that this metric must be higher than what it was when the router was passive because a transition into the active state is always a result of distance increase). This way, no deadlocks can occur, in contrary to what was described in your original post.

So the sequence of events would be as follows:

  1. B loses connection to the 10.1.8.0/24 network which will result in the metric of this route to go to infinity. No other neighbor is identified as a feasible successor so the router B has to enter the active state for the network.
  2. B starts a diffusing computation by sending a query to its directly connected neighbors. The query contains the current distance of B to the network 10.1.8.0/24, which is infinity.
  3. Assuming that routers C, D and E send their answers sooner than router A can process the query (which is a matter of coincidence, not a rule at all), B will receive replies from C, D and E indicating their current (unchanged) distance to the network 10.1.8.0/24 via router A. Routers C, D and E will not proceed to active mode because their metric was not influenced by this query.
  4. Router A will receive the query from B which was its successor. The query contains the new metric, which is infinity. Router A therefore loses its successor and because no other neighbor is identified as a feasible successor, router A will have to enter the active state
  5. Router A will send queries to all its neighbors indicating its current distance to the network 10.1.8.0/24, which is also infinity. This query will reach routers B, C, D and E.
  6. Router B is currently in the active state for the same network and its current distance to it is infinity. Upon receiving a query from A, it will immediately send a reply to A with its current metric of infinity.
  7. Upon receiving the query from A, routers C, D and E will lose their successor. In turn, they will also enter the active state and send queries to their neighbors.
  8. Routers A and B receive queries from C, D and E. They will immediately send a reply with their current metric of infinity to C, D and E. Moreover, router B will note that the current distance of C, D and E as indicated in their query is infinity and no longer the shorter distance it received in replies in the step 3.
  9. In steps 7 and 8, the routers C, D and E went into active state and received replies to all their queries. After receiving all replies, routers C, D and E conclude that there is no alternate route to the network 10.1.8.0/24. They will therefore change to passive state, send their replies to the router A indicating the metric of infinity, and remove the network 10.1.8.0/24 from their topology and routing tables.
  10. Upon receiving answers from routers C, D and E, the router A will also conclude that the network is no longer reachable. It will therefore send a reply back to router B with the metric of infinity and remove the network from its topology and routing table.
  11. The router B now has replies to all queries it sent out. However, all routers have indicated that the network is no longer reachable. As a result, the router B will also remove this network from its topology and routing table and the diffusing computation is thereby completely concluded.

In particular, the SIA state will not appear in this network.

Best regards,

Peter

  • 1
  • 2
  • 3
  • 4
  • 5
Overall Rating: 5 (2 ratings)
Loading.
Ganesh Hariharan Sat, 02/13/2010 - 22:43

Hi,

EIGRP Stuck in Active state means If the successor path is lost & there is no feasible successor path is available, router sends out query messages
on all EIGRP enable interfaces & tries to find out analternative path to the network. It is active state for that route.

Now Router is waiting for reply from its neighbors. If reply is missing for 3 min, dat means router didn't get any reply from neighbors, then it becomes stuck in active.In this case, router reset the neighbor relationship with the router who didn't replied back the query messages sent by the router.This is the normal behavior of EIGRP protocol when ever it is in stuck in active stage.

To solve this problem, two method is used -

a) Router summarization
b) EIGRP Stub.

To disbale the stuck in active timer, the following command is used -

Router(config-router)# timers active-time disable

Hope to help !!

Ganesh.H

Giuseppe Larosa Sun, 02/14/2010 - 02:00

Hello Tdistlists,

EIGRP is the only protocol where a node involves its own neighbors in route calculations, using the Diffusive Update Algorithm (DUAL).

This has its own advantages and several drawbacks like the SIA problem.


EIGRP is also an hybrid protocol that, starting from distance vector roots, introduces some link state characteristics.

For example the concept of EIGRP router-id is somewhat limited to the originator router field in IP external routes TLV data structures.

So an EIGRP hello does not contain the EIGRP router-id of the sender and so on.

You can easily verify this doing a sh ip eigrp neighbors between two devices connected with multiple links: they list as neighbors interface ip addresses not the EIGRP router-id.

A sh ip ospf neighbors provides the OSPF router-ids of neighbors.

EIGRP takes from link state protocol the concept of a neighbor state machine, it  doesn't exchange updates with a neighbor until they reach a state in the neighbor state machine.

EIGRP doesn't take from LS the link state concept, RB when loses its connection to net 10.1.8.0/24 does not react as an OSPF or ISIS router, it tries to find out if some other node can reach the missing IP prefix so it starts these queries.

An OSPF router would just publish an updated router LSA with link state down for the interface that has ip address in 10.1.8.0/24.

So I'm not sure that SIA Query and SIA Reply contains the router-id field and if they provide enough feedback info so that the deadlock can be solved.


These two packets SIA Query and SIA reply  help in distinguishing between alive neighbors and faulty link they provide a form of bidirectional detection.


In my personal opinion the most difficult behaviour to understand is the propagation of a Query  from one EIGRP AS number to another AS.

This is something that could have been avoided.


The use of route filters, route summarization and stub EIGRP concept can help in limiting the query range solving stability problems of a scenario like the one in the figure and are part of a correct design.


In other words, it is left to network engineers to use the appropriate tools to build a working and stable solution when using  EIGRP.

These tools are not workarounds but part of the protocol scalability can be reached only with control of Query range.

For these reasons EIGRP is seen as easy to configure, but with less scalability then link state protocols


Hope to help

Giuseppe

Peter Paluch Sun, 02/14/2010 - 03:26

Hello Giuseppe,

Allow me a few comments.

EIGRP is also an hybrid protocol that, starting from distance vector roots, introduces some link state characteristics.

It is often stated that the EIGRP is a hybrid but personally I disagree strongly with it and whenever I hear it I can feel my blood pressure rising It is true that EIGRP came with enhancements we were used to see in link-state protocols: using hello packets to maintain adjacencies, sending only incremental and event-triggered updates, maintaining a state machine in communication with neighbors - you've named them. This all is true but it does not make a routing protocol a hybrid yet. A true hybrid protocol would discover a part of network in its topological detail and the other part of network would be discovered by a distance vector principle, just like multi-area OSPF. In fact, the multi-area OSPF is the only true hybrid protocol used in general networks (there are some similar hybrids used in ad-hoc wireless networks as far as I know). The EIGRP, as fancy as it may seem, is only a distance vector, even though on steroids

In my personal opinion the most difficult behaviour to understand is the propagation of a Query  from one EIGRP AS number to another AS.

Hmm... I have seen the description of this in the following document:

http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml#boundariesaffect

It does not strike me as being particularily difficult but I may be overlooking something.

For these reasons EIGRP is seen as easy to configure, but with less scalability then link state protocols

I think that estimating the real scalability of a routing protocol, especially when comparing it to other protocols of a different nature, is a difficult thing to do. The concept of areas in IS-IS and OSPF helps very strongly to maintain their scalability to large networks. On the other hand, in EIGRP, you can summarize (and thereby limit the query scope) on any router in the topology while in link state protocols, the summarization can be done only on ABRs and ASBRs. To me, both link-states and EIGRP appear as scalable enough but every one of them approaches it in its own way.

Best regards,

Peter

Correct Answer
Peter Paluch Sun, 02/14/2010 - 03:06

Hello tdistlists,

Is the same explanation to the exhibit, as you have quoted it here, also present in the Cisco Press book? If yes then the book is patently wrong on this subject.

The explanation in your post is not correct because it is based on a false assumption: It wrongly assumes that if a router is in active state for a destination, it cannot send a reply to another query that it receives during the active state.

In reality, if a router is in active state for a destination and subsequently receives yet another query, it immediately responds with its current metric that was set in the moment of transition to the active state (note that this metric must be higher than what it was when the router was passive because a transition into the active state is always a result of distance increase). This way, no deadlocks can occur, in contrary to what was described in your original post.

So the sequence of events would be as follows:

  1. B loses connection to the 10.1.8.0/24 network which will result in the metric of this route to go to infinity. No other neighbor is identified as a feasible successor so the router B has to enter the active state for the network.
  2. B starts a diffusing computation by sending a query to its directly connected neighbors. The query contains the current distance of B to the network 10.1.8.0/24, which is infinity.
  3. Assuming that routers C, D and E send their answers sooner than router A can process the query (which is a matter of coincidence, not a rule at all), B will receive replies from C, D and E indicating their current (unchanged) distance to the network 10.1.8.0/24 via router A. Routers C, D and E will not proceed to active mode because their metric was not influenced by this query.
  4. Router A will receive the query from B which was its successor. The query contains the new metric, which is infinity. Router A therefore loses its successor and because no other neighbor is identified as a feasible successor, router A will have to enter the active state
  5. Router A will send queries to all its neighbors indicating its current distance to the network 10.1.8.0/24, which is also infinity. This query will reach routers B, C, D and E.
  6. Router B is currently in the active state for the same network and its current distance to it is infinity. Upon receiving a query from A, it will immediately send a reply to A with its current metric of infinity.
  7. Upon receiving the query from A, routers C, D and E will lose their successor. In turn, they will also enter the active state and send queries to their neighbors.
  8. Routers A and B receive queries from C, D and E. They will immediately send a reply with their current metric of infinity to C, D and E. Moreover, router B will note that the current distance of C, D and E as indicated in their query is infinity and no longer the shorter distance it received in replies in the step 3.
  9. In steps 7 and 8, the routers C, D and E went into active state and received replies to all their queries. After receiving all replies, routers C, D and E conclude that there is no alternate route to the network 10.1.8.0/24. They will therefore change to passive state, send their replies to the router A indicating the metric of infinity, and remove the network 10.1.8.0/24 from their topology and routing tables.
  10. Upon receiving answers from routers C, D and E, the router A will also conclude that the network is no longer reachable. It will therefore send a reply back to router B with the metric of infinity and remove the network from its topology and routing table.
  11. The router B now has replies to all queries it sent out. However, all routers have indicated that the network is no longer reachable. As a result, the router B will also remove this network from its topology and routing table and the diffusing computation is thereby completely concluded.

In particular, the SIA state will not appear in this network.

Best regards,

Peter

tdistlists Mon, 02/15/2010 - 09:00

Thank you everyone for your reponses. A few things.

I'm not sure why I always see 3 minutes as the limit to go to SIA. Yes, 3 minutes is the active timer, but this is reset at 90 second intervals with the SIA-Query and SIA-Reply packets (up to three times). So, assuming there are no bad links in the network, or faulty devices, 6 minutes will elapse prior to going to SIA.

I know about DUAL, etc, and that rtr-IDs are mainly link-state things, but I was just trying to make suggestions on how to resolve the issue, where I consider summary, stub and filtering band-aid solutions.

However, it looks like my understanding was incorrect, according to Peter.

The book does state this:

BCSI Self-Study, pg 123-124

Router B learns from these queries that none of the remote routers has a path to network 10.1.8.0/24,
but it cannot respond that it does not know of a path, because Router B is waiting for Router A to reply to a query. Router A is waiting for either Router C, D, or E to reply to its query, and these remote sites are waiting for Router B to reply to their queries. Because Router B sent out the first query, its SIA timer expires first, and Router B reaches the SIA state for network 10.1.8.0/24 first
(in 3 minutes by default).

Unless I misunderstood the book, it basically says there is a deadlock situation because no one can answer anyone else due to not receiving an answer from another RTR.

I follow Peter's logic, and it makes sense (other than not following split horizon rules ). The only question I had was in step 3, and the point you make of CDE answering first being a coincidence. Let's say that A reponds prior to C, D, E. What would A's reply be? B was his successor, and he had no feasible successor -- so would A have responded with infinity, or would have had to have asked C, D, E first?

Based on what I know, A would have transitiioned to acvitve for that network, and had to ask CDE first. That way, CDE would always repond to B first, because A has to send out his own queries.

Also, in step 8. B will reply with infinity to CDE rather than the new metric it learned from CDE in step 3? Is this because those new metrics were never used due to the network never transitioning to passive, or because the queries from CDE was changed to infinity after A queried CDE?

If it is true that you can reply to a query for a subnet, even when waiting for a reply for the same subnet, then there can be no deadlock situation, and Peter is correct in saying this example is not a SIA example.

Ruling out deadlock situations, SIA's would only occur with query ranges that exceed 6 minutes (huge diameter of a network),  or faulty links/connections (which SIA-Query and SIA-Reply help pinpoint -- these ensure that the network link with the problem is the one that is reset, not just the first RTR to send the query as the book suggests). Am I right in thinking this?

Again, thanks so much for this wonderful insight!

Correct Answer
Peter Paluch Mon, 02/15/2010 - 11:52

Hello,

You are welcome. Let's go over your individual comments.

I'm not sure why I always see 3 minutes as the limit to go to SIA. Yes, 3 minutes is the active timer, but this is reset at 90 second intervals with the SIA-Query and SIA-Reply packets (up to three times). So, assuming there are no bad links in the network, or faulty devices, 6 minutes will elapse prior to going to SIA.

Yes, you are correct - the same fact is explained in the Jeff Doyle's book Routing TCP/IP, Volume 1.

I follow Peter's logic, and it makes sense (other than not following split horizon rules ).

Yep I completely skipped the split horizon rules - they only optimize the query process somewhat but do not influence it in any significant way. And also, I forgot completely that the EIGRP uses split horizon with queries I owe you one!

The only question I had was in step 3, and the point you make of CDE answering first being a coincidence. Let's say that A reponds prior to C, D, E. What would A's reply be? B was his successor, and he had no feasible successor -- so would A have responded with infinity, or would have had to have asked C, D, E first?

A very good question. The router A cannot send a reply immediately because the B is its current successor and by its query, it indicates it has problems reaching that network. More precisely, the A will receive the query from B, indicating the current distance of B towards the network 10.1.8.0/24, which is infinity. For router A, the router B was the successor and there are no feasible successors because the C, D and E also decided to use A as their next hop towards the 10.1.8.0/24. Therefore, upon receiving the query from B, router A will lose its successor and will need to forward the query itself before it can provide a reply to B. The remaining process will follow approximately from the step 5.

Also, in step 8. B will reply with infinity to CDE rather than the new metric it learned from CDE in step 3? Is this because those new metrics were never used due to the network never transitioning to passive, or because the queries from CDE was changed to infinity after A queried CDE?

In step 8, the router B must reply with infinity to C, D and E. The reason is that in DUAL algorithm, once a router transitions to the active state, it must not change the current successor, feasible distance or reported distance until it receives all replies and selects a new successor. This is one of the most important principles with DUAL - a router can change its idea about distances and next hops only after it receives all replies because only then it can be sure that the remaining part of the network has already updated its routing tables and distances correctly (by the virtue of repeating the same principle). So, in this case, while the router B has received the replies from C, D and E, it did not receive all replies it needs to make a qualified decision and cannot be sure that the neighbors are already aware of the most recent correct route to the destination.

I suggest strongly reading the following document:

http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml

Regarding the behavior of receiving and sending updates, see this section:

http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml#queryrange

You are welcome to ask further!

Best regards,

Peter

tdistlists Mon, 02/15/2010 - 14:44

Wow, you're the best! Thanks so much!

Sounds to me now that deadlock situation's cannot occur.  SIA would then only be possible with a super large query diameter and/or with faulty links/etc.

I just spent the last two hours reading the document you linked, and it definitely clarifies things. Sounds to me like the Ciscopress book should have included that chart:

When a router processes a query from a neighbor, the following rules       apply:

Query from

Route state

Action

neighbor (not the current successor)

passive

reply with current successor information

successor

passive

attempt to find new successor; if successful, reply with new                 information; if not successful, mark destination unreachable and query all                 neighbors except the previous successor

any neighbor

no path through this neighbor before query

reply with best path currently known

any neighbor

not known before query

reply that the destination is unreachable

neighbor (not the current successor)

active

if there is no current successor to this destinations (normally                 this would be true), reply with an unreachable

if there is a good successor, reply with the current path                 information

successor

active

attempt to find new successor; if successful, reply with new                 information; if not successful, mark destination unreachable and query all                 neighbors except the previous successor

You're point below clears up a lot as well.

The reason is that in DUAL algorithm, once a router transitions to the active state, it must not
change the current successor, feasible distance or reported distance
until it receives all replies and selects a new successor. This is one
of the most important principles with DUAL - a router can change its
idea about distances and next hops only after it receives all replies
because only then it can be sure that the remaining part of the network
has already updated its routing tables and distances correctly (by the
virtue of repeating the same principle).

So what "was" the best I can't change until I get all the query replies.

In reality, if a router is in active state for a destination and
subsequently receives yet another query, it immediately responds with
its current metric that was set in the moment of transition to the
active state (note that this metric must be higher than what it was
when the router was passive because a transition into the active state
is always a result of distance increase).

However, the "metric" I send out in the query packet is either infinity (unreachable) if I had no other entries in my topology database, or the next best path if I did have non-FS entries in my topology database.

Thanks again so much!

I did have another DUAL question that I'm typing up. I'll put it in a separate thread as it's not SIA related, but basically my question was why is the feasibility condition AD < FD as opposed to AD <= FD. In JJ Aceves' original DUAL documents he had <=, but somewhere along the lines the = part got dropped.

Thanks!!

Peter Paluch Mon, 02/15/2010 - 15:22

Hello,

You are more than welcome.

Regarding that table from the Cisco document - it slightly differs from the DUAL operation described in the paper by Dr. Garcia. It seems that the Cisco team behind EIGRP has somewhat optimized the basic DUAL algorithm but not radically.

So what "was" the best I can't change until I get all the query replies.

Exactly. The EIGRP basically behaves in a similar way to transactions in relational databases - it moves from a consistent solution to another consistent solution. No inconsistent (i.e. potentially looped) temporary states are allowed, not even in transitory periods.

However, the "metric" I send out in the query packet is either infinity (unreachable) if I had no other entries in my topology database, or the next best path if I did have non-FS entries in my topology database.

Yes, very correct, and by the way, note how this differs from the simplified explanation that is almost omnipresent - that if a successor fails, you just jump onto a feasible successor and start using it. In reality, you can have a feasible successor to a destination available with some metric, and at the same time, you can have a neighbor whose RD is not strictly lower than your current FD  (so it does not meet the feasibility condition) but at the same time, the path through that neighbor is the next shortest. Now, if a successor fails, what happens? The common literature says that you simply start using the feasible successor. But that would mean that you will stick with a longer route even though there might be a shorter path available. So in reality, you will note in your topology table that the next shortest path goes via the neighbor that is not identified as feasible successor, so you go to the active state to ascertain whether that neighbor can be the next successor, and decide after the diffusing computation finished.

I'll put it in a separate thread as it's not SIA related, but basically my question was why is the feasibility condition AD < FD as opposed to AD <= FD. In JJ Aceves' original DUAL documents he had <=, but somewhere along the lines the = part got dropped.

I assume you are referring to this paper:

http://www.soe.ucsc.edu/research/ccrg/publications/jj.dual.ton93.pdf

The <= inequality was used with different sufficient conditions to guarantee loop free operation. However, the Source Node Condition upon which the DUAL is built requires a strict inequality, and it is always present that way even in the paper. I am thinking right now of a counterexample to show how would the <= type of inequality lead to a loop with the Source Node Condition but I cannot currently come up with anything clever. Perhaps I'll be more fresh in the morning... it's after midnight here in Slovakia right now.

Best regards,

Peter

tdistlists Mon, 02/15/2010 - 16:30

Well, rest or no rest you seem to be on top of your game!

I assume you are referring to this paper:

http://www.soe.ucsc.edu/research/ccrg/publications/jj.dual.ton93.pdf

The <= inequality was used with different sufficient conditions to guarantee loop free operation. However, the Source Node Condition upon which the DUAL is built requires a strict inequality, and it is always present that way even in the paper. I am thinking right now of a counterexample to show how would the <= type of inequality lead to a loop with the Source Node Condition but I cannot currently come up with anything clever. Perhaps I'll be more fresh in the morning... it's after midnight here in Slovakia right now.

Best regards,

Peter

I can start a new thread if you wish, but I'll just post my thoughts here for now.

AD=RD (advertised distance and reported distance are interchangable below)

Why does the RD have to be strictly less than the FD to be a Feasible  successor? (RD < FD). All the documentation and books clearly provide examples of loops when AD >  FD, and explain that its guaranteed to be loop-free if AD < FD -- because an  AD lower than my FD means he is closer than I have ever been.

However,  I can't find an example of a problem when AD = FD. That's what it boils down to  for me.  To reiterate the question. Why is the feasibility condition AD < FD  and not AD <= FD?

In the research I've been doing, I've come across multiple papers of J.J. Garcia-Luna-Aceves. In reading his work on DUAL, EDVA  (enhanced distance vector algorithm) and LPA (loop free path finding algorithm)  -- Aceves often refers to RD <= FD as being an acceptable condition for  loop-free detection.

A UNIFIED APPROACH TO

LOOP-FREE  ROUTING USING
DISTANCE VECTORS OR LINK STATES

Source: http://www.loria.fr/~ichris/Teaching/p212-garcia-luna-aceves.pdf

Page  215-218


If we move down further, he references an example of DUAL which  actually utilizes a feasible successor with an RD = FD. The example is when he  refers to node C, because the distance from D->E (RD) and from F->E (old  FD) are equal.


However, in reading some of this later work, it has something  called a "RFC" or Revised Feasible Condition.

Source: http://ccrg.soe.ucsc.edu/publications/zhengyu.milcom97.pdf
Page  2-5




I cannot find any documentation as to why the equal-part of the  equation was dropped -- other than optimization and altered query/update  process, with claims of faster convergence.

See ection 5.1 of the above document, where Aceves quotes

"Accordingly, because FDji = 2 < Djn = 5, FC is not
satisfied, and node i must send a query before it can
choose a new feasible successor. It is clear that RFC saves
queries and consequently reduces the convergence time."

In Aceves' EIGRP document  (http://www.ida.liu.se/~TDTS02/papers/eigrp.pdf) he simply states factually that RD must be < FD -- but  doesn't show or enlighten the reader as to why this is the case. I cannot come  up with a scenario where RD = FD could cause a loop if installed as the  FS.

Many networks have many RD's equal to a routers  FD -- and they are not installed as FS. It seems inefficient in every situation to query/update rather than installing the FS.

In the document you referenced, Aceves does break it down into three conditions for loop freedom, DIC: Distance increase condition, CSC: Current successor condition, and SNC: Source node condition. You are correct that only SNC (which the definition baffles me a bit, other than meaning finding a new successor) drops the <= type.

I hope I didn't ramble on too much, but it perplexes me to find a loop with this last condition. Perhaps it was introduced just as a more efficient query/update process.

Thanks again so so much!

Message was edited by: tdistlists

- Image insertion isn't working, so I added them as attachments.

Attachment: 

Actions

This Discussion