02-13-2010 09:06 PM - edited 03-06-2019 09:42 AM
Hey all,
I have a question regarding SIA, as explained on page 122-124 in "Authorized Self-Study Guide for BSCI" 3rd edition. Luckily this can be found direct from ciscopress.com as a sample chapter (http://ptgmedia.pearsoncmg.com/images/9781587052231/samplechapter/2237_SampleChapter_03.pdf)
Basically, they give a scenario where RTR B has a network that goes down (10.1.8.0/24). So he queries A,C,D,E.
CDE answer that they have a good route via A because A was their FS.
A, upon receiving the query from B, query CDE.
CDE, upon receiving the query from A, purge A as the new successor for that route, and query B.
B can't reply to CDE, because it is still waiting for a reply from A.
So, B is waiting for A before it can answer CDE.
A is waiting for CDE before it can answer B.
CDE is waiting for B before they can answer A.
The book doesn't involve the SIA-Query/SIA-Reply packets in this scenario, but ultimately this will delay the resetting of the peer between A and B by a maximum of 6 minutes (assuming the default 180 second acvite timer).
I find it hard to believe that resetting the peer between A and B (which impacts all networks learned between them both) is the optimal behavior for EIGRP to resolve this deadlock situation.
Why can't RTR B, upon reaching 3 minutes (or 6 if you include the SIA-Query/SIA-Reply packet) simply purge the 10.1.8.0/24 route instead of resetting the peer?
This seems much more efficient.
This topology is not too out-of-the ordinary, where you have one subnet hanging off of only one of the distribution routers due to cost limitations/etc, where it isn't feasible that the subnet have redundant connections. I just don't think that if this prefix goes down, it should end up in a neighbor reset to clear it all away.
All other examples imply a bad connection or a faulty link that leads to a reply not being returned. This scenario, however, is a deadlock situation where there is no faulty links (other than the original network going down).
I understand summarizaiton and stub can fix this all, but I just see those as band-aid solutions for what seems to be a flaw in EIGRP. Ultimately there should be RTR-IDs included in the SIA-Reply packet (who you are waiting to get a reply from), and with this extra bit of info, RTR B would have realized A is waiting for B, so that means the network is "unreachable" from A.
Did I miss something?
Thanks for any and all advice!
Solved! Go to Solution.
02-14-2010 03:06 AM
Hello tdistlists,
Is the same explanation to the exhibit, as you have quoted it here, also present in the Cisco Press book? If yes then the book is patently wrong on this subject.
The explanation in your post is not correct because it is based on a false assumption: It wrongly assumes that if a router is in active state for a destination, it cannot send a reply to another query that it receives during the active state.
In reality, if a router is in active state for a destination and subsequently receives yet another query, it immediately responds with its current metric that was set in the moment of transition to the active state (note that this metric must be higher than what it was when the router was passive because a transition into the active state is always a result of distance increase). This way, no deadlocks can occur, in contrary to what was described in your original post.
So the sequence of events would be as follows:
In particular, the SIA state will not appear in this network.
Best regards,
Peter
02-15-2010 11:52 AM
Hello,
You are welcome. Let's go over your individual comments.
I'm not sure why I always see 3 minutes as the limit to go to SIA. Yes, 3 minutes is the active timer, but this is reset at 90 second intervals with the SIA-Query and SIA-Reply packets (up to three times). So, assuming there are no bad links in the network, or faulty devices, 6 minutes will elapse prior to going to SIA.
Yes, you are correct - the same fact is explained in the Jeff Doyle's book Routing TCP/IP, Volume 1.
I follow Peter's logic, and it makes sense (other than not following split horizon rules ).
Yep I completely skipped the split horizon rules - they only optimize the query process somewhat but do not influence it in any significant way. And also, I forgot completely that the EIGRP uses split horizon with queries I owe you one!
The only question I had was in step 3, and the point you make of CDE answering first being a coincidence. Let's say that A reponds prior to C, D, E. What would A's reply be? B was his successor, and he had no feasible successor -- so would A have responded with infinity, or would have had to have asked C, D, E first?
A very good question. The router A cannot send a reply immediately because the B is its current successor and by its query, it indicates it has problems reaching that network. More precisely, the A will receive the query from B, indicating the current distance of B towards the network 10.1.8.0/24, which is infinity. For router A, the router B was the successor and there are no feasible successors because the C, D and E also decided to use A as their next hop towards the 10.1.8.0/24. Therefore, upon receiving the query from B, router A will lose its successor and will need to forward the query itself before it can provide a reply to B. The remaining process will follow approximately from the step 5.
Also, in step 8. B will reply with infinity to CDE rather than the new metric it learned from CDE in step 3? Is this because those new metrics were never used due to the network never transitioning to passive, or because the queries from CDE was changed to infinity after A queried CDE?
In step 8, the router B must reply with infinity to C, D and E. The reason is that in DUAL algorithm, once a router transitions to the active state, it must not change the current successor, feasible distance or reported distance until it receives all replies and selects a new successor. This is one of the most important principles with DUAL - a router can change its idea about distances and next hops only after it receives all replies because only then it can be sure that the remaining part of the network has already updated its routing tables and distances correctly (by the virtue of repeating the same principle). So, in this case, while the router B has received the replies from C, D and E, it did not receive all replies it needs to make a qualified decision and cannot be sure that the neighbors are already aware of the most recent correct route to the destination.
I suggest strongly reading the following document:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml
Regarding the behavior of receiving and sending updates, see this section:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml#queryrange
You are welcome to ask further!
Best regards,
Peter
02-13-2010 10:43 PM
Hi,
EIGRP Stuck in Active state means If the successor path is lost & there is no feasible successor path is available, router sends out query messages
on all EIGRP enable interfaces & tries to find out analternative path to the network. It is active state for that route.
Now Router is waiting for reply from its neighbors. If reply is missing for 3 min, dat means router didn't get any reply from neighbors, then it becomes stuck in active.In this case, router reset the neighbor relationship with the router who didn't replied back the query messages sent by the router.This is the normal behavior of EIGRP protocol when ever it is in stuck in active stage.
To solve this problem, two method is used -
a) Router summarization
b) EIGRP Stub.
To disbale the stuck in active timer, the following command is used -
Router(config-router)# timers active-time disable
Hope to help !!
Ganesh.H
02-14-2010 02:00 AM
Hello Tdistlists,
EIGRP is the only protocol where a node involves its own neighbors in route calculations, using the Diffusive Update Algorithm (DUAL).
This has its own advantages and several drawbacks like the SIA problem.
EIGRP is also an hybrid protocol that, starting from distance vector roots, introduces some link state characteristics.
For example the concept of EIGRP router-id is somewhat limited to the originator router field in IP external routes TLV data structures.
So an EIGRP hello does not contain the EIGRP router-id of the sender and so on.
You can easily verify this doing a sh ip eigrp neighbors between two devices connected with multiple links: they list as neighbors interface ip addresses not the EIGRP router-id.
A sh ip ospf neighbors provides the OSPF router-ids of neighbors.
EIGRP takes from link state protocol the concept of a neighbor state machine, it doesn't exchange updates with a neighbor until they reach a state in the neighbor state machine.
EIGRP doesn't take from LS the link state concept, RB when loses its connection to net 10.1.8.0/24 does not react as an OSPF or ISIS router, it tries to find out if some other node can reach the missing IP prefix so it starts these queries.
An OSPF router would just publish an updated router LSA with link state down for the interface that has ip address in 10.1.8.0/24.
So I'm not sure that SIA Query and SIA Reply contains the router-id field and if they provide enough feedback info so that the deadlock can be solved.
These two packets SIA Query and SIA reply help in distinguishing between alive neighbors and faulty link they provide a form of bidirectional detection.
In my personal opinion the most difficult behaviour to understand is the propagation of a Query from one EIGRP AS number to another AS.
This is something that could have been avoided.
The use of route filters, route summarization and stub EIGRP concept can help in limiting the query range solving stability problems of a scenario like the one in the figure and are part of a correct design.
In other words, it is left to network engineers to use the appropriate tools to build a working and stable solution when using EIGRP.
These tools are not workarounds but part of the protocol scalability can be reached only with control of Query range.
For these reasons EIGRP is seen as easy to configure, but with less scalability then link state protocols
Hope to help
Giuseppe
02-14-2010 03:26 AM
Hello Giuseppe,
Allow me a few comments.
EIGRP is also an hybrid protocol that, starting from distance vector roots, introduces some link state characteristics.
It is often stated that the EIGRP is a hybrid but personally I disagree strongly with it and whenever I hear it I can feel my blood pressure rising It is true that EIGRP came with enhancements we were used to see in link-state protocols: using hello packets to maintain adjacencies, sending only incremental and event-triggered updates, maintaining a state machine in communication with neighbors - you've named them. This all is true but it does not make a routing protocol a hybrid yet. A true hybrid protocol would discover a part of network in its topological detail and the other part of network would be discovered by a distance vector principle, just like multi-area OSPF. In fact, the multi-area OSPF is the only true hybrid protocol used in general networks (there are some similar hybrids used in ad-hoc wireless networks as far as I know). The EIGRP, as fancy as it may seem, is only a distance vector, even though on steroids
In my personal opinion the most difficult behaviour to understand is the propagation of a Query from one EIGRP AS number to another AS.
Hmm... I have seen the description of this in the following document:
It does not strike me as being particularily difficult but I may be overlooking something.
For these reasons EIGRP is seen as easy to configure, but with less scalability then link state protocols
I think that estimating the real scalability of a routing protocol, especially when comparing it to other protocols of a different nature, is a difficult thing to do. The concept of areas in IS-IS and OSPF helps very strongly to maintain their scalability to large networks. On the other hand, in EIGRP, you can summarize (and thereby limit the query scope) on any router in the topology while in link state protocols, the summarization can be done only on ABRs and ASBRs. To me, both link-states and EIGRP appear as scalable enough but every one of them approaches it in its own way.
Best regards,
Peter
02-14-2010 03:06 AM
Hello tdistlists,
Is the same explanation to the exhibit, as you have quoted it here, also present in the Cisco Press book? If yes then the book is patently wrong on this subject.
The explanation in your post is not correct because it is based on a false assumption: It wrongly assumes that if a router is in active state for a destination, it cannot send a reply to another query that it receives during the active state.
In reality, if a router is in active state for a destination and subsequently receives yet another query, it immediately responds with its current metric that was set in the moment of transition to the active state (note that this metric must be higher than what it was when the router was passive because a transition into the active state is always a result of distance increase). This way, no deadlocks can occur, in contrary to what was described in your original post.
So the sequence of events would be as follows:
In particular, the SIA state will not appear in this network.
Best regards,
Peter
02-15-2010 09:00 AM
Thank you everyone for your reponses. A few things.
I'm not sure why I always see 3 minutes as the limit to go to SIA. Yes, 3 minutes is the active timer, but this is reset at 90 second intervals with the SIA-Query and SIA-Reply packets (up to three times). So, assuming there are no bad links in the network, or faulty devices, 6 minutes will elapse prior to going to SIA.
I know about DUAL, etc, and that rtr-IDs are mainly link-state things, but I was just trying to make suggestions on how to resolve the issue, where I consider summary, stub and filtering band-aid solutions.
However, it looks like my understanding was incorrect, according to Peter.
The book does state this:
BCSI Self-Study, pg 123-124
Router B learns from these queries that none of the remote routers has a path to network 10.1.8.0/24,
but it cannot respond that it does not know of a path, because Router B is waiting for Router A to reply to a query. Router A is waiting for either Router C, D, or E to reply to its query, and these remote sites are waiting for Router B to reply to their queries. Because Router B sent out the first query, its SIA timer expires first, and Router B reaches the SIA state for network 10.1.8.0/24 first
(in 3 minutes by default).
Unless I misunderstood the book, it basically says there is a deadlock situation because no one can answer anyone else due to not receiving an answer from another RTR.
I follow Peter's logic, and it makes sense (other than not following split horizon rules ). The only question I had was in step 3, and the point you make of CDE answering first being a coincidence. Let's say that A reponds prior to C, D, E. What would A's reply be? B was his successor, and he had no feasible successor -- so would A have responded with infinity, or would have had to have asked C, D, E first?
Based on what I know, A would have transitiioned to acvitve for that network, and had to ask CDE first. That way, CDE would always repond to B first, because A has to send out his own queries.
Also, in step 8. B will reply with infinity to CDE rather than the new metric it learned from CDE in step 3? Is this because those new metrics were never used due to the network never transitioning to passive, or because the queries from CDE was changed to infinity after A queried CDE?
If it is true that you can reply to a query for a subnet, even when waiting for a reply for the same subnet, then there can be no deadlock situation, and Peter is correct in saying this example is not a SIA example.
Ruling out deadlock situations, SIA's would only occur with query ranges that exceed 6 minutes (huge diameter of a network), or faulty links/connections (which SIA-Query and SIA-Reply help pinpoint -- these ensure that the network link with the problem is the one that is reset, not just the first RTR to send the query as the book suggests). Am I right in thinking this?
Again, thanks so much for this wonderful insight!
02-15-2010 11:52 AM
Hello,
You are welcome. Let's go over your individual comments.
I'm not sure why I always see 3 minutes as the limit to go to SIA. Yes, 3 minutes is the active timer, but this is reset at 90 second intervals with the SIA-Query and SIA-Reply packets (up to three times). So, assuming there are no bad links in the network, or faulty devices, 6 minutes will elapse prior to going to SIA.
Yes, you are correct - the same fact is explained in the Jeff Doyle's book Routing TCP/IP, Volume 1.
I follow Peter's logic, and it makes sense (other than not following split horizon rules ).
Yep I completely skipped the split horizon rules - they only optimize the query process somewhat but do not influence it in any significant way. And also, I forgot completely that the EIGRP uses split horizon with queries I owe you one!
The only question I had was in step 3, and the point you make of CDE answering first being a coincidence. Let's say that A reponds prior to C, D, E. What would A's reply be? B was his successor, and he had no feasible successor -- so would A have responded with infinity, or would have had to have asked C, D, E first?
A very good question. The router A cannot send a reply immediately because the B is its current successor and by its query, it indicates it has problems reaching that network. More precisely, the A will receive the query from B, indicating the current distance of B towards the network 10.1.8.0/24, which is infinity. For router A, the router B was the successor and there are no feasible successors because the C, D and E also decided to use A as their next hop towards the 10.1.8.0/24. Therefore, upon receiving the query from B, router A will lose its successor and will need to forward the query itself before it can provide a reply to B. The remaining process will follow approximately from the step 5.
Also, in step 8. B will reply with infinity to CDE rather than the new metric it learned from CDE in step 3? Is this because those new metrics were never used due to the network never transitioning to passive, or because the queries from CDE was changed to infinity after A queried CDE?
In step 8, the router B must reply with infinity to C, D and E. The reason is that in DUAL algorithm, once a router transitions to the active state, it must not change the current successor, feasible distance or reported distance until it receives all replies and selects a new successor. This is one of the most important principles with DUAL - a router can change its idea about distances and next hops only after it receives all replies because only then it can be sure that the remaining part of the network has already updated its routing tables and distances correctly (by the virtue of repeating the same principle). So, in this case, while the router B has received the replies from C, D and E, it did not receive all replies it needs to make a qualified decision and cannot be sure that the neighbors are already aware of the most recent correct route to the destination.
I suggest strongly reading the following document:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml
Regarding the behavior of receiving and sending updates, see this section:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080094cb7.shtml#queryrange
You are welcome to ask further!
Best regards,
Peter
02-15-2010 02:44 PM
Wow, you're the best! Thanks so much!
Sounds to me now that deadlock situation's cannot occur. SIA would then only be possible with a super large query diameter and/or with faulty links/etc.
I just spent the last two hours reading the document you linked, and it definitely clarifies things. Sounds to me like the Ciscopress book should have included that chart:
When a router processes a query from a neighbor, the following rules apply:
Query from | Route state | Action |
---|---|---|
neighbor (not the current successor) | passive | reply with current successor information |
successor | passive | attempt to find new successor; if successful, reply with new information; if not successful, mark destination unreachable and query all neighbors except the previous successor |
any neighbor | no path through this neighbor before query | reply with best path currently known |
any neighbor | not known before query | reply that the destination is unreachable |
neighbor (not the current successor) | active | if there is no current successor to this destinations (normally this would be true), reply with an unreachable |
if there is a good successor, reply with the current path information | ||
successor | active | attempt to find new successor; if successful, reply with new information; if not successful, mark destination unreachable and query all neighbors except the previous successor |
You're point below clears up a lot as well.
The reason is that in DUAL algorithm, once a router transitions to the active state, it must not change the current successor, feasible distance or reported distance until it receives all replies and selects a new successor. This is one of the most important principles with DUAL - a router can change its idea about distances and next hops only after it receives all replies because only then it can be sure that the remaining part of the network has already updated its routing tables and distances correctly (by the virtue of repeating the same principle).
So what "was" the best I can't change until I get all the query replies.
In reality, if a router is in active state for a destination and subsequently receives yet another query, it immediately responds with its current metric that was set in the moment of transition to the active state (note that this metric must be higher than what it was when the router was passive because a transition into the active state is always a result of distance increase).
However, the "metric" I send out in the query packet is either infinity (unreachable) if I had no other entries in my topology database, or the next best path if I did have non-FS entries in my topology database.
Thanks again so much!
I did have another DUAL question that I'm typing up. I'll put it in a separate thread as it's not SIA related, but basically my question was why is the feasibility condition AD < FD as opposed to AD <= FD. In JJ Aceves' original DUAL documents he had <=, but somewhere along the lines the = part got dropped.
Thanks!!
02-15-2010 03:22 PM
Hello,
You are more than welcome.
Regarding that table from the Cisco document - it slightly differs from the DUAL operation described in the paper by Dr. Garcia. It seems that the Cisco team behind EIGRP has somewhat optimized the basic DUAL algorithm but not radically.
So what "was" the best I can't change until I get all the query replies.
Exactly. The EIGRP basically behaves in a similar way to transactions in relational databases - it moves from a consistent solution to another consistent solution. No inconsistent (i.e. potentially looped) temporary states are allowed, not even in transitory periods.
However, the "metric" I send out in the query packet is either infinity (unreachable) if I had no other entries in my topology database, or the next best path if I did have non-FS entries in my topology database.
Yes, very correct, and by the way, note how this differs from the simplified explanation that is almost omnipresent - that if a successor fails, you just jump onto a feasible successor and start using it. In reality, you can have a feasible successor to a destination available with some metric, and at the same time, you can have a neighbor whose RD is not strictly lower than your current FD (so it does not meet the feasibility condition) but at the same time, the path through that neighbor is the next shortest. Now, if a successor fails, what happens? The common literature says that you simply start using the feasible successor. But that would mean that you will stick with a longer route even though there might be a shorter path available. So in reality, you will note in your topology table that the next shortest path goes via the neighbor that is not identified as feasible successor, so you go to the active state to ascertain whether that neighbor can be the next successor, and decide after the diffusing computation finished.
I'll put it in a separate thread as it's not SIA related, but basically my question was why is the feasibility condition AD < FD as opposed to AD <= FD. In JJ Aceves' original DUAL documents he had <=, but somewhere along the lines the = part got dropped.
I assume you are referring to this paper:
http://www.soe.ucsc.edu/research/ccrg/publications/jj.dual.ton93.pdf
The <= inequality was used with different sufficient conditions to guarantee loop free operation. However, the Source Node Condition upon which the DUAL is built requires a strict inequality, and it is always present that way even in the paper. I am thinking right now of a counterexample to show how would the <= type of inequality lead to a loop with the Source Node Condition but I cannot currently come up with anything clever. Perhaps I'll be more fresh in the morning... it's after midnight here in Slovakia right now.
Best regards,
Peter
02-15-2010 04:30 PM
Well, rest or no rest you seem to be on top of your game!
I assume you are referring to this paper:
http://www.soe.ucsc.edu/research/ccrg/publications/jj.dual.ton93.pdf
The <= inequality was used with different sufficient conditions to guarantee loop free operation. However, the Source Node Condition upon which the DUAL is built requires a strict inequality, and it is always present that way even in the paper. I am thinking right now of a counterexample to show how would the <= type of inequality lead to a loop with the Source Node Condition but I cannot currently come up with anything clever. Perhaps I'll be more fresh in the morning... it's after midnight here in Slovakia right now.
Best regards,
Peter
I can start a new thread if you wish, but I'll just post my thoughts here for now.
AD=RD (advertised distance and reported distance are interchangable below)
Why does the RD have to be strictly less than the FD to be a Feasible successor? (RD < FD). All the documentation and books clearly provide examples of loops when AD > FD, and explain that its guaranteed to be loop-free if AD < FD -- because an AD lower than my FD means he is closer than I have ever been.
However, I can't find an example of a problem when AD = FD. That's what it boils down to for me. To reiterate the question. Why is the feasibility condition AD < FD and not AD <= FD?
In the research I've been doing, I've come across multiple papers of J.J. Garcia-Luna-Aceves. In reading his work on DUAL, EDVA (enhanced distance vector algorithm) and LPA (loop free path finding algorithm) -- Aceves often refers to RD <= FD as being an acceptable condition for loop-free detection.
A UNIFIED APPROACH TO
LOOP-FREE ROUTING USING
DISTANCE VECTORS OR LINK STATES
Source: http://www.loria.fr/~ichris/Teaching/p212-garcia-luna-aceves.pdf
Page 215-218
If we move down further, he references an example of DUAL which actually utilizes a feasible successor with an RD = FD. The example is when he refers to node C, because the distance from D->E (RD) and from F->E (old FD) are equal.
However, in reading some of this later work, it has something called a "RFC" or Revised Feasible Condition.
Source: http://ccrg.soe.ucsc.edu/publications/zhengyu.milcom97.pdf
Page 2-5
I cannot find any documentation as to why the equal-part of the equation was dropped -- other than optimization and altered query/update process, with claims of faster convergence.
See ection 5.1 of the above document, where Aceves quotes
"Accordingly, because FDji = 2 < Djn = 5, FC is not
satisfied, and node i must send a query before it can
choose a new feasible successor. It is clear that RFC saves
queries and consequently reduces the convergence time."
In Aceves' EIGRP document (http://www.ida.liu.se/~TDTS02/papers/eigrp.pdf) he simply states factually that RD must be < FD -- but doesn't show or enlighten the reader as to why this is the case. I cannot come up with a scenario where RD = FD could cause a loop if installed as the FS.
Many networks have many RD's equal to a routers FD -- and they are not installed as FS. It seems inefficient in every situation to query/update rather than installing the FS.
In the document you referenced, Aceves does break it down into three conditions for loop freedom, DIC: Distance increase condition, CSC: Current successor condition, and SNC: Source node condition. You are correct that only SNC (which the definition baffles me a bit, other than meaning finding a new successor) drops the <= type.
I hope I didn't ramble on too much, but it perplexes me to find a loop with this last condition. Perhaps it was introduced just as a more efficient query/update process.
Thanks again so so much!
Message was edited by: tdistlists
- Image insertion isn't working, so I added them as attachments.
Find answers to your questions by entering keywords or phrases in the Search bar above. New here? Use these resources to familiarize yourself with the community: