BGP routing loop?

Aug 13th, 2010

I have a possible scenario where I think we may get a routing loop due to BGP, but I'm not 100% certain this will happen. As this is a theoretical design right now I can't test it, but I wanted to propose it here first before labbing it.

We have a customer who is multi-homed to two different ASes. They prepend their network advertisements to AS"2" so that AS"1" is preferred for incoming traffic.

They are also asking to change the Local Preference within AS"2" for the routes they advertise to AS"2", using a community that has been established by AS"2". This works: the customer advertises their networks to AS"2" with community 2:90, AS"2" matches it, and these routes are set to Local-Pref 90. Routes from AS"1" are received from AS"xxx", which peers with both AS"1" and AS"2", and they default to Local-Preference 100 on AS"2".
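The community-based policy described above can be sketched as a small import filter. This is a hypothetical Python model, not router configuration: the `Route` type, the `apply_import_policy` function and the community table are illustrative assumptions. Only the community value 2:90 and the local-preference values 90 and 100 come from the thread; the customer ASN 65001 and the prefix are placeholders.

```python
from dataclasses import dataclass, field

DEFAULT_LOCAL_PREF = 100
# Assumed mapping of AS"2"-defined communities to local-preference values.
COMMUNITY_TO_LOCAL_PREF = {"2:90": 90}


@dataclass
class Route:
    prefix: str
    as_path: list[int]
    communities: set[str] = field(default_factory=set)
    local_pref: int = DEFAULT_LOCAL_PREF


def apply_import_policy(route: Route) -> Route:
    """Set local preference from the first recognized community, if any."""
    for community in sorted(route.communities):
        if community in COMMUNITY_TO_LOCAL_PREF:
            route.local_pref = COMMUNITY_TO_LOCAL_PREF[community]
            break
    return route


# Customer route tagged 2:90 (hypothetical ASN 65001, prepended x5).
tagged = apply_import_policy(Route("203.0.113.0/24", [65001] * 6, {"2:90"}))
# Route learned via AS"xxx" (hypothetical ASN 100) with no recognized community.
untagged = apply_import_policy(Route("203.0.113.0/24", [100, 1, 65001]))
print(tagged.local_pref, untagged.local_pref)  # 90 100
```

A route carrying no recognized community simply keeps the default, which is why the path via AS"xxx" ends up with the higher value.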

What this means is that AS"2" is now receiving preferred routes from AS"1" via AS"xxx". These would be preferred anyway because of the prepending (the shorter AS_PATH wins), but now that Local-Pref is set within AS"2", it is Local-Pref that first decides that the route from AS"1" is preferred.

This all works fine, and yes, there really is no need for this Local-Pref, but that's not the question. The question is: is it just pointless to do this, or could it actually cause a problem?

Here's the proposed...

Given the above, let's say that AS"1"'s path to AS"xxx" suddenly becomes 10 hops (someone killed half the internet). When AS"xxx" tries to reach the customer's advertised networks, it would now see the advertisement from AS"2" as having fewer AS_PATH hops, and thus preferred. Traffic would route to AS"2", but when it hits AS"2", AS"2" still has the lower Local-Preference set for the route it learns from the customer, so the AS"1" route is preferred on AS"2". The traffic would then route back to AS"xxx" and start looping into a blackhole.

Normal condition:

    Customer AS -------------------- AS"1" -------------------- AS"xxx"
        |                                                          |
        +-- prepend x5, local-pref 90 -- AS"2" -- local-pref 100 --+

Path to the customer AS from AS"2" is via AS"xxx", because local-pref 100 wins. Without the local-pref change, the path would be via AS"xxx" anyway, due to the AS_PATH length.

Fault condition:

    Customer AS ---------- AS"1" ---------- AS"n*10" ---------- AS"xxx"   (AS_PATH now 10 hops)
        |                                                          |
        +-- prepend x5, local-pref 90 -- AS"2" -- local-pref 100 --+

Traffic from AS"xxx" to the Customer AS networks would take the AS"2" route due to AS_PATH, but would it hairpin back at AS"2" because Local-Preference favors the AS"xxx" path?

I'm thinking that this is not the case and there is something I'm missing here that would stop this but I can't put my finger on it.

Peter Paluch Fri, 08/13/2010 - 15:22

Hello Richard,

The routing loop should not form. Note that in the case where "half of the internet gets killed", you are quite correct that AS "xxx" would use AS 2 to reach the customer, obviously because of the shorter AS_PATH from AS "xxx" through AS 2 towards the customer.

However, for AS 2 to loop traffic back to AS "xxx", AS 2 would have to receive and accept the route advertisement from AS "xxx". This is not possible: when AS "xxx" advertises the customer routes back to AS 2, the AS_PATH attribute already contains AS 2. Such a route is immediately discarded by AS 2 routers without being installed into the BGP database, so no loop can form.
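The loop-prevention rule described here (a router discards an eBGP update whose AS_PATH already contains its own AS number) can be sketched as follows. The ASNs are hypothetical stand-ins: 65001 for the customer, 2 for AS 2 and 100 for AS "xxx"; "prepend x5" is assumed to mean five extra copies of the customer ASN, and the function names are illustrative.

```python
# Hypothetical ASNs: 65001 = customer, 2 = AS 2, 100 = AS "xxx".
CUSTOMER, AS2, AS_XXX = 65001, 2, 100


def export_ebgp(sender_asn: int, as_path: list[int]) -> list[int]:
    """On eBGP export, the sending AS prepends its own ASN."""
    return [sender_asn] + as_path


def accept_ebgp(local_asn: int, as_path: list[int]) -> bool:
    """Discard any update whose AS_PATH already contains our own ASN."""
    return local_asn not in as_path


# AS 2 advertises the customer prefix (prepended x5 by the customer) to AS "xxx".
to_xxx = export_ebgp(AS2, [CUSTOMER] * 6)
# If AS "xxx" advertised that route back, it would arrive at AS 2 like this.
back_to_as2 = export_ebgp(AS_XXX, to_xxx)

print(accept_ebgp(AS_XXX, to_xxx))    # True
print(accept_ebgp(AS2, back_to_as2))  # False: AS 2 is already in the AS_PATH
```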

Best regards,

Peter

richardboldy Sat, 08/14/2010 - 08:37

Thanks Peter,

Doesn't this mean that traffic will be blackholed, though, as it will be sent to ASxxx but not accepted?

Peter Paluch Sat, 08/14/2010 - 18:25

Hello Richard,

Don't confuse data traffic with routing updates. What I was talking about were the BGP routing updates. AS 2 will simply not accept any routing update that already traversed through AS 2, therefore it is impossible for AS 2 and AS xxx to mutually point to each other.

The data traffic will not be blackholed. If AS xxx receives a routing update from AS 2 about the customer networks, then AS 2 already knows a path towards the customer. In other words, when AS 2 advertises the customer networks to its neighboring ASes, it is already ready to route the packets towards the customer.

Does this answer your question? Please ask further if anything is unclear!

Best regards,

Peter

richardboldy Sun, 08/15/2010 - 13:16

Thanks Peter,

I appreciate you taking the time to respond here.

I understand that when AS2 advertises the route back to ASxxx, it won't be accepted by ASxxx, as its own AS will be present in the AS_Path list.

However, consider traffic that arrives at AS2 because of the "half the internet died" issue, i.e. the route from AS2 now has a smaller AS_Path count than the one advertised via AS1. The preferred route inside AS2 is still going to point to the route received from AS1, regardless of the AS_Path hop count, because AS2 has set the local preference on the route from the Customer AS to 90, versus 100 for the route it is still getting from AS1.

I'm concerned that traffic will take the route to AS2 and then loop back to ASxxx when it hits AS2, due to this inconsistency between the locally preferred route on AS2 and the rest of the internet, since Local Pref is not passed across an eBGP peering. Obviously ASxxx will then try sending it back to AS2.

Thoughts?

Correct Answer
Peter Paluch Sun, 08/15/2010 - 15:04

Hello Richard,

Actually, thanks to you asking further, I have discovered that I was incorrect in my original assessment of the situation. The ASxxx will continue using the longer path without going through AS2. I originally stated that the ASxxx will start using the AS2 to reach the customer - I was wrong. I apologize sincerely for misleading you. Please read further for the rationale behind all of this.

Let's break things down in simple steps. Before "the internet disaster", let's go over the sequence of steps as the network converges:

  1. The customer will advertise his networks both to AS1 and AS2. At first, both these ASes will use the direct route towards the customer because it is the only alternative known to them at this moment.
  2. Both AS1 and AS2 will advertise the customer networks to ASxxx. Because of AS_PATH manipulation in AS2, the route from ASxxx via AS1 will be preferred as the AS_PATH attribute will be shorter. The ASxxx will choose the best path through the AS1, and will advertise it to AS2.
  3. AS2 will learn that the path through ASxxx is better than the direct path towards the customer, and will modify its routing tables so that the traffic goes via ASxxx. Note, however, that in the AS2, the direct route to the customer will still remain in BGP database. It just will not be considered as the best path but it will still be present in the BGP database. Also please remember that BGP announces only the best route, not all routes in its BGP database.
  4. The network will thus converge on the path from AS2 via ASxxx via AS1 to the customer. Note that at this moment, the ASxxx does not consider AS2 to be a possible backup route towards the customer in its BGP database (the AS_PATH check on ASxxx will drop all updates from AS2 that indicate the ASxxx in their AS_PATH attribute).
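Steps 3 and 4 can be modelled in a few lines: AS2 keeps both routes, picks one best path, exports only that one, and ASxxx drops it on the AS_PATH check. This is a simplified, hypothetical sketch that compares only local preference and then AS_PATH length; the ASNs (65001 for the customer, 1, 2, and 100 standing in for ASxxx) and the `Path`/`best_path` names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    learned_from: str
    as_path: tuple[int, ...]
    local_pref: int


def best_path(paths: list[Path]) -> Path:
    """Highest local preference wins; a shorter AS_PATH only breaks ties."""
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))


# AS2's BGP table for the customer prefix after initial convergence.
as2_table = [
    Path("customer", (65001,) * 6, local_pref=90),    # direct, tagged 2:90
    Path("AS xxx", (100, 1, 65001), local_pref=100),  # via ASxxx and AS1
]
best = best_path(as2_table)

# Only the best path is exported; AS2 prepends its own ASN on eBGP export.
advertised_to_xxx = (2,) + best.as_path
# ASxxx discards it because its own ASN (100) is already in the AS_PATH.
accepted_by_xxx = 100 not in advertised_to_xxx

print(best.learned_from)  # AS xxx
print(len(as2_table))     # 2: the direct route is kept, just not best
print(accepted_by_xxx)    # False
```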

Now, when the "internet disaster" ensues, the following will happen:

  1. ASxxx learns that the path to the customer has grown considerably in the number of AS_PATH elements. Nevertheless, at this moment it is still the only known way towards the customer so it will update its routing table accordingly and send an update to AS2 with the new information.
  2. AS2 will learn about the updated path whose AS_PATH has grown. However, because the local preference of the route through ASxxx is higher than the local preference of the direct way towards the customer, the path via ASxxx will again be chosen as the best path. The AS_PATH attribute will not be evaluated in this case, and the routing tables will not change.
  3. So even after the "internet disaster", thanks to the local preference manipulations in AS2, the AS2 will remain using the path over ASxxx. As a matter of fact, the higher local preference of routes through the ASxxx will result in AS2 constantly sending traffic through ASxxx, no matter what changes in the AS_PATH or other attributes may arise, as long as the ASxxx is reachable and the customer network is advertised from ASxxx to AS2.

As you can see here, introducing the local preference into the configuration actually changed the expected behavior - the AS2 will remain stuck to ASxxx for as long as it can, and as a result, ASxxx will never consider using the AS2 as a better route towards the customer.
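The reason the AS_PATH growth changes nothing in AS2 is the order of comparison: local preference is checked before AS_PATH length. A minimal, hypothetical sketch of AS2's decision, with assumed AS_PATH lengths (6 for the prepended direct route, 3 via ASxxx before the disaster, and 13 after ten extra ASes appear in the path):

```python
# Hypothetical model of AS2's decision for the customer prefix.
# Each candidate maps to (local_pref, as_path_length); lengths are assumptions.

def choose(candidates: dict[str, tuple[int, int]]) -> str:
    """Prefer the highest local preference; use the shorter AS_PATH only on a tie."""
    return max(candidates, key=lambda name: (candidates[name][0], -candidates[name][1]))


before = {"direct": (90, 6), "via ASxxx": (100, 3)}
after = {"direct": (90, 6), "via ASxxx": (100, 13)}  # ten extra ASes in the path

print(choose(before))  # via ASxxx
print(choose(after))   # via ASxxx: local preference decides before AS_PATH is compared
```

With both local preferences left at 100, the same function would pick "direct" after the disaster, which is the behavior described next.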

If, however, the local preference was not modified, the sequence of events after the "internet disaster" would be as follows:

  1. ASxxx learns that the path to the customer has grown considerably in the number of AS_PATH elements. Nevertheless, at this moment it is still the only known way towards the customer so it will update its routing table accordingly and send an update to AS2 with the new information.
  2. AS2 will learn about the updated path whose AS_PATH has grown. Now it knows about two possible ways towards the customer: the path via ASxxx with the longer AS_PATH, and the direct way towards the customer that was originally deemed worse because its AS_PATH was originally longer. However, now, the opposite is true: the AS_PATH of the direct route is shorter, and the AS_PATH of the route over the ASxxx is longer. The AS2 will therefore select the direct route as its new best path, it will modify its routing tables, and send an update to ASxxx.
  3. The ASxxx will receive this update and will see that, according to the AS_PATH length, the route via AS2 is shorter. It will therefore choose the path via AS2 as the new best path and modify its routing tables, and of course, it will send an update to its further neighboring ASes.
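The three steps above can be replayed in a short, hypothetical simulation in which every route keeps the default local preference, so AS_PATH length decides. The ASNs (65001 for the customer, 1, 2, 100 standing in for ASxxx, and 64512 to 64521 for the ten ASes added by the "internet disaster") and the five extra customer prepends are assumptions.

```python
CUSTOMER = 65001
DETOUR = list(range(64512, 64522))  # ten extra ASes (hypothetical)


def shortest(paths: list[list[int]]) -> list[int]:
    """With equal local preference, the shortest AS_PATH wins."""
    return min(paths, key=len)


# Step 1: ASxxx only knows the (now long) path via AS1.
xxx_via_as1 = DETOUR + [1, CUSTOMER]
# ASxxx sends its best path to AS2, prepending its own ASN.
as2_via_xxx = [100] + xxx_via_as1
as2_direct = [CUSTOMER] * 6  # customer prepends x5 toward AS2

# Step 2: AS2 compares both paths on AS_PATH length alone.
as2_best = shortest([as2_via_xxx, as2_direct])

# Step 3: AS2 advertises its new best path to ASxxx.
xxx_via_as2 = [2] + as2_best
xxx_best = shortest([xxx_via_as1, xxx_via_as2])

print(as2_best == as2_direct)              # True
print(xxx_best == xxx_via_as2)             # True
print(len(xxx_via_as1), len(xxx_via_as2))  # 12 7
```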

I hope this clarifies things. In any case, you are welcome to ask further!

Best regards,

Peter

richardboldy Sun, 08/15/2010 - 15:41

Excellent, thank you.

The key piece I was missing was that AS2 would never advertise its direct route to the Customer AS, as it only advertises the best route, which would always be the route from AS1 learned via ASxxx.

Thanks for working through this with me. This was quite a complex issue, and although I knew there was something I wasn't considering, it was only through talking it through here that it became clear.

FYI: I am AS2, and I will be advising our customer not to apply this local preference, as it would only make a difference if the AS_Path becomes excessive, and in such a scenario it would be beneficial to use the inherent nature of BGP to re-route traffic away from such an "internet disaster". So long as the AS_Path is prepended correctly, and with enough hops to cope with smaller issues, this is surely the way to go.
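The point about prepending "with enough hops" can be made concrete from ASxxx's point of view. This hypothetical sketch assumes default local preference everywhere, five extra customer prepends toward AS2, and that AS2 advertises its direct route once that route becomes its best path. A tie on AS_PATH length falls through to later tie-breakers, so it is not counted as a switch here.

```python
def xxx_path_lengths(extra_prepends: int, detour_ases: int) -> tuple[int, int]:
    """Return (via AS1, via AS2) AS_PATH lengths as seen by ASxxx."""
    via_as1 = detour_ases + 2           # detour ASes + AS1 + customer
    via_as2 = 1 + (1 + extra_prepends)  # AS2 + customer (+ prepends)
    return via_as1, via_as2


def xxx_prefers_as2(extra_prepends: int, detour_ases: int) -> bool:
    via_as1, via_as2 = xxx_path_lengths(extra_prepends, detour_ases)
    # Strictly shorter wins at the AS_PATH step; ties go to later tie-breakers.
    return via_as2 < via_as1


for detour in (0, 3, 5, 6, 10):
    print(detour, xxx_prefers_as2(extra_prepends=5, detour_ases=detour))
# 0 False / 3 False / 5 False / 6 True / 10 True
```

Under these assumptions, five extra prepends absorb a detour of up to five additional ASes before ASxxx shifts traffic to AS2.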

Thanks again for your quick responses here. I've very much enjoyed the discussion.

R./

Peter Paluch Sun, 08/15/2010 - 23:22

Richard,

The pleasure was mine. I have also enjoyed our discussion here very much.

Regarding that local preference - I agree with you, the customer should be advised against using it. On Cisco routers, local preference is the second tiebreaker in the BGP best-path algorithm (the first being the Cisco-proprietary weight attribute), while the AS_PATH length is the fourth step. Thus, a higher local preference alone is enough to select the best path without the length of the AS_PATH ever being considered. In your scenario, the resulting routing would not be optimal.
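The ordering described above can be expressed as a comparison key. This is a simplified, hypothetical sketch of only the first four steps of Cisco's best-path selection (weight, local preference, locally originated, AS_PATH length); the later tie-breakers such as origin, MED and router ID are deliberately omitted, and the candidate values are assumptions based on the scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    weight: int = 0          # Cisco-proprietary, local to the router
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0


def preference_key(c: Candidate) -> tuple:
    # Higher is better for every element, so AS_PATH length is negated.
    return (c.weight, c.local_pref, c.locally_originated, -c.as_path_len)


def best(candidates: list[Candidate]) -> Candidate:
    return max(candidates, key=preference_key)


direct = Candidate("direct to customer", local_pref=90, as_path_len=6)
via_xxx = Candidate("via AS xxx", local_pref=100, as_path_len=13)
print(best([direct, via_xxx]).name)  # via AS xxx: step 2 decides before step 4
```

Python compares tuples element by element, so a later step is consulted only when every earlier one is tied.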

Thank you again for discussing this issue here on NetPro, and please feel welcome to continue participating in the discussions here :-)

Best regards,

Peter

mahadev2529@red... Thu, 02/07/2013 - 22:13

Hello Richard,

My company's diagram is below. Both sites are configured with eBGP and PfR (MC/BR). The problem is that packets are sent on one link and received on the second link. How can I resolve this?

Mahadev Patil

CSCO10168280 Fri, 02/08/2013 - 05:45

Hi Mahadev,

From the little info you gave here, it looks like you've created yourself a nice layer 2 loop. I don't think this is related to this thread, so if you want more advice, I suggest you create a new one if what I'm saying here doesn't help you.

If what you've drawn here is one giant layer2-broadcast domain with the LAN links on either side being in the same vlan and the MPLS links being p2p or VPLS layer2 then the behavior you mentioned is to be expected albeit not what you want.

What confuses me is that you're saying you have eBGP here, so maybe I'm missing something. Where is the layer 2/3 boundary here? Are you peering eBGP with the service providers, or are they just handing you a VLAN or a layer 2 port?

mahadev2529@red... Wed, 02/13/2013 - 21:20

Hi Richard,

This is the layer 3 topology. Our company uses BGP to communicate with both ISPs.

We use PfR to load-balance traffic across both ISPs.

CSCO10168280 Thu, 02/14/2013 - 05:37

Hey Mahadev,

I recommend that you start a new thread on this. You'll get more input from the rest of the folks here this way and right now I have limited time. Try explaining this in terms of packet flow with a specific example. Also if you post your router configs this will help but remove any passwords or interface descriptions that could be a security concern.

R./

This Discussion

Posted August 13, 2010 at 2:48 PM