Cisco Support Community
Hall of Fame Super Blue

BGP local pref

I understand what local pref does but i have always been slightly confused by one aspect of the topology it is used in.

If 2 IBGP peers are using local pref to influence which path to take is it best practice to have a direct physical link between the 2 routers or is it perfectly normal to have traffic being routed via the same interface it came in on eg.

1 L3 switch (SW1) shares a common subnet with WAN routers (R1 and R2). The WAN routers are using IBGP to peer. Local pref has been set up so R1 is the preferred route for all networks learnt via EBGP. Now i have seen a number of posts on CSC where the recommendation to influence the outgoing traffic is to use local pref. So SW1 can send traffic to either R1 or R2. If traffic goes to R2 that router will see the preferred path is via R1 and so it will have to reroute the traffic back out of the interface it was received on and send it to R1.
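A minimal Python sketch of that forwarding behaviour, for illustration only. The local preference values (200 and 100) and the function name are assumptions, not taken from any real configuration; the point is just that a packet handed to R2 takes one extra hop.

```python
# Hypothetical local preference each WAN router sets on its eBGP-learned routes.
LOCAL_PREF = {"R1": 200, "R2": 100}

def forwarding_path(first_hop, local_pref=LOCAL_PREF):
    """Return the hops a packet takes when SW1 hands it to first_hop."""
    # Local preference is shared over iBGP, so both routers agree on one exit.
    exit_router = max(local_pref, key=local_pref.get)
    path = ["SW1", first_hop]
    if first_hop != exit_router:
        # Hairpin: back out of the LAN interface to the preferred exit router.
        path.append(exit_router)
    path.append("WAN")
    return path

print(forwarding_path("R1"))
print(forwarding_path("R2"))
```

The second call is the case in question: the packet visits R2 and then R1 before leaving.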

To my mind it would be far easier to redistribute the EBGP learned routes into OSPF (using type 1s from R1 and type 2s from R2) or EIGRP and influence the metrics. So SW1 always knows that R1 is the best path.
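As a rough sketch of why the type 1 / type 2 split would work: for the same prefix OSPF prefers an external type 1 route over a type 2 route regardless of metric, so SW1 would pick R1 while R1 is advertising. The metrics and the function name below are made up for illustration.

```python
def preferred_exit(candidates):
    """candidates: (router, external_type, metric) tuples for one prefix."""
    # Type 1 always beats type 2; the metric only breaks ties within a type.
    return min(candidates, key=lambda c: (c[1], c[2]))[0]

# Hypothetical metrics: R1's E1 route wins even though its metric is larger.
routes = [("R2", 2, 20), ("R1", 1, 120)]
print(preferred_exit(routes))
```

If R1 stops advertising the prefix, only R2's type 2 route remains and SW1 falls back to it.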

If there was a dedicated physical connection between R1 and R2 that would make more sense to me.

I only ask because, as i say, i have seen a number of posts with a similar setup where local pref was the recommendation, and i wondered whether i was missing something in terms of my understanding.

Any comments welcome.

Jon

22 REPLIES
Hall of Fame Super Silver

BGP local pref

Jon

While there may be advantages in having a direct link between IBGP peers it is certainly not a requirement. And to my understanding there are not any particular implications of using local preference about whether the IBGP connection is direct or multi hop.

My reaction to your question is to think that we may have 2 goals in designing our networks

1) we want routing to be as efficient as possible

2) we want to implement policy about how traffic is forwarded

I believe that sometimes 2 requires some compromise in 1.

HTH

Rick

Hall of Fame Super Blue

BGP local pref

Rick

Thanks for the response.

I'm struggling to see why a common recommendation is to use IBGP in the example i gave. And i think your example outlines the issue i was trying to get to the bottom of ie.

Use IBGP in the above example and you meet the requirements of 2)  but not 1)

Redistribute BGP into an IGP while influencing the metrics and you get 1) and 2)

So why would you use IBGP in the scenario i outlined? I appreciate there are many other scenarios within SP clouds etc. but i was really only concerned with this one and why i have seen a number of posts recommending local pref as a solution to the above when, to my mind, there are clearly better ways.

I thought i must be missing something.

Jon

Hall of Fame Super Silver

BGP local pref

Jon

Let me then offer my opinion about the option to redistribute into the IGP vs using IBGP. If you are setting up a lab and your EBGP neighbor is advertising 20 prefixes to you then redistribution into the IGP is painless and makes some sense. But think about the Internet routing table. If you are running EBGP with an ISP who will advertise the entire Internet routing table, then what are the implications of redistributing that into your IGP? Do all of the routers inside your network have sufficient memory to maintain that many routes in their routing table? Do all of the routers inside your network have sufficient processing capability to maintain and search that big a routing table? Given the volatility of the Internet routing table do you want all of the routers inside your network to process all of the routing updates that are generated in the Internet?

I think it makes IBGP sound a bit better.

HTH

Rick

Hall of Fame Super Blue

Re: BGP local pref

Rick

That is a very good point. I must admit i was thinking primarily of an MPLS WAN where you were receiving EBGP routes from your own remote sites and probably summarised at that so redistribution would not be such a big issue.

But it does raise the question if you were receiving full routes then why would you want all traffic to go via R1 in the above scenario. Surely you would want to use both routers to make a best path forwarding decision. If R2 was only to be used for backup then why bother giving it full routes ?

Following on from that, let's assume there are too many routes to redistribute. Let's also assume you are not receiving a default route from each provider, otherwise you could simply redistribute both of those and influence the metric.

So, in your opinion what would be a better solution -

1) use IBGP and accept traffic will be rerouted from R2 to R1

or

2) use static default routes on SW1 together with IP SLA so that traffic is always sent to the right router and no traffic needs to be rerouted.

Personally i have always favoured using dynamic routing protocols because they react to a failure anywhere down the line whereas IP SLA depends on just what IP you are tracking.
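Option 2 can be sketched in Python as a tracked static default via R1 with a floating static via R2 behind it. The administrative distances (1 and 10) and the function name are assumptions for illustration.

```python
def active_default_route(probe_ok):
    """Return the next hop SW1 would install for 0.0.0.0/0."""
    candidates = []
    if probe_ok:
        # Tracked static via R1, installed only while the probe succeeds.
        candidates.append((1, "R1"))
    # Floating static via R2 with a higher (hypothetical) administrative distance.
    candidates.append((10, "R2"))
    # Lowest administrative distance wins.
    return min(candidates)[1]

print(active_default_route(True))
print(active_default_route(False))
```

The limitation is visible in the single `probe_ok` input: a failure beyond the tracked IP leaves it true, so SW1 keeps sending to R1.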

I appreciate you taking the time to answer these questions, sometimes it's just good to discuss these things so i can get it clear in my head.

Edit - i don't know why, i just feel uncomfortable with traffic having to be rerouted back the way it came from. Perhaps it's just me.

Jon

Re: BGP local pref

Jon,

We don't even use an igp in our network. I run ibgp between my 2 edge routers and our l3 switch. The two routers have a local pref set for certain routes. My primary is set as 150 for all routes except those that should go out R2 (DR traffic), and the same in reverse from R2. The switch sends it in the appropriate direction based on that.
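A simplified Python sketch of what the switch ends up doing with those iBGP routes. The prefixes are documentation placeholders, 100 is assumed as the value left on the non-preferred routes, and real BGP best-path selection has more steps than this single comparison.

```python
from collections import namedtuple

Route = namedtuple("Route", "prefix next_hop local_pref")

# Hypothetical view of the iBGP routes as learned by the L3 switch.
IBGP_ROUTES_ON_SWITCH = [
    Route("198.51.100.0/24", "R1", 150),  # normal traffic, preferred via R1
    Route("198.51.100.0/24", "R2", 100),
    Route("203.0.113.0/24", "R1", 100),   # DR traffic, preferred via R2
    Route("203.0.113.0/24", "R2", 150),
]

def best_next_hops(routes):
    """Pick, per prefix, the next hop with the highest local preference."""
    best = {}
    for route in routes:
        current = best.get(route.prefix)
        if current is None or route.local_pref > current.local_pref:
            best[route.prefix] = route
    return {prefix: route.next_hop for prefix, route in best.items()}

print(best_next_hops(IBGP_ROUTES_ON_SWITCH))
```

Because the switch itself holds the local preference, it sends each prefix straight to the right router and nothing is bounced between R1 and R2.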

My question that I've always wondered is why engineers use bgp on the edge, and then redistribute into an igp. Don't you lose the quick failover capability when bgp loses its peering? For example, we're on ethernet, so when my peering goes down on the provider side, my interface is still up and bgp has to time out. I have my hold times set fairly low, so it's not super horrible, but if I didn't change that it would be 3 minutes. Let's say for ease that my hold times were at 3 minutes, my timer counts down, and bgp is still trying to get the peering back from a flap. Let's also assume that I'm redistributing into ospf. OSPF still has routes, but wouldn't they be blackholed for 3 minutes while bgp is trying to recover? This is one reason I've never moved to an igp on the lan because I believe that bgp just plays better together.

HTH,
John

*** Please rate all useful posts ***

Hall of Fame Super Blue

Re: BGP local pref

John

That is another very good point. In my last job we had the BGP timers set very low just because of the very issue you raise.

Couple of questions -

1) My primary is set as 150 for all routes except those that should go out R2 (DR traffic), and the same in reverse from R2. The switch sends it in the appropriate direction based on that.

But how does the switch know which router to send traffic to ?  Is this a L3 switch ? So DR traffic, how does the switch know which is a DR subnet and that R2 should be used for that ?

2) Without having to look through the docs: when the BGP peering is lost the interface is still up/up in your setup. So are you saying that BGP immediately flushes its routes? I don't quite follow ie. if BGP still thinks the link is up and is redistributing into OSPF then even without redistribution the router will still have its BGP routes so it would not use the other router.

I think i may be misunderstanding something here ?

By the way, sorry you keep getting 3s in the dynamic routing posts, i keep trying to readjust as you have been more than helpful.

Edit - ignore 1 because i just reread and see you run IBGP on your L3 switch. I should have read more carefully.

Jon

Re: BGP local pref

Jon,

No problem at all

2) Without having to look through the docs: when the BGP peering is lost the interface is still up/up in your setup. So are you saying that BGP immediately flushes its routes? I don't quite follow ie. if BGP still thinks the link is up and is redistributing into OSPF then even without redistribution the router will still have its BGP routes so it would not use the other router.


BGP still has the routes in the routing table until the hold times expire, and then bgp would notice that the neighbor was down and remove the routes (unless it gets a notification before then). The problem is that, for example, if I was learning 192.168.1.0/24 from an ebgp peer and that peer went down, if I were redistributing the ebgp learned routes into ospf, ospf would see a route for 192.168.1.0/24 to the ospf neighbor that's also doing bgp/ospf redistribution. The neighbor has a down bgp peering, but it's not aware of it because the timers haven't expired, so that 192.168.1.0 route is still in the bgp and ospf table. I would think the pure ospf neighbor would still try to forward traffic until bgp timed out and would cause it to blackhole routes because bgp is down....man, that's confusing
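To put rough numbers on that window, here is a minimal Python sketch. It assumes the interface stays up and no notification arrives, so only hold-timer expiry removes the routes; the function and parameter names are made up for illustration.

```python
def blackhole_window(hold_time_s, since_last_keepalive_s):
    """Seconds a dead eBGP session can keep attracting traffic."""
    # The hold timer restarts on every keepalive or update, so the window is
    # whatever was left on the timer when the peer stopped responding.
    return max(0, hold_time_s - since_last_keepalive_s)

# 3 minute hold time, peer dies right after a keepalive.
print(blackhole_window(180, 0))
# Same hold time, peer dies 45 seconds after the last keepalive.
print(blackhole_window(180, 45))
```

The window does not depend on whether the routes are then passed on by redistribution or by iBGP, since in both cases they stay in the BGP table until the timer expires.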

HTH,
John

*** Please rate all useful posts ***

Hall of Fame Super Blue

Re: BGP local pref

John

You're right it is confusing.

I guess what i was asking, and you may have answered but i am too dense to realise, is what makes IBGP any better in this instance ie.

BGP1 and BGP2 are EBGP edge routers.  SW1 is your switch.

1) BGP to OSPF - both routers are redistributing into OSPF and influencing metrics. BGP1 is preferred path for 192.168.1.0/24. BGP1 peering fails but because of the timers the route is still in the BGP table and so is redistributed into OSPF. So traffic is blackholed. Totally understand that.

2) IBGP between all devices. Same scenario, BGP1 peering fails. It is the preferred path due to local pref. But the route is still in the BGP table and so is advertised to SW1. So SW1 sends to BGP1 and result is the same as 1).

Sorry to keep banging on about this, i just want to make sure i fully understand.

Jon

BGP local pref

Ah, I understand now....it was my fault for not understanding the question

You're right. Technically, you'd blackhole traffic either way. I guess it comes down to preference? Maybe Rick has more insight on that because I see where you're going now, and I don't think you're missing anything at all.

One other thing that I thought about though with local pref is that it's carried throughout your routing domain across all ibgp peers. Weight doesn't leave the local router, so those two attributes are really the only two preferred methods of influencing outbound traffic. Local-pref would be easier to let all routers know how to get outbound traffic to route vs weight would need to be done on all routers. A little easier on management I suppose.
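The difference in scope can be sketched in a few lines of Python. This is a toy model, not router behaviour in full: the class, the values 150 and 500, and the assumption that a learned route starts with weight 0 are all for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Path:
    prefix: str
    local_pref: int
    weight: int

def as_seen_by_ibgp_peer(path):
    """Model what an iBGP peer receives: local pref travels, weight does not."""
    # Weight is never carried in an update, so the receiver starts from 0
    # (assumed default for a learned route).
    return replace(path, weight=0)

on_r1 = Path("192.168.1.0/24", local_pref=150, weight=500)
print(as_seen_by_ibgp_peer(on_r1))
```

Setting local pref once on R1 is therefore enough for every iBGP peer, whereas an equivalent weight would have to be configured on each router separately.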

HTH,
John

*** Please rate all useful posts ***

Hall of Fame Super Blue

Re: BGP local pref

John

It just shows that there are so many approaches you can use to achieve the same thing. And i think the way you have done it is another option that i didn't consider ie. in my last post to Rick the options i outlined were -

1) use IBGP and accept traffic will be rerouted from R2 to R1

or

2) use static default routes on SW1 together with IP SLA so that traffic is always sent to the right router and no traffic needs to be rerouted.

It never occurred to me to run IBGP on SW1 so it would automatically know using local pref which router to send it to.

I do get stuck sometimes on things that just seem wrong to me eg.

1) rerouting traffic out of the same interface

2) running IBGP on a L3 switch within a LAN

3) having a L3 link between 2 distribution switches which meant HSRP messages have to go via the access layer  switches (made pretty much redundant now with VSS).

Think i just need to be more flexible

Thanks for all your comments, both you and Rick.

Jon

Hall of Fame Super Silver

BGP local pref

Jon

My thoughts about what makes IBGP better than redistribution is that IBGP allows you to share the policy decisions between your EBGP routers. Lets assume (in reference to one of your previous comments) that you want to select some routes and prefer them on R1 and some other routes are preferred on R2. With IBGP the local preference can be advertised so that policy information is shared between the routers and traffic is routed the way that you prefer. And if the preferred router becomes unavailable then the other router automatically takes over.

It seems to me that in this kind of situation where you want some routes preferred through router A and other routes preferred through router B and you have an interior network then you have choices:

1) inject all of the route detail into the IGP so that all interior routers know precisely for every route whether to forward to A or to B. which means that all interior routers have large tables, lots of updates to process, etc.

2) hide the route detail from the interior, let A and B sort out which is preferred, and acknowledge that sometimes a packet from the interior is forwarded to A which will then be forwarded to B. (redirected)

I recognize that forwarding to A which then forwards the packet to B makes you uncomfortable. And it seems inefficient. But as the network scales to large sizes, sometimes it is the best way to do things.

HTH

Rick

Hall of Fame Super Blue

BGP local pref

Thanks Rick. John, as you have probably read, added a further option when using local pref that makes more sense to me in terms of route selection ie. run it on the L3 core/distribution switch, SW1, in my example.

Like i said in my last post there are just some things i find uncomfortable but that doesn't mean they are wrong to do.

Many thanks for all your comments. Very interesting discussion.

Jon

BGP local pref

Yes indeed. I like to hear how others are doing it, because frankly you can't get every scenario from a book. You guys are very creative!

HTH, John *** Please rate all useful posts ***
Hall of Fame Super Silver

BGP local pref

I recognize that running IBGP on the switch is also an option. And in some environments (apparently including the one where John works) it makes sense to run IBGP on all the layer 3 devices. But that also suffers from the drawback that it means that all network devices will have all the routes in their routing table and it carries with it the requirements for large table handling, memory, high number of routing updates to process, etc.

One of the things that I believe is apparent in this discussion is that there are many alternatives. And that an alternative that is a good fit in organization A may not be a good fit in organization B.

HTH

Rick

Super Bronze

Re: BGP local pref

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

Jon, you're of course correct, bouncing some traffic off one BGP egress router, to another BGP egress router, is less than optimal, but how nonoptimal is it?

If your two BGP peers are physically close, redirected traffic might be subjected to one "needless" routing hop (the nonoptimal initial BGP egress router) and perhaps one or two "needless" L2 hops (to get to a nearby peer BGP egress router).  How much additional latency (especially compared to end-to-end) and/or router load would this contribute?  Does the extra latency and/or CPU really cause us to work to have more optimal routing?
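A back-of-the-envelope version of that question in Python. The latency figures are purely hypothetical and would need to be measured on a real network.

```python
def extra_hop_overhead_pct(extra_ms, end_to_end_ms):
    """Added latency of the redirect as a percentage of end-to-end latency."""
    return 100.0 * extra_ms / end_to_end_ms

# Hypothetical: 0.5 ms for the extra LAN hop against a 50 ms WAN path.
print(extra_hop_overhead_pct(0.5, 50.0))
```

With numbers in that range the redirect adds about one percent, which is the sense in which the extra hop may not be worth engineering around.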

On the issue of sharing the same interface for redirected traffic, again you're correct, using a different interface, perhaps one dedicated between R1 and R2, would avoid this issue. However, often the interior (LAN) interface has more available bandwidth than the exterior (WAN) interface. If it does, there may be sufficient bandwidth to provide for redirected traffic too.

Sure you could redistribute BGP into your IGP.  However, perhaps setting iBGP on some of your interior routers would be better.  For example, maybe only SW1 needs to iBGP peer with R1 and R2.

PS:

BTW, setting up local prefs to manage path selection works, but it's so 20th century. If possible, a 21st century approach might be used, such as using Cisco's PfR to manage path selection based on policies and actual path performance. PfR modifies routes, so if you were using it to manage BGP, its route churn could impact your IGP if doing redistribution. (And if you don't redistribute the churn, your IGP wouldn't be making optimal routing decisions.)

Hall of Fame Super Blue

BGP local pref

Joseph

I know you are right, the added latency of using high bandwidth links to hop back from one router to the other is minimal, but it just doesn't feel right. I know it's me but the idea of sending a packet to a router just for it to be sent back out the same interface to get to the right router is just wrong.

But of course you are right and Rick alluded to the same thing. And there are alternatives which we have covered here ie. BGP to IGP and influence the metrics, IBGP on internal L3 switch. Personally i have only ever used BGP to IGP redistribution.

As for PfR, i have seen you refer to it a few times in posts and every time i thought i should read up about it but then got sidetracked into something else. Maybe when i have finished with VSS and its benefits with Campus design i'll get round to it.

Thanks for your comments.

Jon

Super Bronze

Re: BGP local pref

Jon, I know exactly how you feel - letting the traffic bounce back off a router just feels wrong.  However, in the grand scheme it can be the more cost effective option, and again, depending on the situation, it actually might be minimally adverse to the traffic.

Another approach that hasn't been discussed would be to redistribute BGP into an IGP used just for that, not your "normal" IGP (i.e. two IGPs on select routers - the IGPs might be different protocols, or the same protocol in different processes, or in different VRFs). This is something unusual, but perhaps not all your interior IGP routers support BGP or you don't want/need the BGP routes on all of them. It also might avoid the peering issues related to dealing with more than a few iBGP routers.

Regarding OER/PfR, it's something I'm very taken with. Its improvement over "ordinary" dynamic routing, is somewhat akin to dynamic routing improvements over static routing.  Instead of just keeping track of best path(s) based on static metrics, it constantly tracks best paths for actual performance end-to-end and uses those for routing table updates.

I enabled OER within an international company across two L3 VPN MPLS clouds. The only "problem" we then had was that our performance monitoring tools stopped showing WAN cloud performance problems. OER would "see" a performance issue and redirect traffic before our performance monitoring "saw" the same issue.

The catalyst (pun not intended) for trying OER was one day one of our cloud providers had a node, in England, black hole transit traffic, but routing looked fine. It took a while to figure out what the problem was and longer to work up temporary policies to not use one provider for some prefixes.

With OER, the same situation (and many others) is generally detected and dealt with, by OER, within a few seconds.

OER/PfR, I guess, might now also be considered an entry into the world of SDN.

PS:

Ah, VSS, dual member "stacking" for 6500s (and now 4500s).  Actually if you don't like bouncing "needlessly" through network devices, that's something you can easily bump into with VSS.  For L3, VSS can work against you.  For L2, I like it.

Hall of Fame Super Blue

BGP local pref

Joseph

I haven't been around on the forums for a while but i don't see many questions on PfR. This doesn't necessarily mean it isn't widely used but CSC can give you a good idea about how much a technology is in use.  Do you have any idea of how much it has been taken up by customers ?  Of course it could just be that it is so easy to configure that no one ever has to ask about it but i suspect that isn't the case

Ah, VSS, dual member "stacking" for 6500s (and now 4500s).  Actually if you don't like bouncing "needlessly" through network devices, that's something you can easily bump into with VSS.  For L3, VSS can work against you.  For L2, I like it.

Without wishing to turn this into a VSS thread can you elaborate on that point ie. why can L3 work against you.

Jon

Super Bronze

Re: BGP local pref

Jon, I had noticed your absence, and your relatively recent return.  Very glad to see you back!

I suspect PfR hasn't had a lot of uptake. If that's true, I suspect it's because you need multiple "edge" paths, with an interior and exterior. When you first look at it, it also seems very complex but once you begin to understand it, it's really not all that complex. Unfortunately there isn't yet a Doyle book on the subject.

VSS can work against you in L3 because its very feature, presenting the pair as a single device, hides better L3 paths.

Ideally VSS should have connections from each VSS member to all other devices.  But in my experience, often they don't, especially for L3.  For example, you might have a single "east" and "west" connection, each on a different VSS member.  From one L3 device out, optimal path might be to the VSS member with the "east" connection, but the next L3 device sees the VSS pair as a single device.  So, perhaps it sends its traffic to the "west" VSS member.  Now that VSS member will redirect the traffic to its VSS mate (basically we're back to your concern about suboptimal traffic flows, on eBGP egress routers, in this post).

This situation can also arise in failure modes. Suppose a VSS member has a line card failure. From an L3 perspective, an adjoining L3 device won't know to avoid sending some of its traffic to that member rather than sending it directly to the VSS member with the remaining egress path.

Cross traffic in a VSS pair is something you want to avoid because often the bandwidth is limited (often just a dual 10g pair is being used - consider a pair of 3750Xs or 3850s often have more bandwidth for a dual stack) and if there's congestion on the VSL link, you cannot define custom QoS prioritization for it.  (Unlikely, but consider losing a 6900 line card [80 Gbps] whose egress now needs to be redirected to its VSS mate across the 20g[?] VSL.)

Another issue with VSS: a pair of separate L3 devices has a (very slightly) higher MTBF. This is because separate L3 devices don't share the single point of failure, the VSS OS. (NB: the difference in MTBF for dual L3 vs. VSS is sort of akin to the difference between VSS vs. a single chassis with redundancy for everything [except, of course, the chassis itself].)

Yet another issue with VSS: a VSS member will always use its own local egress path, to avoid using the VSL. With separate L3 devices, I could make the local device's egress and its peer's egress equal cost (or, with EIGRP, use unequal cost) to take advantage of the peer's egress bandwidth.

The above isn't an indictment against VSS.  But I've found some engineers think VSS is a huge improvement vs. a fully redundant chassis or better than dual L3 in all aspects.

Hall of Fame Super Blue

Re: BGP local pref

Joseph

Jon, I had noticed your absence, and your relatively recent return.  Very glad to see you back!

Many thanks. I'm using CSC to get back up to speed and see just how much i have forgotten before i look for work next year. I have been out for a while so i'm trying to gauge how much i still know and whether i should continue in networking or look elsewhere

Thanks for the VSS pointers as well. Obviously i would never connect a L3 device to just one member chassis because of my aversion to suboptimal traffic paths

I understood everything except -

This situation can also arise in failure modes. Suppose a VSS member has a line card failure. From an L3 perspective, an adjoining L3 device won't know to avoid sending some of its traffic to that member rather than sending it directly to the VSS member with the remaining egress path.

If the linecard fails doesn't the L3 device see the interface go down and immediately switch all its traffic to the other chassis? If the L3 device had an intermediary switch in between it would have to wait for timers before realising it had lost its peering but i was assuming a direct connection from the L3 device to the member chassis.

Jon

Super Bronze

Re: BGP local pref

If the linecard fails doesn't the L3 device see the interface go down and immediately switch all its traffic to the other chassis? If the L3 device had an intermediary switch in between it would have to wait for timers before realising it had lost its peering but i was assuming a direct connection from the L3 device to the member chassis.

Not the L3 device with a link to a failed card, but another L3 device with connections to both VSS members.

I.e

R1=VSS=R2

If VSS loses one of its connections to R2, R1 will still send traffic that transits R2 to both VSS members.

vs.

R1-C1-R2

R1-C2-R2

If C1 loses its link to R2, R1 should only send traffic to C2.
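The two cases can be written as a toy Python model. The design labels and the function name are made up for illustration, and the VSS members are called C1 and C2 here only so the two cases can be compared.

```python
def chassis_r1_may_use(design, chassis_without_r2_link):
    """Return which chassis R1 may still hand R2-bound traffic to."""
    chassis = {"C1", "C2"}
    if design == "vss":
        # R1 sees one logical L3 neighbour, so either member link may be used.
        return chassis
    if design == "separate":
        # The chassis that lost its R2 link is no longer a valid routed path.
        return chassis - {chassis_without_r2_link}
    raise ValueError(f"unknown design: {design}")

print(chassis_r1_may_use("vss", "C1"))
print(chassis_r1_may_use("separate", "C1"))
```

In the VSS case, whatever still lands on the member without the R2 link has to cross the VSL to its mate.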

Hall of Fame Super Blue

BGP local pref

Joseph

I understand now.

Thanks

Jon
