BGP route-reflectors and MPLS - suboptimal path.

Unanswered Question
Nov 23rd, 2011

Hello everybody,

I'm quite lost and need some good advice about my network topology.

Please have a look at the picture in the attachment.

We have 4 routers physically connected in a ring; three of them have an eBGP session with an upstream ISP.

The two RC-RRs are route reflectors, and all other routers have BGP sessions with them using their Loopback IPs as the source.

Because of speed and price, the connection RC-E001 <--> RC-RR1 is a backup, and the OSPF and BGP metrics are set accordingly.

Internal routing works as expected. All routers are MPLS "P" routers, but only the Loopback IPs are label-switched; that is, only traffic to a Loopback follows a label-switched path, and other traffic should use the normal routing table.

The problem is the following: traffic to the Internet from router RC-E001 follows the path RC-E002 ---> RC-RR2 ---> RC-RR1,

but it should just go to router RC-E002 and then directly to the Internet. All external prefixes on RC-E001 have RC-RR1 as the next hop (higher local preference).

A traceroute on RC-E001 shows the following:

RC-E001#traceroute 8.8.8.8
  1 RC-E002 [MPLS: Label 202 Exp 0] 16 msec 20 msec 60 msec
  2 RC-RR2 [MPLS: Label 79 Exp 0] 20 msec 16 msec 20 msec
  3 RC-RR1 [AS UPSTREAM] 20 msec 16 msec 20 msec
  4 UPSTREAM [AS UPSTREAM] 20 msec 16 msec 20 msec
  5 ....

I understand that RC-E001 tries to reach the BGP next hop via the MPLS label-switched path, because all Loopbacks use MPLS label switching, but I don't want the traffic to take such a suboptimal path.

What have I configured wrong, and what should I do to force the traffic from RC-E001 to go out directly via RC-E002?

Best regards,

Konstantin

svaibhava Wed, 11/23/2011 - 21:43

Hi Konstantin

The attachment does not show a complete diagram, just a partial arrow. Can you please provide the updated diagram so we can better understand the network topology?

Are you using NHS on the RRs? By default, NHS does not take effect for routes an RR reflects, even if it is configured. If RC-E002 also has an upstream ISP peering along with RC-RR1 and RC-RR2, then for RC-E001 to choose RC-E002, the BGP attributes of the routes injected by RC-E002 have to be better than those of RC-RR1/RR2, so that RC-RR1/RR2 choose RC-E002's route as best and do not announce their own routes.

Regards

Varma

Konstantin Dunaev Wed, 11/23/2011 - 23:44

Oh, sorry, only the selected objects were saved; here is the full diagram.

What is "NHS" ?

svaibhava Thu, 11/24/2011 - 00:26

Hi Konstantin

Thanks for providing the diagram. By NHS I meant Next-Hop-Self.

If I understand correctly, looking at the topology depicted above, RC-E002 is also an RR client. Am I correct in my understanding? RC-E001 only peers with the RRs RC-RR1 and RC-RR2, right?

In order for RC-E001 to prefer RC-E002 as the exit point for Internet traffic, we need to make RC-E002's BGP routes more preferred than RC-RR1's, so that RC-RR1 reflects RC-E002's routes to RC-E001 and does not advertise its own routes. This can be achieved by increasing the LP of RC-E002's routes.

Hope this helps in your query.

Regards

Varma

Konstantin Dunaev Thu, 11/24/2011 - 00:52

Hi Varma,

The problem is not the BGP attributes; it's the LSP that causes the problem for me. See Matthew's answer.

svaibhava Thu, 11/24/2011 - 01:02

Hi Konstantin

The problem here is that RC-E002 is not the best candidate for Internet traffic; instead, RC-RR1 is advertising the best routes. The LSP only comes second, after we have selected the best route from a peer, as the way to reach that peer. Even if we peer RC-E002 directly with RC-E001, it will not solve the issue, as the default LP for routes learnt from RC-E002 will be 100, whereas the LP of routes learnt from RC-RR1 will be 150, which is better, so RC-RR1's routes are selected for Internet traffic.

Regards

Varma

Konstantin Dunaev Thu, 11/24/2011 - 01:10

Hmm, I can't agree with that.

If there were no MPLS in the network, traffic would be routed hop by hop.

That means RC-E001 would still see RC-RR1 as the next hop for all external prefixes, but on the way there the traffic would be routed by RC-E002, and RC-E002 would simply send the external traffic directly to the upstream. The LP doesn't play any role here, because RC-E001 has no BGP session with RC-E002 and doesn't get any prefixes from it.

But you're right: if I set up a BGP session between RC-E001 and RC-E002, then I can set the LP accordingly. I should think about it, because it means "small" changes in the design. RC-E002 was not supposed to be an RR router, but maybe it's a good idea to create a second level of RR sessions.

svaibhava Thu, 11/24/2011 - 01:20

Hi Konstantin

I agree with you and Matthew that MPLS adds to this issue. Actually, this is the basis for deploying a BGP-free core. I just overlooked that.

We have two options for this:

1. Make RC-E002 an RR client and set the LP to 200 for RC-E002's routes, so that RC-RR1 reflects RC-E002's route as the best path with the next hop set to RC-E002.

2. Peer RC-E002 and RC-E001 directly, and again set the LP to 200 for RC-E002's routes.

Regards

Varma

Konstantin Dunaev Thu, 11/24/2011 - 01:25

Ok, I see your point.

A couple of minutes ago I posted the idea of my new design with 2 levels of RRs; do you think it makes sense?

svaibhava Thu, 11/24/2011 - 01:36

Hi Konstantin

In my opinion, in any hierarchical topology the top level serves the bottom level, and the clients connect to the bottom level.

So I think the best way would be to keep RC-E002 at the topmost level, make RC-RR1 and RC-RR2 its clients, and keep RC-E001 as before, as a client of RC-RR1 and RC-RR2.

In my personal opinion, making a client peer with different levels of RRs won't add anything from a traffic transport perspective.

Regards

Varma

Konstantin Dunaev Thu, 11/24/2011 - 01:45

You've misunderstood it a little bit.

Just to clarify: currently in our network we have RC-RR1 and RC-RR2 as route reflectors, and all other routers are route-reflector clients of both of those routers, not the other way around.

maayre Thu, 11/24/2011 - 00:35

Hi Konstantin

NHS stands for next-hop-self. If you are using a transit/border router as an RR, you might notice some of your next hops get trampled; you can manually fix this with a route-map setting the next hop if you wish.

I can't see the diagram (using the app), but I would give good odds that this is it:

When the ingress LSR (the guy you are tracing from) does a lookup, it sees that the next hop recurses to a label-switched path (LSP).

With MPLS forwarding, the packet is not routed hop by hop. Instead, the packet is put onto a predetermined path to the next-hop router, in this case.

When the packet arrives there, the router sees an IP packet (because of penultimate hop popping, PHP) and sends the packet on a new predetermined path towards the upstream.

Have a look at the next-hop values; I'm confident you'll see this is causing the issue. As mentioned, you can manipulate the next hop the RR is advertising to correct the issue.
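
To make the difference concrete, here is a rough Python sketch of the two forwarding behaviours. It is only a toy model: the path list is taken from the traceroute in the first post, the set of routers with an upstream comes from the thread, and the function names are made up for illustration.

```python
# Toy model, not router code. Assumption: the IGP path from RC-E001 to
# RC-RR1's loopback is the one seen in the traceroute.
IGP_PATH_TO_RR1 = ["RC-E001", "RC-E002", "RC-RR2", "RC-RR1"]

# Routers with their own eBGP upstream, as described in the thread.
HAS_UPSTREAM = {"RC-E002", "RC-RR2", "RC-RR1"}


def hop_by_hop(path):
    """Plain IP: every router does its own lookup, so the first router
    on the path that has an upstream sends the packet out."""
    hops = []
    for router in path:
        hops.append(router)
        if router in HAS_UPSTREAM:
            break
    return hops + ["UPSTREAM"]


def label_switched(path):
    """MPLS: the ingress router pushes a label for the BGP next hop and
    transit routers only swap labels, so the packet leaves the network
    only at the end of the LSP."""
    return list(path) + ["UPSTREAM"]


print(hop_by_hop(IGP_PATH_TO_RR1))
print(label_switched(IGP_PATH_TO_RR1))
```

The second list matches the hops in the traceroute; the first is what the original poster expected to see.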

Konstantin Dunaev Thu, 11/24/2011 - 00:50

Hi Matthew,

You're right about the LSP: RC-E001 uses the LSP to reach the next hop, and that is why the external traffic goes to RC-RR1.

But how should I change the next-hop value, just set the next-hop IP to RC-E002's? I think it's not the best solution, because RC-E001 doesn't have any BGP session with RC-E002. It'll work for sure, but it breaks the design rules.

I was thinking of configuring RC-E002 as a route reflector for RC-E001 and so creating an additional route-reflector level, because, as I remember, according to BGP best practice traffic from one route-reflector client should not pass through another route-reflector client.

Konstantin Dunaev Thu, 11/24/2011 - 01:21

If I create a second level of route-reflector sessions, may I set up a route-reflector client so that it connects to RR routers from different levels?

In my case it would be like this: RC-E001 is a route-reflector client of RC-RR1 and RC-E002, and RC-E002 is a route-reflector client of RC-RR1 and RC-RR2.

maayre Thu, 11/24/2011 - 01:56

Hi Konstantin,

I'm at home now on a PC and can see the diagrams. I think the issue is just that the best route is via RR1 (and not RR2), hence the traffic gets there based on the best path to the loop0 based on OSPF (and, as already mentioned, MPLS sends the packet on a predetermined path).

You're asking how to make traffic leave your network to the Internet on RR2; you have two options:

- don't use label switching (this may not be an option for you)

- change the BGP attributes so that the best route in the BGP table on E001 is the path via RR1

I think your current local preference scheme is what is causing the route via RR2 to be selected; you could choose to raise the local preference to above 150 for just the default route (assuming that's the route you're using) to resolve this.

Note: Varma already suggested the second solution above.

You also asked about hierarchical route reflectors; I don't think this would help you solve the problem. You never mentioned whether there is an iBGP session between RR1 & RR2. Is there? As long as it is iBGP with no RR client configured, that should be fine, and each RR will learn the routes which the other RR learnt through eBGP (that sounds confusing, but I'm sure you know what I mean).

If you don't think that's correct, or it doesn't solve the problem, can you post "show ip bgp 0.0.0.0" (or whatever the route is if you're not using a default) from all the routers so we can see what's happening?

Konstantin Dunaev Thu, 11/24/2011 - 02:16

Hi Matthew,

Of course RC-RR1 and RC-RR2 do have an iBGP session; as I said, they are our route reflectors and all other routers are clients of them.

I wanted to set up hierarchical route reflectors because I don't want to set up a third RR router: it would need new BGP sessions, which means more prefixes on each router and so on.

maayre Thu, 11/24/2011 - 02:10

On a second read of your earlier discussion regarding the BGP local-preference scheme:

Best common practice would dictate that routers in the same BGP AS should never have routes with different attributes. I.e. you should never set local preference in the middle of your network for a specific route, so that some routers have the old route and some have the changed route.

As you are aware, you should also have an iBGP full mesh or at least route reflectors.

So what you should have is:

RR1 iBGP to RR2, E001, E002 (last two are RRClients)

RR2 iBGP to RR1, E001, E002 (last two are RRClients)

Then all you have to do is configure the local preference on the network borders (which in this case are RR1 & RR2), on the route as it is learnt inbound (whether from eBGP or redistribution of a static route).

This way all routers will have "congruent routing information", which is what's expected in terms of best common practices.

I'm not too sure about the other background of what you're trying to do here, but for example, if you also intended some kind of load-balancing scheme, that could be done too, but I don't think it's in the scope of what you were asking.

Anyway sorry for all the spam, let us know how you go after this one =)

HTH,

Matt

Konstantin Dunaev Thu, 11/24/2011 - 02:41

Matthew, you're right about the iBGP sessions and BGP attributes. I'm changing our local preference only on our eBGP sessions, and all BGP attributes stay the same across the whole network on all routers.

The BGP sessions with the route reflectors are configured with "weight", so I can choose the primary and secondary path.

I just want to reach an optimal path. If you look at the first picture, you can see that traffic to external prefixes crosses the whole network and leaves it only at RC-RR1 instead of RC-E002.

kishore.chennupati Thu, 11/24/2011 - 03:32

Hi All,

Please allow me to jump in here with your permission.

Konstantin,

Before I go into the solution, let me ask you a design question; it also contains the solution to your problem.

Q. What happens if the links RC-E002 <-> RC-RR2 and RC-E001 <-> RC-RR1 go down?

   You are toast. Although RC-E002 has eBGP with the ISP, it won't pass the default route to RC-E001, because there is no iBGP between RC-E002 and RC-E001. So RC-E001 will literally not be able to get to the Internet. You see what I mean here?

Now, the solution to your problem:

Make RC-E002 an RR with LP=200 and the rest of the routers its clients. Now, if the above scenario occurs, RC-E001 will still be able to get to the Internet.

This will also fix your original problem. Since RC-E002 is the RR with LP=200 and in a full mesh, it will send that to the other 3 routers (RC-RR1, RC-RR2, RC-E001). Now RC-E001 will ignore the LP=150 from RC-RR1 and route via RC-E002.

Even if you are using MPLS here, it will still be the same, as it will just create an LSP between the loopbacks of RC-E001 and RC-E002 and follow that path.

In case RC-E002 dies, RC-E001 will go to the Internet via RC-RR1.

Or, if the link between RC-E002 and RC-E001 dies, RC-E001 will take the path RC-RR1 <-> RC-RR2 <-> RC-E002 to the Internet.

Hope this answers your question. If anything is unclear, please let me know.

Regards,

Kishore

Message was edited by: Kishore Chennupati

Konstantin Dunaev Thu, 11/24/2011 - 03:46

Hi Kishore,

Thank you for jumping in.

To your question: I agree actually, but these links are completely separate, in different locations, on different media and different HW. But I see what you mean.

To your suggestion: I agree that RC-E001 somehow needs a BGP session with RC-E002, but as I said, I don't want to put RC-E002 on the same route-reflector level as our main route-reflector routers, RC-RR1 and RC-RR2; that is why I was thinking about a second level of RRs. But I'm not sure if I can/may set up iBGP sessions to route reflectors from different levels.

kishore.chennupati Thu, 11/24/2011 - 03:50

Hi Konstantin,

You can set up different levels of RRs. You can have RC-E001 and RC-E002 in one level and RC-RR2 and RC-RR1 in another level, and then have the RRs talk to each other, but to solve your original problem you would still need RC-E002 as an RR.

Please see the link below, which has a diagram of how to connect different RRs within the same AS.

http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a00800c95bb.shtml#routereflectors

HTH

Kishore

Edit: Actually, looking at your topology, RC-E001 might also have to be made both an RR client and an RR if you want multiple clusters. Otherwise, if RC-E002 dies, RC-E001 won't be able to go out, as it will become isolated. An RR client in a cluster can only have RRs within the same cluster. But if it wants to talk to different clusters, then it needs to be an RR as well.

Honestly, having multiple clusters in your topology might not be beneficial, and it will make it messier in my opinion. Please stick to the solution I provided in my previous post and have a flat network.

Message was edited by: Kishore Chennupati

Konstantin Dunaev Thu, 11/24/2011 - 04:16

Exactly, RC-E002 will be an RR router, but on the higher level of the RR topology, like this:

The question is: is it possible, or is it allowed, to set RC-E001 as a route-reflector client of the RR routers RC-E002 and RC-RR1, which are on different levels? I'm not quite sure how the BGP information will be exchanged in this case.

kishore.chennupati Thu, 11/24/2011 - 04:27

The answer is NO. I explained that in my previous post, in the "Edit:" message.

Also, in your diagram you have RC-E002 as an RR client and RC-E001 as a client of this client. You can't have an RR client of another RR client.

What you can do, though, is make RC-E002 an RR and then make RC-E001 its client and also an RR. Then you can have an RR <-> RR relationship between RC-E001 and RC-RR1, and this is allowed. Does that make sense?

I had edited my previous post, which talks about this concept of communication between clusters.

Let me give you another link which might help:

http://blog.sazza.de/?cat=22

HTH

Kishore

maayre Thu, 11/24/2011 - 04:56

Kishore,

An RR can be a client of another RR; this is called "hierarchical RRs" and is used to scale in very large SP networks.

In an abstract sense there is no such thing as an RR, just a feature you can use to turn off the iBGP split-horizon rule per neighbor.
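
As a rough illustration of that point, the advertisement rule can be written as one small Python function. This is a simplified sketch of the usual route-reflection rules, not any vendor's implementation; the peer-type strings and the function name are invented for the example, and cluster-list and originator checks are left out.

```python
def should_advertise(learned_from, send_to):
    """Decide whether a best path learned from one peer type may be
    sent to another. Peer types: "ebgp", "client", "non_client"
    (a plain iBGP peer is a "non_client")."""
    # Anything involving an eBGP peer is always passed on.
    if learned_from == "ebgp" or send_to == "ebgp":
        return True
    # Learned from a client: reflect to clients and non-clients alike.
    if learned_from == "client":
        return True
    # Learned from a non-client iBGP peer: only clients may receive it.
    # With no clients configured this is plain iBGP split horizon.
    return send_to == "client"


# Plain iBGP peers: iBGP-learned routes are not passed on.
print(should_advertise("non_client", "non_client"))
# Marking the receiving neighbor as a client lifts the restriction.
print(should_advertise("non_client", "client"))
```

Seen this way, "being an RR" is nothing more than having at least one neighbor marked as a client.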

kishore.chennupati Thu, 11/24/2011 - 05:13

Matt,

"An RR can be a client of another RR; this is called "hierarchical RRs" and is used to scale in very large SP networks."

Where did I mention that it cannot? Please point me to the post.

Regards

Kishore

Konstantin Dunaev Thu, 11/24/2011 - 05:26

I agree with you to some extent, but why do you want to have RC-E001 as a route reflector and RC-E002 as its client, and not the other way around?

I should clarify that RC-E002 is truly a backbone router and RC-E001 is, let's say, more or less a "stub router".

Kishore Chennupati wrote:

Also, in your diagram you have RC-E002 as an RR client and RC-E001 as a client of this client. You can't have an RR client of another RR client.

I think Matt is pointing to this post; actually, I disagree with this statement as well. I haven't used a hierarchical RR topology in practice, but pretty much all MPLS topologies somehow refer to a hierarchical RR topology.

kishore.chennupati Thu, 11/24/2011 - 05:32

Konstantin,

Please read my previous post carefully, my friend. Below is what I said:

"What you can do, though, is make RC-E002 an RR and then make RC-E001 its client and also an RR. Then you can have an RR <-> RR relationship between RC-E001 and RC-RR1, and this is allowed. Does that make sense?"

Konstantin Dunaev Thu, 11/24/2011 - 05:45

Kishore Chennupati wrote:

Konstantin,

Please read my previous post carefully, my friend. Below is what I said:

"What you can do, though, is make RC-E002 an RR and then make RC-E001 its client and also an RR. Then you can have an RR <-> RR relationship between RC-E001 and RC-RR1, and this is allowed. Does that make sense?"

Kishore,

that is exactly the point where I'm not quite sure I understand you correctly.

You mean I can set up RC-E002 as an RR router while at the same time it stays a route-reflector client of RC-RR1 and RC-RR2? Do I understand you correctly?

Then I configure RC-E001 as a route-reflector client of RC-E002?

But how can I configure an "RR relationship" between RC-RR1 and RC-E001? Do you mean to configure a simple iBGP session between them? I don't think it's a good idea: we don't get a full mesh there if RC-E002 (the RR router for RC-E001) fails.

kishore.chennupati Thu, 11/24/2011 - 05:35

Konstantin,

You can use it, but there are certain considerations you need to take into account while using clusters etc., which I explained in my previous posts. I am happy to be corrected if I am wrong.

Konstantin Dunaev Thu, 11/24/2011 - 06:05

It's a very good link, thank you.

It shows pretty much the same topology as I would like to get, without an RR-BGP session between RC-E001 and RC-RR1.

Actually, now I understand why many ISPs try to separate their Internet and MPLS backbones, at least on the logical BGP level.

maayre Thu, 11/24/2011 - 04:45

Hi Konstantin,

Yes, you can add a level of hierarchy to route reflectors, where one route reflector is the client of another, BUT just because you can bend the rules of iBGP split horizon like this doesn't mean you should, and I don't believe it is required, nor is it good practice.

It doesn't matter that there is no iBGP between E001 and E002; they should both be clients of the same RRs, and the route will propagate via both RRs and back down. Unfortunately, because these RRs also have their own route with better attributes, the route will never get advertised; this is why setting the local preference was mentioned.

I'm sure your issue is with route selection on the route reflectors; ultimately there are many ways to solve any problem like this.

kishore.chennupati Thu, 11/24/2011 - 05:16

Matt,

"It doesn't matter that there is no iBGP between E001 and E002, ....."

I respectfully disagree here. It does matter if you read my scenario. I mentioned a scenario in my first post. If that case ever happens, then you do need iBGP between E001 and E002.

Konstantin Dunaev Thu, 11/24/2011 - 05:21

I don't think the problem is only BGP-related; as you said previously, it's the LSP. You can see it in the trace.

Traceroute on RC-E001 shows the following:

RC-E001#traceroute 8.8.8.8
  1 RC-E002 [MPLS: Label 202 Exp 0] 16 msec 20 msec 60 msec
  2 RC-RR2 [MPLS: Label 79 Exp 0] 20 msec 16 msec 20 msec
  3 RC-RR1 [AS UPSTREAM] 20 msec 16 msec 20 msec
  4 UPSTREAM [AS UPSTREAM] 20 msec 16 msec 20 msec
  5 ....

The traffic is label-switched to the next hop, and neither RC-E002 nor RC-RR2 "routes" the packets; they just forward them to RC-RR1.

It's quite clear that in order to use RC-E002 as the egress router for traffic from RC-E001, RC-E001 needs to see RC-E002 as the next hop, which means we need a BGP session between them.

The question now is how to get there in the best way.

maayre Thu, 11/24/2011 - 05:38

@Kishore - I don't disagree with you at all, we are just talking about different topologies :)

@Konstantin - you don't need a BGP session between E001 & E002. The fact that we are adding sessions to solve the problem proves that either

- the BGP best route selection is not being controlled and the desired route is not advertised

- the BGP topology does not equate to a full mesh (or equivalent with RRs)

How about trying this: let's simplify the network for now. Make only one router an RR; it can be any of them and will need a session to each of the other three. Now have all routers inject routes into BGP and use local preference (at the ingress) to control the best route, then view this on the RR.

The label switching behaviour is just a side effect of the best BGP route being the one you don't desire.

Once you've got that working with one RR, you can make a second one for redundancy, as described in my earlier post.

Konstantin Dunaev Thu, 11/24/2011 - 05:53

Matt,

We can completely throw away RC-RR2, for example; it doesn't play any role here, since there is still a link between RC-E002 and RC-RR1.

Then both RC-E001 and RC-E002 have only one BGP session, with RC-RR1.

RC-E002 has an upstream link; RC-E001 doesn't.

All external prefixes on RC-E002 have the upstream router's IP as the next-hop IP.

On RC-E001 all external prefixes have RC-RR1 as the next-hop IP, because there is only one BGP session.

RC-E001 should actually use RC-E002 as the egress point, but in my topology it uses RC-RR1, because MPLS forwards the traffic directly to RC-RR1 and RC-E002 doesn't make any routing decision.

Update:

Without MPLS, the traffic from RC-E001 would simply be routed by RC-E002 and sent directly to the upstream router.

serj@iptp.net Thu, 11/24/2011 - 06:44

Konstantin,

BGP advertises only one best path. This is true for RRs as well.

This means that an RR client receives only the best routes from the RRs, and it's the RRs' view of the routing table.

If an RR has eBGP routes, it prefers them over iBGP by default, and all RR clients will use these routes.
You may find this draft useful: http://tools.ietf.org/html/draft-walton-bgp-add-paths-06.
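
A small Python sketch of that behaviour, under simplifying assumptions: only two steps of best-path selection are modelled (highest local preference, then eBGP over iBGP), the `Path` class and function names are invented, and the LP values 150, 100 and 200 are the ones mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    ebgp: bool  # True if this router learned the path over eBGP


def best_path(paths):
    """Simplified selection: highest local preference wins,
    then an eBGP-learned path beats an iBGP-learned one."""
    return max(paths, key=lambda p: (p.local_pref, p.ebgp))


def reflected_to_clients(rr_paths):
    """An RR passes on only its single best path."""
    return best_path(rr_paths)


# What RC-RR1 holds for an external prefix today (values from the thread).
own_upstream = Path(next_hop="RC-RR1", local_pref=150, ebgp=True)
via_e002 = Path(next_hop="RC-E002", local_pref=100, ebgp=False)
print(reflected_to_clients([own_upstream, via_e002]).next_hop)

# Same prefix if RC-E002 tags its routes with LP 200 at ingress.
via_e002_lp200 = Path(next_hop="RC-E002", local_pref=200, ebgp=False)
print(reflected_to_clients([own_upstream, via_e002_lp200]).next_hop)
```

Note that in the second case RC-RR1 itself would also start using RC-E002 as its exit, since local preference applies to the whole AS.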

Konstantin Dunaev Thu, 11/24/2011 - 06:51

Hi Sergey,

That's correct, I know it, but the problem is not in the best path. The problem is how to reach the next hop from an RR client in the case of an MPLS and in the case of a non-MPLS backbone.

In my topology I've broken one of the RR topology best practices: never run a BGP session from an RR client to an RR router over another RR client. In the case of a non-MPLS backbone it's not so obvious, but the MPLS backbone has shown me that this rule makes sense.

maayre Thu, 11/24/2011 - 15:05

In regards to your last post, you're saying:

"The problem is how  to reach the next-hop from RR-client in case of MPLS and in case of non-MPLS backbone."

What we are trying to say is that one RR client can't learn the next hop of the other RR client due to BGP best-path selection on the RR, and that this needs to be controlled so that the route with the next hop of the desired RR client is advertised. Yes, the behaviour changes when you introduce MPLS, but actually this is BGP's fault, and it only works without MPLS because IP routing is decided hop by hop. Really, the issue here is a failure in the BGP design; MPLS forwarding is just the victim protocol.

I think just about everyone on here is trying to give you more or less the same solution;

- I am saying control best path selection with LP

- Sergey is saying the route isn't advertised due to eBGP > iBGP (and hence is not best route)

- Kishore was asking you to move the RR to E002 which would just be another way to control the best path without setting LP

- Varma was talking about LP as well

How about this, prove us wrong. Post the "show ip bgp summary" and "show ip bgp " from each router and explain how the suboptimal routing is being caused by anything other than the BGP best path selection on the RR.

You haven't mentioned it but if you want the RR to be either of the current ones while allowing E001 to reach the Internet via E002 while maintaining that RR1 and RR2 still use their directly connected egress, you'll need to do something a little more complicated, either;

- change the node which acts as the RR to E002 so that the only path E001 learns in a stable topology is via E002

- change the node which acts as the RR to E001 so that E001 learns all paths and decides hopefully using lowest cost IGP metric or perhaps LP (it's dangerous to let BGP decide on its own if you have your own policy in mind)

Is this why you don't want to change BGP? So that you don't influence other routers egress?

serj@iptp.net Thu, 11/24/2011 - 15:58

Matthew,

You summarized this well!

Changing the router acting as RR can help with eBGP routes learned by the RR, but at the same time the suboptimal routing issue will appear at another node.

If the real network is the same as in the diagram, I'd prefer a full mesh with next-hop-self and loopback source.

kishore.chennupati Thu, 11/24/2011 - 16:18

I echo Matt. What time did you go to bed last night? Are you at Chatswood?

BTW, Konstantin, you opened another thread for the same issue in WAN Routing and Switching. Not sure what you mean by that.

Konstantin Dunaev Fri, 11/25/2011 - 01:09

I've opened a discussion about a second level of route reflectors, just to have more opinions.

Another "cross-post" to this discussion should be ignored.

Konstantin Dunaev Fri, 11/25/2011 - 00:57

Hi Matt,

Thank you, I really appreciate your answer and the input from all the others.

"Yes, the behaviour changes when you introduce MPLS, but actually this is BGP's fault, and it only works without MPLS because IP routing is decided hop by hop. Really, the issue here is a failure in the BGP design; MPLS forwarding is just the victim protocol."

I see your point and I 100% agree with it: MPLS is forwarding the traffic exactly the way BGP wants it, in my case unfortunately a little bit wrong.

"How about this, prove us wrong. Post the "show ip bgp summary" and "show ip bgp " from each router and explain how the suboptimal routing is being caused by anything other than the BGP best path selection on the RR."

"sh ip bgp summ" on all routers shows two iBGP sessions with both RRs and a session with the upstream router.

On RC-E001 there are only two iBGP sessions, with both RRs.

"show ip bgp 8.8.8.8" on all routers shows that the best path is via the upstream router.

On RC-E001 it shows 2 paths, via RC-RR1 and RC-RR2, and as the weight parameter is higher for the BGP session with RC-RR1, that one is chosen as the best path. But because of the IGP metrics, RC-RR1 is reachable via RC-E002 and not directly via the backup connection.
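
To illustrate the IGP part, a minimal shortest-path sketch in Python. The link costs are purely hypothetical (I only know that the RC-E001 <-> RC-RR1 link has a worse metric), and the function is an ordinary Dijkstra search rather than anything OSPF-specific.

```python
import heapq

# Hypothetical OSPF costs; only the relative ordering matters here.
LINKS = {
    ("RC-E001", "RC-E002"): 10,
    ("RC-E002", "RC-RR2"): 10,
    ("RC-RR2", "RC-RR1"): 10,
    ("RC-E001", "RC-RR1"): 1000,  # backup link with a deliberately bad metric
}


def shortest_path(links, src, dst):
    """Return (total_cost, [routers]) for the cheapest path, or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None


print(shortest_path(LINKS, "RC-E001", "RC-RR1"))
```

With these numbers the path to RC-RR1's loopback runs through RC-E002 and RC-RR2, which is the LSP the traffic follows.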


You haven't mentioned it but if you want the RR to be either of the current ones while allowing E001 to reach the Internet via E002 while maintaining that RR1 and RR2 still use their directly connected egress, you'll need to do something a little more complicated, either;

- change the node which acts as the RR to E002 so that the only path E001 learns in a stable topology is via E002

- change the node which acts as the RR to E001 so that E001 learns all paths and decides hopefully using lowest cost IGP metric or perhaps LP (it's dangerous to let BGP decide on its own if you have your own policy in mind)

Is this why you don't want to change BGP? So that you don't influence other routers egress?

Just to clarify our topology:

Our topology looks pretty simple and straightforward: RC-RR1, RC-RR2 and RC-E002 (and 5 or 6 more routers) are our backbone routers with upstream eBGP, and at the same time they are "P" routers for our MPLS network.

As a full mesh is not really possible in our case (too many BGP sessions), RC-RR1 and RC-RR2 were chosen as route reflectors for Internet routing (MPLS route reflectors are outside the scope of this discussion) because of their location and performance. All routers have a direct "physical" connection to both RRs. All routers should primarily use their own upstream link for external communication.

Some time ago we added RC-E001 to our network, but RC-E001 doesn't have an upstream and is more or less a stub router, though it still needs the full BGP table. It has a direct physical connection only to RC-E002 (primary link) and to RC-RR1 (secondary link, because of price and bandwidth).

I can't simply move one RR to RC-E002; it would mean reconfiguring 10 routers.

I don't think it's a good idea to put a third RR in the network: it would unnecessarily increase the amount of routing information on all routers.

I can't configure RC-E001 as a route reflector, because it's more like a "stub" router.

Thank you all again for contributing!

Konstantin Dunaev Mon, 11/28/2011 - 02:39

Hello everybody!

Thank you again for participating in the discussion.

After considering all possibilities, I find that introducing a new level of route reflectors is not the best idea: it would unnecessarily complicate the configuration.

I find the idea of altering the next hop not so bad; the question is, what would be the best way to alter it? Inbound on RC-E001, setting the next hop to RC-E002, with RC-RR1 and RC-RR2 next in the list?

Or is it better to change the outgoing BGP updates on RC-RR1 and RC-RR2?

Thank you for comments!

maayre Tue, 11/29/2011 - 00:07

The problem is that if E002 goes down and the next hop doesn't change, you will black-hole traffic.

The only safe way to do it is on the advertising RR, but you would need some kind of advertise/exist map to conditionally change the next hop.

Since you don't want to drastically alter the current design, why not add an iBGP session between E001 and E002? Very simple, and no stress about RR hierarchy!
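
A quick Python sketch of why the two approaches fail differently, assuming a simplified model where a path is usable only if its next hop is still reachable in the IGP and the highest weight wins among usable paths. The weights and the function name are illustrative, not taken from a real config.

```python
def select_exit(paths, reachable):
    """paths: list of (next_hop, weight) tuples.
    reachable: set of next hops currently present in the IGP.
    Returns the chosen next hop, or None if no path is usable."""
    usable = [p for p in paths if p[0] in reachable]
    if not usable:
        return None
    return max(usable, key=lambda p: p[1])[0]


ALL_UP = {"RC-E002", "RC-RR1", "RC-RR2"}
E002_DOWN = {"RC-RR1", "RC-RR2"}

# Option 1: a route-map on E001 rewrites the next hop of every path to E002.
rewritten = [("RC-E002", 150), ("RC-E002", 100)]
print(select_exit(rewritten, ALL_UP))
print(select_exit(rewritten, E002_DOWN))  # nothing usable is left

# Option 2: extra iBGP session to E002, the RR paths are left untouched.
extra_session = [("RC-E002", 200), ("RC-RR1", 150), ("RC-RR2", 100)]
print(select_exit(extra_session, ALL_UP))
print(select_exit(extra_session, E002_DOWN))  # falls back to an RR path
```

With the extra session the fallback comes for free, because the RR-learned paths keep their original next hops.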

Konstantin Dunaev Tue, 11/29/2011 - 00:16

Hi Matt,

You're right about the route-map next-hop setting; I would then need some kind of tracking.

I also came to the idea of iBGP between RC-E001 and RC-E002, but couldn't find any pros and cons of an iBGP session between route-reflector clients; it seems that not many people have tried this.

maayre Tue, 11/29/2011 - 02:45

Due to iBGP split horizon it won't cause loops etc.

When you contrast a full mesh with an RR design, it is really about reducing the number of sessions. You can add more sessions; it shouldn't have a negative impact. The trap would be using route-reflector-client in too many places, but we're not adding it anywhere, so we're safe.

Konstantin Dunaev Mon, 12/05/2011 - 01:39

Hi Folks!

Thank you all again for the comments and input!

I've now set up an additional iBGP session between the 2 RR clients, and it seems to be working fine. At least the traffic doesn't cross the whole backbone before it goes out.

Learned something new again.

Best regards,

Konstantin


This Discussion

Posted November 23, 2011 at 5:47 AM