We are starting the conversion of a rather large network from ATM/Frame Relay to MPLS. We will manage the CE routers and speak BGP to the PE routers. Our current network runs EIGRP. We will have quite a few backdoor links in the network. Some will be backup only and will not carry normal traffic; others, such as the backdoor links between our data centers, will be the primary path between the sites.
My question is what is the best way to handle the backdoor links. We are looking at:
1) Running BGP on the backdoor links as well, with iBGP between the backdoor routers and the CE router.
2) Running EIGRP on the backdoor links, but under a separate EIGRP AS number, and redistributing into the primary EIGRP AS.
Both have their pros and cons. I was wondering which way other organizations have gone, and why.
In my opinion, I would go BGP all the way, as BGP makes it easier to influence route selection and traffic paths. I am not sure how running EIGRP under a different AS helps the situation, and I try to stay away from redistribution whenever I can.
EIGRP would need to run under a different AS than the primary so that the routes could be redistributed into the primary EIGRP AS and enter as external routes with the same AD (170) as the routes coming from BGP. If this is not done, then for routes of the same prefix length the backdoor routes will always be preferred with the internal AD of 90. If both sets of routes are redistributed in, you can use metrics to make one or the other preferred. Using metrics for route selection is not as clean as what can be done if the backdoor is also BGP, but the iBGP path between the MPLS router and the backdoor router gives another path for potential routing loops that has to be dealt with.
I would go with EIGRP, simply because I have had problems with running two different routing protocols and tweaking prefixes to prefer one protocol over the other. Also, if you are doing MPLS VPN with your provider, running EIGRP would make the entire EIGRP process transparent to him, i.e. he will carry the metrics etc. across, so it will be a single EIGRP domain for you and hence easier to control.
If EIGRP is run over the backdoors, it will have to run in a different AS than the primary EIGRP AS. The routes will be redistributed into the primary AS so that they enter with the same AD as the routes that get redistributed from BGP. This makes the process a bit more complicated. If BGP is run on the backdoor, it will still be transparent to the SP network. The advantage of BGP on the backdoor routers is that it is cleaner to select whether the backdoor or the MPLS routes will be preferred.
I have seen case studies with it done both ways but they did not get into enough detail. I am really looking for someone that has done a large rollout - this one is 400 sites in the first two phases - and what their experience has been.
My 2 cents on the subject.
I haven't been involved with a customer in your situation, so these are some thoughts on the subject that are not backed up by experience.
First, you need mutual redistribution BGP<->EIGRP on all CE routers.
Second, as EIGRP will always prefer internal routes over external ones, you need another protocol (or another EIGRP AS) on those backdoor links that should be true backup links.
That said, I would first identify the links that really should be backup to the MPLS network. All other (preferred) links should run EIGRP in the main AS to reduce complexity.
So let's first look at the "MPLS is backup" scenario. The CE will learn the same networks through EIGRP and eBGP. The latter, with AD=20, is preferred, which is undesired in this case. Setting the eBGP AD to 150 could fix this. Additionally, you need to tag the EIGRP networks learned from BGP with a site-specific tag, which allows you to exclude them from redistribution back into BGP once they are announced through EIGRP to another CE.
Generally, a tag should indicate that this network has already passed through the MPLS VPN and thus must not be redistributed again.
Now let's have a look at the "MPLS is primary" scenario. As you already stated, you need another routing protocol or EIGRP AS in this case. On the CE this would still work, because external EIGRP with AD=170 is worse than the (modified) AD=150 of BGP.
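To make the AD reasoning in the two scenarios concrete, here is a minimal Python sketch. It only models "lowest administrative distance wins" for a single prefix on the CE; the default values are the usual Cisco ones, while the names `DEFAULT_AD`, `best_source` and `tuned_ad` are made up for this illustration and the AD of 150 is simply the value suggested above.

```python
# Administrative distances (usual Cisco defaults); lower is preferred.
DEFAULT_AD = {
    "ebgp": 20,
    "eigrp_internal": 90,
    "eigrp_external": 170,
    "ibgp": 200,
}

def best_source(candidates, ad_table):
    """Return the route source with the lowest administrative distance."""
    return min(candidates, key=lambda source: ad_table[source])

# "MPLS is backup": same prefix via eBGP (MPLS) and internal EIGRP (preferred link).
backup_case = ["ebgp", "eigrp_internal"]
# "MPLS is primary": the backdoor uses a second EIGRP AS, so its routes are external.
primary_case = ["ebgp", "eigrp_external"]

# Assumption from the post above: eBGP distance raised from 20 to 150.
tuned_ad = dict(DEFAULT_AD, ebgp=150)

print(best_source(backup_case, DEFAULT_AD))  # ebgp (MPLS wins, which is undesired)
print(best_source(backup_case, tuned_ad))    # eigrp_internal (preferred link wins)
print(best_source(primary_case, tuned_ad))   # ebgp (MPLS stays primary)
```

The sketch ignores prefix length and protocol metrics, so it only shows why a single AD change can serve both scenarios at once.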
What remains is, again, to set proper filters to avoid routing loops, most likely with tags and route-maps for scalability.
With all this mutual redistribution, it is clear that any mistake in the configuration or design of the filters will result in a routing loop.
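The tagging rule described above can be sketched in a few lines of Python. This is only a model of the filter logic, not router configuration; the tag convention (`MPLS_TAG_BASE` plus a site id), the `Route` type and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MPLS_TAG_BASE = 1000  # hypothetical convention: tag = base + site id

@dataclass(frozen=True)
class Route:
    prefix: str
    tag: Optional[int] = None

def crossed_mpls(route):
    """True if the route carries a tag saying it already passed through the MPLS VPN."""
    return route.tag is not None and route.tag >= MPLS_TAG_BASE

def bgp_to_eigrp(route, site_id):
    """Redistribute a BGP-learned route into EIGRP, stamping the site-specific tag."""
    return Route(route.prefix, MPLS_TAG_BASE + site_id)

def eigrp_to_bgp(routes):
    """Redistribute EIGRP into BGP, skipping routes that already crossed the MPLS VPN."""
    return [r for r in routes if not crossed_mpls(r)]

# Site 1 learns 10.2.0.0/16 from the MPLS network and tags it on the way into EIGRP.
learned_at_site1 = bgp_to_eigrp(Route("10.2.0.0/16"), site_id=1)

# The tagged route reaches another CE over EIGRP, next to that site's local route.
other_ce_eigrp_table = [Route("10.3.0.0/16"), learned_at_site1]

exported = eigrp_to_bgp(other_ce_eigrp_table)
print([r.prefix for r in exported])  # ['10.3.0.0/16']
```

Only the locally originated prefix is sent back to the carrier, so the route learned from MPLS cannot loop back into BGP through the backdoor.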
The other option would be BGP everywhere. Be aware, however, that this will most likely not remove the redistribution and filter complexity.
What I do not quite understand is what the physical design looks like, i.e. where you have BGP routers and where EIGRP (main AS). Unless you want to black hole yourself, you need to redistribute back into EIGRP in any case, or run an iBGP full mesh on most of your internal routers.
So in the end you have a lot of complexity in both solutions. Both of them can be implemented. From an operational point of view, my tendency would be towards EIGRP instead of BGP, but only because your staff probably knows EIGRP well enough to operate the whole thing without too much pain.
Looking from a distance:
1) Have you pushed the SP hard enough (=$$ ?) to allow EIGRP on the PE-CE link? This would simplify the whole situation.
2) Have you thought of pushing the SP into OSPF on PE-CE and converting everything to OSPF internally? This would also simplify things. OSPF is better prepared to handle routing loops in MPLS VPNs, and sham links allow for backdoor links when required.
Hope this helps! Please rate all posts.
Martin, thanks for a really good post. I know it is difficult to put the issues on paper when you can't see the physical design. To give an idea, one key part of the network is our two data centers. They are about 200 miles apart with a fat pipe between them. The fat pipe will be a backdoor route, but it needs to stay the primary path for traffic between the data centers. It will also be a backup path in case an MPLS POP goes down in either city. So the routes from each area (we have many sites) need to go over the backdoor and be advertised to the MPLS network, but not as the primary path.
Using BGP on the backdoor with an iBGP full mesh really helps here, as it would pull in the routes and advertise them out with a longer AS path (we will use separate AS numbers at each site). If we use EIGRP on the backdoor, we will have to filter the routes from the other site and use MEDs of some type toward the MPLS carrier so that they will not be preferred. A full-mesh iBGP will also make choosing which path to take easier - using metrics in EIGRP for this always gets messy. But there are pitfalls. We have a number of static routes on the inside redistributed into EIGRP. Because of this, we will have to tag in both directions if we do an iBGP full mesh on the PE and backdoor routers. As you may guess, I am leaning in this direction, but I am not convinced yet.
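The "longer path" effect with one AS number per site can be illustrated with a small Python sketch. It only compares AS-path length, which is one step in BGP best-path selection (weight and local preference, for example, are evaluated earlier and are ignored here). The AS numbers and the `preferred` helper are hypothetical.

```python
SITE_A_AS, SITE_B_AS = 65001, 65002  # hypothetical private AS numbers, one per site

def preferred(advertisements):
    """Pick the advertisement with the shortest AS path (first one wins a tie)."""
    return min(advertisements, key=lambda adv: len(adv["as_path"]))

# A prefix that lives in data center B, as offered to the MPLS carrier.
ads_for_dc_b_prefix = [
    {"from": "CE at site B", "as_path": [SITE_B_AS]},
    # Learned by site A over the backdoor, so site A's AS is prepended.
    {"from": "CE at site A (via backdoor)", "as_path": [SITE_A_AS, SITE_B_AS]},
]

best = preferred(ads_for_dc_b_prefix)
print(best["from"])  # CE at site B

# If site B loses its MPLS connection, only the backdoor-learned path remains.
fallback = preferred(ads_for_dc_b_prefix[1:])
print(fallback["from"])  # CE at site A (via backdoor)
```

The route through the other data center is still advertised to the carrier, but it is only chosen when the direct advertisement is gone, which matches the "advertised, but not as the primary path" requirement.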
You are right on the experience level of using EIGRP vs BGP. A full-mesh iBGP will be more complex than just using BGP to the carrier. But we do have a large multizone DMZ today running BGP, so we are not completely green. We know that more training is going to be needed.
On your questions:
1) We have not pushed running EIGRP to the PE. When we bid this, none of the carriers would support it and most advised us that it was a bad idea. EIGRP is very slowly dying and I don't think that the carriers want to take it on. It would also bring into play the experience level on the carrier's side if we forced the issue.
2) If I had my way we would have gone to OSPF long ago. But it is a large network - over 2k WAN routers in more than 50 countries, and that does not include switches that route. It has not been something easy to sell to management (if it is not broke, why spend money to fix it). To change to OSPF as part of this project would be tough, as it would slow things down and that would be a hard sell.
Any pitfalls or complexities that you can see in running BGP on the CE and backdoor routers would be appreciated.
Wesley, I see your restrictions in this case more clearly now. Well, running BGP between CE and backdoor routers requires a reasonable BGP design. The pitfalls are avoiding "black holes" and getting redistribution with tags right. Convergence time might also become an issue - BGP was never meant to be an "IGP replacement". And of course: do your IOS versions/feature sets support BGP? ... memory upgrades? ... you seem to have a lot of fun these days ;-).
I assume you are aware of the "BGP black hole design" of having two iBGP speakers and an intermediate router only running EIGRP. That said, you need to look carefully at every single site's iBGP/eBGP design.
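For readers who have not run into the black hole case, here is a toy hop-by-hop forwarding model in Python. The three router names (`ce`, `core`, `bd`), the prefix and the `forward` helper are invented for this sketch; it assumes `ce` and `bd` are iBGP speakers and `core` runs EIGRP only.

```python
PREFIX = "10.2.0.0/16"  # hypothetical remote prefix

def forward(tables, start, prefix, max_hops=8):
    """Follow per-router next hops; return (path, delivered)."""
    path = [start]
    node = start
    for _ in range(max_hops):
        next_hop = tables[node].get(prefix)
        if next_hop is None:
            return path, False  # this router has no route: packet dropped
        if next_hop == "deliver":
            return path, True   # this router can send the packet onward
        path.append(next_hop)
        node = next_hop
    return path, False          # hop limit reached (forwarding loop)

tables = {
    "ce": {PREFIX: "core"},     # iBGP route with next hop bd, reached through core
    "core": {},                 # EIGRP only, never heard of the prefix
    "bd": {PREFIX: "deliver"},  # backdoor router holds the real exit
}
print(forward(tables, "ce", PREFIX))  # (['ce', 'core'], False)

# Redistributing the BGP route into EIGRP gives core a route as well.
tables["core"][PREFIX] = "bd"
print(forward(tables, "ce", PREFIX))  # (['ce', 'core', 'bd'], True)
```

Both iBGP speakers have a valid route in the first case, yet traffic dies at the router in between, which is why each site design needs either redistribution back into EIGRP or BGP on every router in the transit path.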
I do not know whether you can define some sort of "standard". I highly recommend it, though, from an operational perspective. So, bottom line: use customer<->ISP design rules. Inject a default route into locations if possible and use BGP network statements to avoid redistribution complexity where possible. That said: better have a nice IP addressing design. I know, nothing that could be changed by now. Which routes should stay in BGP and which routes need to be redistributed back into EIGRP?
I would try to use BGP as the inter-site protocol and use EIGRP only locally. Each site should then have a tag to sort out redistribution, if required at all. The main problem is locally created external routes that should be redistributed ("redistribute static" in EIGRP).
When converting from EIGRP to BGP, the AD of some routes might change from 90 to 170. Would this hurt? Also be careful to have a clean and tested migration path. You will need to operate an "intermediate" state for quite some time, so be sure you can live with it.
Using eBGP as the inter-site protocol will change your convergence time. You surely want to look at all the BGP timers (advertisement, import, scan, hello) and lower them without becoming unstable. The underlying question is: what convergence time is required by the applications? It doesn't make much sense, for example, if after IP routing convergence 2000 people need to log in to a server again because their sessions timed out. What does "backup" mean in such a case?
As you are now building your own private BGP-based internet, be aware of BGP side effects such as "BGP Wedgies" (http://www.ietf.org/rfc/rfc4264.txt) and "Persistent route oscillations in BGP" (http://www.ietf.org/rfc/rfc3345.txt). The RFCs describe the problems quite well. There are no solutions to some of them; they exist because of the way BGP works.
Make sure that you are not designing these specific cases into your BGP setup.
Convergence time is a real issue for sure. One thing that will help is that the backdoors are almost exclusively optical or serial links. When they go down, most of the time they go down hard and the router sees it. That way you don't have to wait for BGP timeouts. I am more concerned about convergence over the MPLS network. Feature sets will not be a problem, but we need to take a close look at memory.
I think I know what to look for on black holes - that little lab test we took gave us pretty good training in that direction. We take the standards very seriously and we work in global groups, so there is only one set of designs/rules that everyone has to follow. That is what we are working on now and hope to get right from the beginning. Our IP addressing I would describe as decent - I know it could be better, but there are no big problems.
BGP as the inter-site protocol and then EIGRP locally is where I am leaning. Routes that are AD 170 today in EIGRP will cause us some grief. We have only made a rough start on the migration plan - but all the players know this is critical. The problem is that I am sure some of the issues with migration, and with making the old and the new network play nicely together, will not show up until we are into the migration.
On the BGP timers I have at least a starting point, as I have worked with our ISPs to lower the timers there. All the ISPs were pretty much in agreement on how low you can take them and stay stable. I guess my question is whether they can be taken lower on an MPLS network. That is one question we will ask the carrier when we get into detailed design meetings. Any experience with this?
Thanks for your help
Well, lowering timers means increasing CPU load. For your environment, with let's say only 5000 routes, this can be pushed down to a few seconds (import scan time 5 seconds, eBGP advertisement interval 5 seconds, hello timer 2-3 seconds). However, the SP will have to decide what is possible on the PE, as he has to protect and plan resources there.
What he really can do: lower the eBGP advertisement interval to 5 seconds (from 30), which doesn't hurt too much and helps a lot.
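As a rough feel for what these timer values buy, here is a back-of-the-envelope Python sketch. It uses a deliberately crude additive model (failure detection, then one advertisement interval, then one import scan) and assumes the hold time is three times the hello/keepalive, which is the common convention. The "default" figures of 60, 30 and 15 seconds are assumed typical IOS values, and `worst_case_seconds` is a made-up helper, so treat the output as an order of magnitude and not a prediction.

```python
def worst_case_seconds(keepalive, advertisement_interval, import_scan,
                       link_down_detected=False):
    """Crude upper bound: detection + one advertisement interval + one import scan."""
    # Assumption: hold time = 3 x keepalive; skipped when the interface goes down hard.
    hold_time = 0 if link_down_detected else 3 * keepalive
    return hold_time + advertisement_interval + import_scan

# Assumed default-style timers: keepalive 60 s, advertisement 30 s, import scan 15 s.
print(worst_case_seconds(60, 30, 15))  # 225

# Values suggested in the post: hello 3 s, advertisement 5 s, import scan 5 s.
print(worst_case_seconds(3, 5, 5))     # 19

# Same timers, but the router sees the link go down immediately.
print(worst_case_seconds(3, 5, 5, link_down_detected=True))  # 10
```

Real convergence also depends on route counts, CPU load and what the PE does, so a testbed measurement is still needed.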
All other timers might depend on other things he won't give insight into. Now, you are a large enough customer to be treated nicely ;-) So see what you get - "more bang for the buck" - is that what you would call it in the US?
Again, have a testbed with the smallest box you intend to run BGP on and then let the routes flap to check the maximum CPU load.
My 2 cents.