Load share traffic across two different BGP carriers

CSCO10203269 · 03-01-2014

Diagram

Our WAN infrastructure consists of two MPLS carriers, both of which are running BGP. For redundancy, all of our locations have dual MPLS connections to the WAN. All corporate data and Internet access is housed in our data center. A number of our applications are sensitive to asymmetric routing, so one of the carriers has been set up as primary. This is accomplished via AS-path prepending on our advertisements and setting a local preference at the data center.

We now want to split traffic across both carriers (corporate traffic on Carrier-1 and Internet traffic on Carrier-2), but we still want to avoid asymmetric routing. To accomplish this we can advertise the default route (0.0.0.0/0) out of Carrier-1 with a longer AS path, so that it is less preferred there. Remote site traffic will still prefer Carrier-1 for 10.1.x.x traffic but will select Carrier-2 for Internet traffic. In the event of a carrier outage, traffic will still dynamically route across the other carrier.
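A rough sketch of what that advertisement could look like on the data center router facing Carrier-1 is shown here. The AS numbers (65001 for us, 64512 for the carrier), the neighbor address 192.0.2.1 and the route-map and prefix-list names are all made up for illustration, and it assumes the default route is already in the BGP table on that router.

```
! Hypothetical values: local AS 65001, Carrier-1 AS 64512, neighbor 192.0.2.1
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
! Prepend only the default route; everything else is advertised unchanged
route-map CARRIER1-OUT permit 10
 match ip address prefix-list DEFAULT-ONLY
 set as-path prepend 65001 65001 65001
!
route-map CARRIER1-OUT permit 20
!
router bgp 65001
 neighbor 192.0.2.1 remote-as 64512
 neighbor 192.0.2.1 route-map CARRIER1-OUT out
```

This only influences the remote-site-to-data-center direction, which is why the return path still needs a separate answer.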

The question is, how do we ensure that return Internet traffic from the data center to the remote site will select Carrier-2? The only thing I can come up with is to NAT Internet-bound remote site traffic at the Carrier-2 router. I was also thinking of using BGP community strings applied to QoS tagging, but wasn't sure the community parameters are meant for that use. Any other ideas?

Jon Marshall · 03-01-2014

Basically you either need to -

1) use NAT as you say for Internet traffic. However, depending on where you do the NAT, you may also need additional config, i.e. -

a) you NAT at the remote site, but then you still need to be able to distinguish the return traffic, so you would need to use an IP or pool at each remote site and then have specific routes in the DC pointing to the Carrier-2 router

or

b) NAT overload on the Carrier-2 router to the inside interface of that router, so traffic is automatically sent back to the right router.

or

c) use a NAT pool on Carrier-2 for all the remote sites

The issue with b), and this can also affect c), is that IOS NAT does not do NAT overload from outside to inside, but it does from inside to outside. So if there is no other NAT being done on that router, you could make the inside interface the one connecting to the MPLS network and the outside interface the one connecting to the internal DC.
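As a minimal sketch of option b) with the inside/outside roles swapped like that, the Carrier-2 router config might look something like the following. The interface names, the ACL name and the 10.0.0.0/8 remote-site range are assumptions for illustration only; 10.1.0.0/16 is taken from the corporate range mentioned in the question.

```
! Hypothetical interface names and address ranges
interface GigabitEthernet0/0
 description Carrier-2 MPLS side (treated as NAT inside)
 ip nat inside
!
interface GigabitEthernet0/1
 description Internal DC side (treated as NAT outside)
 ip nat outside
!
ip access-list extended REMOTE-INTERNET
 ! Do not translate traffic headed for corporate DC subnets
 deny   ip any 10.1.0.0 0.0.255.255
 ! Translate everything else sourced from the remote sites
 permit ip 10.0.0.0 0.255.255.255 any
!
! Overload onto the DC-facing interface address
ip nat inside source list REMOTE-INTERNET interface GigabitEthernet0/1 overload
```

Because the translated source is the router's own DC-facing address, return traffic comes back to this router without any extra routes in the DC.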

The advantage of b) and c), if you can do them, is that you only need configuration on the Carrier-2 router, with no need to configure other routers or add routes within the DC.

The advantage of a) is that you are not going to run out of NAT translations, and you spread the load between all routers.

The other possibility is to use PBR, although it's not clear what network devices you have within the DC, e.g. L3 switches etc.

With PBR you could, in your ACL, deny any return traffic from internal DC subnets so it is routed normally, and then permit everything else, i.e. Internet traffic, and send this to the Carrier-2 router.
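A minimal sketch of that ACL and route-map on a DC L3 switch could look like this. The names, the VLAN number and the next-hop 10.1.255.2 (standing in for the Carrier-2 router) are hypothetical, and 10.1.0.0/16 is assumed to cover the internal DC subnets.

```
! Hypothetical names and addresses
ip access-list extended PBR-INTERNET-RETURN
 ! Traffic sourced from internal DC subnets is not policy routed
 deny   ip 10.1.0.0 0.0.255.255 any
 ! Everything else is treated as Internet return traffic
 permit ip any any
!
route-map INTERNET-VIA-CARRIER2 permit 10
 match ip address PBR-INTERNET-RETURN
 set ip next-hop 10.1.255.2
!
interface Vlan100
 description Hypothetical SVI where the return traffic arrives
 ip policy route-map INTERNET-VIA-CARRIER2
```

In practice the permit line would probably need narrowing to remote-site destinations, otherwise Internet traffic returning to hosts inside the DC would be redirected as well.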

The advantage of this is no NAT which can break certain things. Again it also means just configuration within the DC and nothing to do in the remote sites.

The disadvantages are that PBR requires a certain feature set on certain switches, and using deny statements in your ACL can have adverse effects on the CPU. There are ways round it, but you are in effect doing PBR on all traffic **

** If your Internet connection is via a dedicated VLAN, e.g. your firewalls are connected to your L3 switches on their own VLAN, then your ACL actually becomes very simple because you do not have to account for non-Internet traffic. You would simply apply the PBR to the core switch SVIs that connect to your firewalls, so the only traffic coming in from the firewalls would be Internet traffic. You then match all traffic in your ACL and set the next hop to be the Carrier-2 router.
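For the dedicated firewall VLAN case, a sketch of the simplified version might be as follows. Vlan900, the names and the next-hop 10.1.255.2 for the Carrier-2 router are invented for the example.

```
! Hypothetical names, VLAN and next-hop address
ip access-list extended FROM-FIREWALL
 ! Only Internet traffic arrives on this SVI, so match everything
 permit ip any any
!
route-map FW-TO-CARRIER2 permit 10
 match ip address FROM-FIREWALL
 set ip next-hop 10.1.255.2
!
interface Vlan900
 description Firewall transit VLAN
 ip policy route-map FW-TO-CARRIER2
```

Note that a fixed next hop like this does not follow BGP on its own, so what happens during a Carrier-2 outage depends on how the Carrier-2 router forwards the traffic once it receives it.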

Obviously I have made certain assumptions about your DC topology, so if it is different or I haven't explained things very well then please come back with any further queries.

Jon

CSCO10203269 · 03-01-2014

If we were to NAT, it would definitely be a pool, so we'd have a way of determining the real source address if we had to. My thought was to select an unused remote site subnet as the NAT pool and then advertise that /24 subnet out of Carrier-2; we're currently advertising a /16 subnet to both carriers. PBR came to mind, but I share the same concerns regarding CPU utilization on the L3 switches.
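Roughly what I have in mind on the Carrier-2 router is sketched here. 10.1.250.0/24 is only a stand-in for whichever unused subnet gets picked, and the interface names, pool name, ACL name and 10.0.0.0/8 remote-site range are likewise placeholders. It assumes the MPLS-facing interface is the NAT inside, as suggested above.

```
! Hypothetical interfaces, names and addresses
interface GigabitEthernet0/0
 description Carrier-2 MPLS side
 ip nat inside
!
interface GigabitEthernet0/1
 description Internal DC side
 ip nat outside
!
ip access-list extended REMOTE-INTERNET
 deny   ip any 10.1.0.0 0.0.255.255
 permit ip 10.0.0.0 0.255.255.255 any
!
ip nat pool REMOTE-INTERNET-POOL 10.1.250.1 10.1.250.254 prefix-length 24
! Dropping "overload" would give one-to-one translations until the pool runs out
ip nat inside source list REMOTE-INTERNET pool REMOTE-INTERNET-POOL overload
!
! Anchor route so the /24 exists in the routing table and can be advertised
ip route 10.1.250.0 255.255.255.0 Null0
```

Whatever routes return traffic inside the DC would also need to see that /24 pointing at the Carrier-2 router.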

Jon, I appreciate the input.