We have an MPLS cloud connecting 9 different locations across the United States.
We have implemented a failover solution using VPN connections over the Internet.
At each remote site there is one router configured for the MPLS and another router configured for the VPN. They are both configured with HSRP with a standby address. The clients at each site use the standby address as their default gateway.
At the main site, there are two 4500 series switches that have virtual tunnels configured for the Internet connections. These switches are also running RIP.
The MPLS cloud is running OSPF to all the remote sites, with the RIP routes redistributed into OSPF.
The main-site MPLS router is uplinked to the 4500s.
The problem is that whenever the MPLS link at one of the sites fails, the failover is not automatic; the failover to the VPN router has to be initiated manually. The tunnel interfaces on the switches and the interfaces on the VPN routers are kept shut down.
I have not seen the problems with this setup first-hand, but I am told that false routes are advertised on the MPLS network from the MPLS nodes when there is a failover.
There are access-lists everywhere to try and prevent this.
I am wondering if this setup is way overcomplicated.
I was wondering why we could not keep both links up all the time and just use floating static routes to and from the remote sites, with the administrative distance on the VPN path set higher than on the MPLS path.
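A minimal sketch of that floating static idea, as it might look on the main-site side. All values here are hypothetical: a remote LAN of 10.20.0.0/16, an MPLS next hop of 192.0.2.1, and a VPN tunnel interface named Tunnel1. The remote site would need mirror-image routes back toward the main site.

```
! Hypothetical values: remote LAN 10.20.0.0/16, MPLS next hop 192.0.2.1, VPN interface Tunnel1
! Primary route via the MPLS next hop (static routes default to AD 1)
ip route 10.20.0.0 255.255.0.0 192.0.2.1
! Floating backup via the VPN tunnel; AD 250 keeps it out of the routing table while the primary is installed
ip route 10.20.0.0 255.255.0.0 Tunnel1 250
```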
I also suspect the problem is that there are two routers at each site, and that this is why there are false advertisements to the MPLS cloud.
Another problem, of course, is getting the switches notified that the link is down, because the actual MPLS router is one Ethernet hop away.
Does anyone have any information on the best way to set up something like this?
Thanks for any information
If I had to guess, you have a routing loop somewhere: you are redistributing RIP into OSPF, and OSPF is preferring the redistributed route when it should prefer its own.
You will have issues with static routes, because whatever the primary static route points to must actually go down for the floating route to take over. If you point it at the connection between you and your MPLS provider, you have the same issue as with HSRP: that link can stay up even when the MPLS path is broken.
The HSRP issue is solved in newer IOS releases with object tracking.
You can track many things to drive the HSRP election, but normally you just ping something to see if it's up.
You can also do this with routing, but unlike with HSRP, I have not actually done this myself. This is an interesting link on how to do it.
Your current method should also work; it is just complex because it runs multiple routing protocols.
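A minimal sketch of doing it with routing instead, assuming an IOS release that supports IP SLA object tracking (older releases use different syntax, such as ip sla monitor and track ... rtr). The primary static route is tied to a tracked ping, so it is withdrawn when end-to-end reachability across the MPLS cloud is lost, even though the Ethernet link to the MPLS router stays up. All addresses and interface names are hypothetical.

```
! Hypothetical values: probe target 10.20.1.2 (remote MPLS router's LAN address), MPLS next hop 192.0.2.1,
! MPLS-facing interface GigabitEthernet0/1, VPN interface Tunnel1
! Pin the probe target to the MPLS path so the probe never succeeds over the VPN
ip route 10.20.1.2 255.255.255.255 192.0.2.1
! Ping the far end across the MPLS cloud every 5 seconds
ip sla 1
 icmp-echo 10.20.1.2 source-interface GigabitEthernet0/1
 frequency 5
ip sla schedule 1 life forever start-time now
! Tracked object 1 goes down when the ping stops succeeding
track 1 ip sla 1 reachability
! Primary route is installed only while track 1 is up
ip route 10.20.0.0 255.255.0.0 192.0.2.1 track 1
! Floating backup via the VPN tunnel
ip route 10.20.0.0 255.255.0.0 Tunnel1 250
```

Because the probe target is pinned to the MPLS path, traffic to that one address will not fail over; using a router address as the target keeps that side effect harmless.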
Thanks for the input, tdrais.
Can you tell me what you mean as far as:
Is the HSRP described here ok, or is it too complicated?
Also, are you saying that the MPLS router staying up is what is causing the problem?
Are you talking about a loop caused by the multiple protocols? It seems that even with the VPN, OSPF would still see the route via the VPN connection, just as RIP does?
It seems that with object tracking the floating static route would work correctly. Or are you saying that with object tracking the current setup should work as it is?
I will try and post the configs later
Your setup with HSRP sounds well thought through; maybe something is missing from your configuration. Can you post the configs of both routers at one of the MPLS sites?
Maybe I misunderstood your HSRP issue.
The usual problem with HSRP is that it is hard to get it to fail over unless the device running HSRP itself fails. Cisco added the track option so that, for example, I can watch a serial line and, if it fails, let the other router take over. This does not work well when the serial line stays up but no traffic passes over it. The newer option lets me track end-to-end reachability.
In your case I assumed you could still reach the MPLS provider and had only lost the routes. All your traffic would go to this router, since it is still active, and then be dropped because it has no route out through the MPLS network.
If you run OSPF over both connections, you should have no issues. I read it as you running RIP over the VPN and OSPF over the MPLS. As long as the metrics are set correctly, OSPF should always use the correct path. I got sidetracked by the RIP redistribution idea.
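A minimal sketch of HSRP with end-to-end tracking on the remote MPLS router, assuming an IOS release with IP SLA object tracking. The probe target, interface, and addresses are hypothetical. When the tracked ping fails, the MPLS router's priority drops below the VPN router's, so the VPN router can take over.

```
! Hypothetical values: probe target 10.0.0.1 (an HQ address reached over MPLS),
! LAN interface GigabitEthernet0/0, real address 10.20.1.2, virtual address 10.20.1.1
ip sla 1
 icmp-echo 10.0.0.1
 frequency 5
ip sla schedule 1 life forever start-time now
track 1 ip sla 1 reachability
!
interface GigabitEthernet0/0
 ip address 10.20.1.2 255.255.255.0
 standby 1 ip 10.20.1.1
 ! Above the VPN router's priority (assumed to be left at the default of 100)
 standby 1 priority 110
 standby 1 preempt
 ! Drop to 90 when end-to-end reachability is lost
 standby 1 track 1 decrement 20
```

The VPN router needs standby 1 preempt as well; without it, it will not take over when the MPLS router's priority drops to 90. The MPLS router's own preempt is what lets it reclaim the active role once the probe recovers.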
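If OSPF runs over both paths, one way to set the metrics is to raise the OSPF cost on the VPN tunnel so the MPLS path wins while it is available. The interface name and cost value below are hypothetical; the cost just needs to exceed the cost of the MPLS path.

```
! Hypothetical tunnel name and cost value
interface Tunnel1
 ! Make the VPN path much more expensive than the MPLS path
 ip ospf cost 1000
```

One caveat: routes learned through an MPLS VPN PE usually arrive as inter-area or external routes, and OSPF prefers an intra-area path over the tunnel regardless of cost. That is the situation sham links are meant to fix.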
I have not seen an attempted failover of this setup.
But they are having issues with the way it is set up now. At the moment, the failover is established manually when the MPLS connection to a site fails, and reversed when the MPLS link comes back up.
I heard someone say that they thought the failover initiated ok, but the failback did not work.
The switch at the main site is running only RIP,
the MPLS router at the main site is running RIP and OSPF,
the MPLS router at the remote site has only OSPF,
the VPN router at the remote site is running RIP and OSPF.
The switch connects to the MPLS router, which connects the main site to the MPLS cloud (one Ethernet hop away, so no interface on the switch ever actually goes down).
From the notes I have seen ("access-lists need to be added at each MPLS router node to prevent false routes from polluting the route tables"),
I think you were onto something with your first remark about route table conflicts.
Also, it seems to me that this has been badly overcomplicated by the overlapping routing protocols. While I know redistributing RIP into OSPF is a viable solution to some problems, I think it can create problems as well.
I also like your suggestion about the object tracking.
What are your thoughts on the HSRP? Has it been implemented correctly in this situation (if you can follow any of what I have written here)?
And what about taking RIP out of the picture and using static routes with tracking?
Also, what were you saying about the metrics? Are you talking about the redistribution metrics?
Is HSRP designed for a situation like this?
Your solution looks simple until you take a much deeper look at it. There are a number of gray areas in the implementation that you need to be aware of.
1. As mentioned previously, standard HSRP tracking watches an interface. Because the switch at HQ connects to the main router over Ethernet, that interface stays up and HSRP failover will not trigger. If your WAN interfaces at the branches are also Ethernet, I would expect the same problem. However, the path traffic takes is determined by your routing configuration. In this situation, HSRP should be used for router failover, not link failover; your routing configuration should provide the link failover.
2. Redistributing RIP into OSPF can lead to routing loops in the MPLS VPN domain. OSPF external routes in an MPLS VPN do not have the down bit set (the down bit is used for routing loop prevention), so external routes often require manual route tagging and filtering.
3. I would not rush to remove RIP and run OSPF all the way. For that to work properly, your provider would have to configure sham links. Confirm with your provider that they are willing and able to configure sham links for you.
Can you post your configs? It will help in identifying the actual problems.
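A common tag-and-filter pattern for two-way redistribution, sketched under the assumption of RIPv2 (which carries route tags), OSPF process 1, and arbitrary tag values 110 and 120. Each route is tagged as it enters a protocol and is blocked from being redistributed back into the protocol it came from.

```
! Hypothetical process number, seed metric, and tag values
router ospf 1
 redistribute rip subnets route-map RIP-TO-OSPF
router rip
 version 2
 redistribute ospf 1 metric 3 route-map OSPF-TO-RIP
!
! Block routes that originally came from OSPF; tag the rest as RIP-sourced
route-map RIP-TO-OSPF deny 10
 match tag 110
route-map RIP-TO-OSPF permit 20
 set tag 120
!
! Block routes that originally came from RIP; tag the rest as OSPF-sourced
route-map OSPF-TO-RIP deny 10
 match tag 120
route-map OSPF-TO-RIP permit 20
 set tag 110
```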
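For reference, a sham link is configured on the provider's PE routers, not on customer equipment. A sketch of what the provider might add, assuming a VRF named CUST, OSPF process 10, and hypothetical /32 loopback addresses 10.255.255.1 and 10.255.255.2 that belong to the VRF and are advertised via BGP rather than OSPF.

```
! Provider PE configuration; VRF name, process number, addresses, and cost are hypothetical
router ospf 10 vrf CUST
 ! Intra-area link across the MPLS backbone so it can compete with the backdoor VPN path
 area 0 sham-link 10.255.255.1 10.255.255.2 cost 10
```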
My follow up questions for olorunloba:
Are you saying that HSRP will not work the way it is expected to work here?
What would be a viable way to get this type of scenario to work?
How would I direct the traffic as you have stated with the routing configuration to failover as we want?
Here are the VPN and MPLS routers from one remote site, along with the local-site MPLS router and switch.
I have edited the configs somewhat; the switch shows only one tunnel for this site, but there are actually five different sites.
Note also that there is another switch, connected to the one shown by an EtherChannel, which explains why there are two tunnels in the VPN router config and only one on the switch.
If you are trying to use HSRP for link failover, it will not work, as it is not primarily designed for that. HSRP is designed for router failover, so that if your primary gateway is down, another router can take over the function without causing downtime. Allow me to explain further.
For your 2 routers at the branches, the MPLS router has the higher priority and is therefore the primary gateway. PCs on the LAN use this router as their primary gateway, because their default gateway is configured as the virtual standby IP address. When they try to communicate with HQ, their packets reach the MPLS router, which consults its routing table to decide how to forward them. If the routing table says the packet should go out through the VPN router, the packet is forwarded to the VPN router. In an attempt to use HSRP for link failover, some people deliberately do not run any routing protocol between the 2 HSRP routers, so that forwarding is always through the local link. The downside is that this method has no way of influencing the return path, and communication is often broken.
Also, in your config, you have not configured any form of tracking for the HSRP. This means the active gateway will only change when the router itself dies (for example, when it is powered off).
To effectively address link failover, it should be treated as a routing problem.
With respect to your network not switching back to the MPLS, let's look at the scenario. Let HQ be connected to PE1 and the remote site be connected to PE2. Also, assume that the MPLS link has failed and you are currently using the VPN connection. HQ learns about the remote site via RIP, across the tunnels. This is redistributed into OSPF and advertised into the MPLS core. At this point, PE1 and PE2 both believe the remote site is reached through HQ.
Now assume the MPLS link comes back up. PE2 gets an OSPF update from the remote site and installs it in its routing table, because the OSPF-learned route (AD 110) has a better administrative distance than the iBGP-learned route (AD 200). However, HQ is still advertising the remote route to PE1, and PE1 will continue to prefer it, because it learns it via OSPF rather than via iBGP from PE2.
Hence you have a problem where PE2 correctly believes the remote site is reached directly, but PE1 wrongly believes it is reached via HQ. The situation will not be remedied until the tunnel between HQ and the remote site is shut down.
There are other gray areas in your config. The fact that you are doing mutual redistribution in 2 places can also lead to routing loops. I also do not understand the reason for the offset lists and distribute lists, or for changing the AD of the RIP routes.
Well, my advice: first try to simplify the config, which makes it easier to work with. If you can do without them, take out the offset lists, distribute lists, etc. Do not do mutual redistribution between OSPF and RIP. Instead, inject a default route into both domains. This way, if an MPLS link is down, HQ still learns the remote route via RIP, but it is not redistributed and advertised into the MPLS domain. And because there is a default route in the MPLS cloud pointing toward HQ, other branches will still be able to reach the affected site.
Another way to go about it is to use a single routing protocol, such as EIGRP or BGP, and configure Site of Origin (SoO). I do not like OSPF much here because of the need to configure sham links. However, this option requires your service provider's input, so you should discuss with them which is best.
Sorry for the long post; I hope it is useful.
Thank you for the long post, that is exactly what I needed. Like you, I also thought this was overcomplicated, and I needed someone with your expertise to look at it.
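The redistribution part of that advice might look like this on each router that currently redistributes in both directions (the ones running both RIP and OSPF). OSPF process 1 is an assumption; use the process number from your own config, and check the running config for any route-map or metric keywords on the existing lines.

```
! Hypothetical process number; match it to your running config
router ospf 1
 ! Stop advertising RIP-learned routes (including remotes learned over the tunnels) into the MPLS cloud
 no redistribute rip
router rip
 ! Stop feeding OSPF routes back into RIP
 no redistribute ospf 1
```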
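SoO is applied by the provider, typically on the PE interface facing your site. A rough sketch, assuming the PE runs EIGRP with the CE inside an MPLS VPN; the interface, VRF name, route-map name, and SoO value are all hypothetical, and the provider's actual method may differ.

```
! Provider PE configuration; all names and values are hypothetical
route-map SOO-SITE1 permit 10
 set extcommunity soo 65000:1
!
interface GigabitEthernet0/2
 ip vrf forwarding CUST
 ! Mark routes learned from this site with its SoO so they are not sent back into the same site
 ip vrf sitemap SOO-SITE1
```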
Like you, I thought HSRP was not what they needed and that this was a routing problem. However, I do not have your knowledge of these issues.
As for taking the redistribution out, I like that idea. I also did not like the fact that they were using two routers at each site, and thought it would be easier to implement with one router, keeping the other as a hardware spare.
Scenario 1: I like this idea, but I am a little unclear on how I would inject a default into both domains. Can you explain that?
Scenario 2: I would be open to changing the routing protocol, but my concern is that these sites are very critical; I do not want to take them down for any reason and am concerned about downtime. Is there a way to do this without losing connectivity?
Thank you so much for the information.
To inject a default route into OSPF, use the default-information originate always command. Check the following link
To inject a default route into RIP, use the ip default-network command. Check the following link
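On the chosen HQ router, assuming OSPF process 1 (use your own process number), that would be:

```
! Hypothetical process number
router ospf 1
 ! Advertise 0.0.0.0/0 into OSPF even if this router has no default route of its own
 default-information originate always
```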
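For RIP, ip default-network is a global configuration command that flags a classful network as the candidate default, and RIP then advertises a default route to its neighbors. 10.0.0.0 below is a hypothetical network that must already be in this router's routing table. An alternative is default-information originate under router rip.

```
! Hypothetical network; it must already be present in the routing table
ip default-network 10.0.0.0
```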
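You could also configure a static default route (you can point it to the Null0 interface, if that will not affect services such as Internet access) and advertise it into RIP and OSPF. Note that OSPF does not pick up a static default through redistribute static; it needs default-information originate instead:
ip route 0.0.0.0 0.0.0.0 Null0
router ospf 1
 ! OSPF will not redistribute a static default; originate it instead
 default-information originate
router rip
 redistribute static metric 1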
I cannot guarantee that there will be no downtime, especially as I do not have full knowledge of your network. Migrations like these should be properly planned.
My advice: inject the default route and verify that the remote sites have learned it. Then remove the redistribution and check connectivity. Then unshut your tunnel interfaces and test whether the design works. Let us know if there are further issues.
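The verification points in that sequence, sketched as commands. The remote address 10.20.1.2 and the interface name Tunnel1 are hypothetical; show and ping are exec-mode commands, while the interface lines are entered in configuration mode.

```
! 1. After injecting the default at HQ, on each remote router (exec mode):
show ip route 0.0.0.0
! 2. After removing the redistribution, from HQ to a hypothetical remote address (exec mode):
ping 10.20.1.2
! 3. Then, on the HQ switch and the remote VPN router (config mode, hypothetical interface name):
interface Tunnel1
 no shutdown
```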
When injecting the route as in the first two examples for RIP and OSPF, do I just do this at one site and it gets replicated to the other sites?
And a similar question: where do I put the Null0 route so that it propagates properly?
Thanks in advance for all of the great help
Yes, you only need to do it at one site, and that is HQ. As for which router at HQ, I would choose a central router that runs both OSPF and RIP.
The logic is that the remote sites can use the default route to reach another remote site whose primary MPLS link is down. That traffic will be sent to HQ and from there to the affected site via the VPN tunnel.
Did you notice the Ethernet link with a 30-bit subnet mask between the two routers at the remote sites?
I also think this is not needed; it also shows up in the OSPF routing tables.
What are your thoughts?