I deployed Cisco Optimized Edge Routing (OER) on a router that connects to two different ISPs through a switch: one DSL ISP and one VSat ISP. The DSL modem is connected to the switch on a port in VLAN 168, and the VSat modem is connected to the same switch on a port in VLAN 10. F0/0 of the OER router is also connected to that switch and configured as an IEEE 802.1Q trunk with two subinterfaces. The switch port facing the router is likewise configured as an 802.1Q trunk. The router's F0/1 connects to a switch on my LAN.
Additional features configured on the router include NAT, the IOS Firewall on the two subinterfaces facing the switch, and ACLs applied to those same two subinterfaces.
I then configured OER on the router, which acts as both master and border router. The two subinterfaces created on the physical F0/0 interface are configured as OER external interfaces, and the physical F0/1 interface is configured as the OER internal interface. To the best of my knowledge everything is configured correctly, but when the primary link (the DSL) is disconnected by removing the cable between the DSL modem and the switch, Cisco OER does not fail over to the VSat ISP as expected. Sometimes I see OER static routes in the routing table, but traffic is not sent successfully through the VSat link. Can anyone take a look at the attached configuration and advise as soon as possible?
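The failure test above has a property worth spelling out: pulling the cable between the DSL modem and the switch leaves the router's own F0/0 link to the switch intact, so the DSL subinterface stays up and anything keyed to interface state sees nothing wrong; only an end-to-end probe notices. A minimal Python model of that distinction, under the simplifying assumption that a subinterface mirrors only the router-to-switch link (the `Exit` class is hypothetical, not part of any Cisco tool):

```python
from dataclasses import dataclass


@dataclass
class Exit:
    """Hypothetical model of one OER external exit (a subinterface of F0/0)."""
    name: str
    router_link_up: bool = True  # physical link between router F0/0 and the switch
    modem_link_up: bool = True   # cable between the ISP modem and the switch

    def line_protocol_up(self) -> bool:
        # Assumption: an 802.1Q subinterface only reflects the router's own
        # physical link; a cable pulled elsewhere on the switch is invisible to it.
        return self.router_link_up

    def probe_succeeds(self) -> bool:
        # An end-to-end probe needs the whole path: router -> switch -> modem -> ISP.
        return self.router_link_up and self.modem_link_up


dsl = Exit("FastEthernet0/0.168")
dsl.modem_link_up = False  # the test from the post: unplug the DSL modem from the switch

print(dsl.name, "line protocol up:", dsl.line_protocol_up())
print(dsl.name, "probe succeeds:", dsl.probe_succeeds())
```

In this model the subinterface still reports up while the probe fails, which is why failover here has to be driven by probe results rather than interface state.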
The problem might be due to the use of "local Loopback255" under "oer border".
If you're doing active probing, the generated OER SLA packet is sourced from the loopback, which is still reachable via the other path. Since the interface doesn't go down, you want the monitoring to fail instead. If you remove "local Loopback255", the OER SLA packet will be sourced from the closest interface for the route it's probing. (This also requires that the interface addresses are routable.)
So when you looked at the configuration I attached, the only problem you found was the "local Loopback255" under "oer border", and you advise me to remove that option, right?
I tried that, and OER stopped working completely; both the master and border router configurations point to the loopback interface IP address.
The problem is that after configuring OER, I sometimes see static entries created by OER in my routing table pointing to both ISP next-hop IP addresses. When I look at the NAT translation table, I also see translations created for both ISPs, but when I disconnect the primary ISP, client connections are lost instead of the traffic being carried through the other ISP.
Can you please take a closer look at the configuration I attached? Do you find anything missing?
I apologize about the "local Loopback255" under "oer border"; what you had is correct. What I had in mind was using a loopback with the statement "active-probe address source interface Loopback255" under "oer border".
My experience with OER (12.4) has been using it with BGP, not with pure static routes and NAT as you're doing. That said, a couple of thoughts.
In the documentation section "Static Routing and Static Route Redistribution into an IGP", I found that "static routes to border router exit interfaces must be configured", with an example like "ip route 0.0.0.0 0.0.0.0 Ethernet 0". So I wonder whether you need to change your route statements from a next hop, e.g. "ip route 0.0.0.0 0.0.0.0 192.168.254.254", to an interface, e.g. "ip route 0.0.0.0 0.0.0.0 FastEthernet0/0.168".
Since you're doing only active monitoring ("mode monitor active"), does "sh oe b ac" show successful hits on external hosts?
Using a command like "ip route 0.0.0.0 0.0.0.0 ethernet0" is not recommended in today's network deployments. If you point a route at an outgoing interface on a multi-access network type such as Ethernet, the router has to ARP for every destination that matches the default route pointing to that interface. If you use that command on your router and then look at the router's ARP cache with "show arp", you will find a very large number of ARP entries. Such a command should not have appeared in the Cisco documentation you read, although I saw it there myself too. On some IOS versions, entering this command produces an immediate warning and the command is rejected.
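The ARP cost described above can be illustrated with a small Python model. It assumes the upstream device answers proxy ARP for off-link destinations (without that, the interface-only route simply fails); the destination addresses are hypothetical documentation addresses, and only 192.168.254.254 comes from this thread:

```python
import ipaddress


def arp_entries(destinations, route_style, next_hop="192.168.254.254"):
    """Return the set of addresses a router would ARP for when forwarding to
    `destinations` over a default route of the given style (simplified model).

    "next-hop":  ip route 0.0.0.0 0.0.0.0 192.168.254.254
                 every packet goes to the next hop's MAC, so one ARP entry.
    "interface": ip route 0.0.0.0 0.0.0.0 <Ethernet interface>
                 every destination is treated as on-link and ARPed for directly
                 (it only works if the upstream device answers with proxy ARP).
    """
    if route_style == "next-hop":
        return {ipaddress.ip_address(next_hop)}
    if route_style == "interface":
        return {ipaddress.ip_address(d) for d in destinations}
    raise ValueError(f"unknown route style: {route_style!r}")


# Hypothetical Internet destinations (documentation address ranges).
destinations = ["198.51.100.10", "203.0.113.5", "192.0.2.77", "203.0.113.5"]

print(len(arp_entries(destinations, "next-hop")))   # 1
print(len(arp_entries(destinations, "interface")))  # 3, one per distinct destination
```

With real browsing traffic the "interface" count grows with every distinct Internet host, which is the ARP cache growth described above.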
Moreover, to rule everything out, I tried that command as you suggested, and OER still never worked as expected. I suspected that OER might have interaction problems with the IOS Firewall (CBAC), but even after I turned off CBAC, OER did not work either.
I must say thank you so much for all your help. However, I am not sure whether this behaviour is a bug. I will probably have to open a TAC case; we probably need to research this further.
I agree with your comment about the route statement, but I thought OER might somehow need to see an easy tie-in to the interface (although one would think it should be able to resolve the interface from the next-hop address being on a subnet used by an interface).
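The resolution mentioned in the parenthesis, mapping a next-hop address to the interface whose connected subnet contains it, is essentially the lookup a router performs for a static route that names only a next hop. A minimal Python sketch, assuming hypothetical connected subnets (only the 192.168.254.254 DSL next hop comes from this thread):

```python
import ipaddress

# Hypothetical connected subnets. The /24 masks and the VSat subnet are
# assumptions for illustration only.
CONNECTED = {
    "FastEthernet0/0.168": ipaddress.ip_network("192.168.254.0/24"),
    "FastEthernet0/0.10": ipaddress.ip_network("198.51.100.0/24"),
}


def exit_interface_for(next_hop: str) -> str:
    """Resolve a static route's next hop to the connected interface it sits on."""
    addr = ipaddress.ip_address(next_hop)
    for ifname, subnet in CONNECTED.items():
        if addr in subnet:
            return ifname
    raise LookupError(f"{next_hop} is not on any connected subnet")


print(exit_interface_for("192.168.254.254"))  # FastEthernet0/0.168
```

Whether OER itself uses this resolution when deciding which exit a next-hop route belongs to is exactly the open question in the post; the sketch only shows that the information is available.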
This is almost a silly question, but how much time are you allowing for OER to reroute? It usually doesn't respond as quickly as most routing protocols do.
It isn't clear to me: did you confirm that the active probes are getting hits?
One last thing that might speed up detection of a failed path is to also use passive flow monitoring, i.e. "mode monitor both".
Beyond that, I also agree that TAC might be your next best step, especially since you're using many of the latest OER features.
If possible, please post the final results, i.e. what was missing, or whether it's a bug.
I am currently using both passive and active monitor mode (mode monitor both). There were active probe hits. Unfortunately, the devices do not have a current service contract with Cisco, so I could not even open a TAC case, and I am stuck with this OER problem.
Did you say failure detection usually takes some time? This is my first deployment, so I don't know how long it takes. From your experience, how long does it take? I believe, though, that failure detection may depend entirely on your policies. I gave delay the highest priority and then set a delay threshold of around 1200 milliseconds.
The funny thing is that sometimes I see static routes in the routing table created by OER and pointing toward my second ISP, and I even see NAT translations for my second ISP in the NAT table. But when I disconnect the primary ISP connection and try to browse, the page cannot be displayed. I usually disconnect my primary ISP and perform my test immediately; I do not wait before testing.
"I am currently using both passive and active monitor mode (mode monitor both)." Just confirming: this is different from the config file attached to your original post, which has "mode monitor active"?
"Did you say failure detection usually takes some time?" I think that's correct for loss of logical connectivity through a path. I'm unsure whether it works purely on a fixed cycle or whether it can respond faster when it has more information (i.e. both passive and active). I'm also unsure how much faster the 12.4T version might be than the 12.4 version. I see that one of the latest features in 12.4T is:
In Cisco IOS Release 12.4(15)T, a new monitoring mode, fast monitoring, was introduced. Fast monitoring sets the active probes to continuously monitor all the exits (probe-all), and passive monitoring is enabled too. Fast failover monitoring can be used with all types of active probes: ICMP echo, Jitter, TCP connection, and UDP echo. When the mode monitor fast command is enabled, the probe frequency can be set to a lower frequency than for other monitoring modes, to allow a faster failover ability. Under fast monitoring with a lower probe frequency, route changes can be performed within 3 seconds of an out-of-policy situation. When an exit becomes OOP under fast monitoring, the select best exit is operational and the routes from the OOP exit are moved to the best in-policy exit. Fast monitoring is a very aggressive mode that incurs a lot of overhead with the continuous probing. We recommend that you use fast monitoring only for performance sensitive traffic. For example, a voice call is very sensitive to any performance problems or congested links, but the ability to detect and reroute the call within a few seconds can demonstrate the value of using fast monitoring mode.
The above feature, "route changes can be performed within 3 seconds", implies that normal OOP route changes take longer (minutes?). If you can, try waiting up to five minutes, or try the above feature.
"The funny thing is that sometimes I see static routes in the routing table created by OER and pointing toward my second ISP." My guess is that since you also enabled throughput and delay learn modes, OER is choosing a better path for some of your traffic. (This may be a positive indication that OER is working.)
Another silly question that just came to mind: what about the return traffic flow? If you lose a link logically, will the far side know to avoid that path?
I see you're using absolute thresholds; I've used the default relative ones. When you use "loss threshold 10", do you think it means flip after just 10 lost packets, as the examples indicate? Or flip when the loss rate exceeds 10/1,000,000 (10 packets per million), as the text description seems to imply?
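Under the packets-per-million reading, the threshold is very tight. A quick Python check, assuming "loss threshold 10" is an absolute loss rate in packets per million (the function names are illustrative only):

```python
def ppm_to_percent(ppm: int) -> float:
    """Convert a packets-per-million value to a loss percentage."""
    return ppm * 100 / 1_000_000


def exceeds_ppm_threshold(lost: int, sent: int, threshold_ppm: int) -> bool:
    """True if the observed loss rate is above threshold_ppm packets per million.

    Integer cross-multiplication avoids floating-point rounding:
    lost / sent > threshold_ppm / 1_000_000  <=>  lost * 1_000_000 > threshold_ppm * sent
    """
    return lost * 1_000_000 > threshold_ppm * sent


print(ppm_to_percent(10))                         # 0.001 (percent)
print(exceeds_ppm_threshold(1, 1_000, 10))        # True: 1 in 1,000 is 1,000 ppm
print(exceeds_ppm_threshold(5, 1_000_000, 10))    # False: 5 ppm
```

If the ppm reading is right, a single lost probe out of a thousand is already 100 times over the threshold, so the two readings would behave very differently in practice.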
Here's a white paper that also covers using OER for link failure (12.4 syntax). http://www.cisco.com/en/US/products/ps6599/products_white_paper0900aecd802b8e68.shtml
I finally got failover between my two ISPs working. I could not get it working with Cisco OER. A colleague suggested using Cisco IOS Policy-Based Routing with the Multiple Tracking Options feature (using Cisco IP SLA); I tried it, and it worked perfectly. I doubted whether Policy-Based Routing with Multiple Tracking Options would be compatible with NAT, but everything works fine. The following link is to a document on the Cisco website that presents an example of this feature: http://www.cisco.com/en/US/tech/tk364/technologies_configuration_example09186a0080211f5c.shtml
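How Policy-Based Routing with Multiple Tracking Options picks a path can be sketched in Python. This is a simplified model, assuming that the next hops listed with "set ip next-hop verify-availability" are tried in sequence order and the first one whose IP SLA-backed track object is up is used; the VSat next-hop address is a hypothetical placeholder:

```python
from typing import Dict, List, Optional, Tuple

# (sequence, next hop, track object) as in "set ip next-hop verify-availability".
# 192.168.254.254 is the DSL next hop from the configuration; the VSat next
# hop below is a hypothetical placeholder.
TRAFFIC_CONTROL: List[Tuple[int, str, int]] = [
    (11, "192.168.254.254", 11),  # DSL, usable while track 11 is up
    (22, "198.51.100.1", 22),     # VSat (hypothetical address), track 22
]


def select_next_hop(entries: List[Tuple[int, str, int]],
                    track_up: Dict[int, bool]) -> Optional[str]:
    """Return the first next hop, in sequence order, whose tracked object is up.

    None models the fallback where no listed next hop is usable and the
    packet is forwarded by the normal routing table instead.
    """
    for _seq, next_hop, track_id in sorted(entries):
        if track_up.get(track_id, False):
            return next_hop
    return None


print(select_next_hop(TRAFFIC_CONTROL, {11: True, 22: True}))    # 192.168.254.254
print(select_next_hop(TRAFFIC_CONTROL, {11: False, 22: True}))   # 198.51.100.1
print(select_next_hop(TRAFFIC_CONTROL, {11: False, 22: False}))  # None
```

Unlike the unplugged-modem case with OER, the decision here depends only on whether each probe is answered, so a dead DSL path flips track 11 down and traffic moves to the VSat entry.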
Below is my final configuration for this feature on my router.
ip sla 11
 icmp-echo 126.96.36.199 source-interface loopback255
ip sla schedule 11 life forever start-time now
!
ip sla 22
 icmp-echo 188.8.131.52 source-interface loopback255
ip sla schedule 22 life forever start-time now
!
track 11 rtr 11 reachability
track 22 rtr 22 reachability
!
! Local PBR: force each router-originated SLA probe out its own ISP
ip access-list extended SLA11
 permit icmp any host 184.108.40.206
ip access-list extended SLA22
 permit icmp any host 220.127.116.11
!
route-map SLATRACK_1122 permit 11
 match ip address SLA11
 set ip next-hop 192.168.254.254
route-map SLATRACK_1122 permit 22
 match ip address SLA22
 set ip next-hop 18.104.22.168
!
ip local policy route-map SLATRACK_1122
!
! NAT: choose the outside interface from the next hop the packet is using
ip access-list standard DSL_NEXT_HOP
 permit 192.168.254.254
ip access-list standard VSAT_NEXT_HOP
 ! permit the VSat next-hop address here
!
route-map DSL_NEXT_HOP permit 1000
 match ip next-hop DSL_NEXT_HOP
route-map VSAT_NEXT_HOP permit 1000
 match ip next-hop VSAT_NEXT_HOP
!
ip nat inside source route-map DSL_NEXT_HOP interface FastEthernet0/0.168 overload
ip nat inside source route-map VSAT_NEXT_HOP interface FastEthernet0/0.10 overload
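!
! PBR for LAN traffic: DSL next hop while track 11 is up, else VSat while track 22 is up
route-map TRAFFIC_CONTROL permit 1000
 set ip next-hop verify-availability 192.168.254.254 11 track 11
 set ip next-hop verify-availability 22.214.171.124 22 track 22
!
interface FastEthernet0/1
 ip policy route-map TRAFFIC_CONTROL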
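Why NAT keeps working after a failover can be seen in a small Python model: the NAT rule is selected by the next hop the packet is leaving through, so when PBR moves traffic to the VSat next hop, the translation moves to FastEthernet0/0.10 as well. This assumes the DSL_NEXT_HOP and VSAT_NEXT_HOP access lists each permit their ISP's next-hop address; the VSat address is a hypothetical placeholder:

```python
from typing import Optional

# Hypothetical model of the two "ip nat inside source route-map ... overload"
# rules: each route-map matches the packet's next hop, and the packet is
# translated to the address of the corresponding subinterface.
NAT_RULES = [
    # (route-map name, next hops its standard ACL permits, NAT interface)
    ("DSL_NEXT_HOP", {"192.168.254.254"}, "FastEthernet0/0.168"),
    ("VSAT_NEXT_HOP", {"198.51.100.1"}, "FastEthernet0/0.10"),  # hypothetical VSat next hop
]


def nat_interface_for(next_hop: str) -> Optional[str]:
    """Return the interface whose address is used for translation, or None."""
    for _name, permitted, interface in NAT_RULES:
        if next_hop in permitted:
            return interface
    return None


print(nat_interface_for("192.168.254.254"))  # FastEthernet0/0.168
print(nat_interface_for("198.51.100.1"))     # FastEthernet0/0.10
```

Keying NAT to the next hop rather than to a single outside interface is what lets one set of inside hosts use whichever ISP the tracked PBR policy has selected.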