I need some serious help. TAC can't help me because they do not get involved in design; they only do break/fix. I totally understand that.
I have a simple office: one flat LAN, a single 1841 router, and 2 ISPs.
LAN is 10.10.20.0/24 and is connected to a port on an HWIC card I installed in the 1841. Then FA0/0 connects to ISP1 and FA0/1 connects to ISP2.
Everything is fine except that I am having some issues with the failover feature. Currently, I am using Object Tracking with IP SLAs. I am pinging 2 hosts located on the internet, and I have an OR statement across the two tracked objects, which basically says: if only ONE of the 2 objects is unreachable, DO NOT trigger a failover to ISP2; if BOTH objects become unreachable, then DO trigger a failover. It works like a charm.
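To make the tracking logic described above concrete, here is a minimal Python sketch of the OR decision. It is only a model of the behavior, not router configuration; the function names `track_list_or` and `active_isp` are made up for illustration.

```python
def track_list_or(object_states):
    """Model of a boolean OR over tracked objects:
    the combined track is up if at least one object is reachable."""
    return any(object_states)


def active_isp(probe1_up, probe2_up):
    """Stay on ISP1 while the combined track is up; fail over to ISP2
    only when BOTH probed hosts are unreachable."""
    return "ISP1" if track_list_or([probe1_up, probe2_up]) else "ISP2"


# One probe target failing is not enough to move traffic.
print(active_isp(True, False))   # ISP1
# Both probe targets failing triggers the failover.
print(active_isp(False, False))  # ISP2
```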
Any internet hiccup obviously makes the router take the tracks down and redirect all traffic to ISP2. However, 99% of the time ISP1 is back online within seconds or minutes, so after 180 seconds the traffic gets redirected back to ISP1. So in essence, the customer suffers 2 interruptions: one on failover and one on failback.
Besides internet hiccups, I have also noticed that every time a user copies a big file across the tunnel (the 1841 has site-to-site tunnels with 2 branches), the tracks go crazy, the objects become unreachable, and a failover is triggered. We were racking our brains and fighting with the ISP1 provider, because every time this happened we called them, and every time they told us that their line was up and running without any problems. After careful investigation, I do admit they were right. It is not so much that ISP1 experiences hiccups; it is that users putting heavy load on the router cause its tracks to stop reaching the objects.
Have you guys seen this behavior? Can you please help me?
I did check that link, Marwanshawi. Although it looks good, I still disagree with it.
In reality, small to medium size businesses have additional requirements. For example, what if you throw one-to-one NAT translations into the mix? More than likely your solution won't work for a "smooth" failover and failback.
It is hard to provide smooth failover to customer networks with dual ISPs in a single router. The configuration has to be a bit more involved than what you presented. I will comment in your thread later so we can continue the discussion about that solution in its respective thread.
For now, the IP SLAs do allow some fine-tuning, as I am finding out here. But even with those additional IP SLA features, whenever there is a burst of traffic the tracks still get confused and think there is an outage. Again, you can fine-tune that with IP SLAs, but then the question becomes: what IF it is a real issue? Then you are in effect giving the customer extended downtime until the IP SLAs do go down (or time out) and traffic fails over to the second line.
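The trade-off described above can be sketched with a small Python simulation of a tracked object that only changes state after the raw probe result has stayed different for a hold time, similar in spirit to the down/up delay timers on a tracked object. This is a rough model under stated assumptions, not router behavior: the 5-second probe interval and 15-second down delay are made-up values, the 180-second up delay is the failback time mentioned earlier in this thread, and `simulate` is a hypothetical helper.

```python
def simulate(samples, interval, down_delay, up_delay):
    """Return the reported track state for each probe sample.

    samples    -- raw probe results (True = reachable), one per interval
    interval   -- seconds between probes (assumed value)
    down_delay -- seconds the probe must stay down before reporting down
    up_delay   -- seconds the probe must stay up before reporting up
    """
    reported = True        # the track starts in the up state
    pending_since = None   # time the raw state first disagreed with reported
    out = []
    for i, raw in enumerate(samples):
        t = i * interval
        if raw == reported:
            pending_since = None          # disagreement cleared, reset timer
        else:
            if pending_since is None:
                pending_since = t
            delay = down_delay if reported else up_delay
            if t - pending_since >= delay:
                reported = raw            # hold time expired, change state
                pending_since = None
        out.append(reported)
    return out


burst = [True, False, False, True, True]   # two lost probes under heavy load
outage = [True] + [False] * 6              # ISP1 really down

# With a 15 s down delay the burst is ignored: no failover at all.
print(simulate(burst, 5, 15, 180))
# A real outage is still detected, but only 15 s after the first lost probe.
print(simulate(outage, 5, 15, 180))
# With no down delay the same burst causes an immediate failover, and
# traffic stays on ISP2 until the 180 s up delay has passed.
print(simulate(burst, 5, 0, 180))
```

The longer the down delay, the more load-induced bursts are filtered out, but every second of that delay is also added to the downtime the customer sees during a real outage.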