IP SLA / Tracking issue on 1811 router for VPN access

Sep 15th, 2010

I have a Cisco 1811 router with two ISP connections for VPN redundancy. Over the last few weeks, the IP SLA that tracks the default route through the primary ISP has been failing even though it should be returning an OK status. The SLA succeeds for a few runs and then reports No Connection. The backup ISP then becomes the preferred default route and the VPN comes up over it as well, leaving two QM_IDLE sessions on the router.


Here is a bit of the config:



crypto logging session
!
crypto isakmp policy 10
encr 3des
hash md5
authentication pre-share
group 5
crypto isakmp key MYKEY address 192.168.1.1
!
crypto ipsec security-association lifetime seconds 86400
!
crypto ipsec transform-set tunnelset esp-3des esp-md5-hmac
no crypto ipsec nat-transparency udp-encaps
!
crypto map VPNMAP 10 ipsec-isakmp
set peer 192.168.1.1
set transform-set tunnelset
match address 101
!
!
track 10 ip sla 10 reachability
delay down 10 up 30
!
interface FastEthernet1
description !!! Primary ISP !!!
ip address 10.1.1.2 255.255.255.0
ip nat outside
crypto map VPNMAP

!

interface FastEthernet2
switchport access vlan 899
!
interface Vlan899
description !!! Secondary ISP !!!
ip address 172.16.1.2 255.255.255.248
ip nat outside
crypto map VPNMAP
!
ip local policy route-map FAILOVER
!
ip sla 10
icmp-echo 192.168.1.1
!
ip sla schedule 10 life forever start-time now
!
access-list 199 permit icmp any host 192.168.1.1 echo
!
route-map FAILOVER permit 10
match ip address 199
set ip next-hop 10.1.1.1
!
ip route 0.0.0.0 0.0.0.0 10.1.1.1 track 10
ip route 0.0.0.0 0.0.0.0 172.16.1.1 250
!
And now the issue:
#show track 10
Track 10
  IP SLA 10 reachability
  Reachability is Down
    54 changes, last change 00:58:48
  Delay up 30 secs, down 10 secs
  Latest operation return code: No connection
  Tracked by:
    STATIC-IP-ROUTING 0
#show ip sla statistics
IPSLAs Latest Operation Statistics
IPSLA operation id: 10
Type of operation: icmp-echo
Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: *14:43:48.581 EDT Wed Sep 15 2010
Latest operation return code: No connection
Number of successes: 0
Number of failures: 52
Operation time to live: Forever
#show ip route
Gateway of last resort is 172.16.1.1 to network 0.0.0.0
#ping 192.168.1.1 source fa1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 60/69/80 ms
#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
192.168.1.1 172.16.1.2 QM_IDLE           2099 ACTIVE
10.1.1.2   192.168.1.1 QM_IDLE           2098 ACTIVE
#show route-map FAILOVER
route-map FAILOVER, permit, sequence 10
  Match clauses:
    ip address (access-lists): 199
  Set clauses:
    ip next-hop 10.1.1.1
  Policy routing matches: 110156 packets, 7051604 bytes
#show access-lists 199
Extended IP access list 199
    10 permit icmp any host 192.168.1.1 echo (110157 matches)
When I add a static route for the primary ISP with a distance of 10 (ip route 0.0.0.0 0.0.0.0 10.1.1.1 10) the IP SLA recovers and the secondary VPN conn-id (172.16.1.2) gets removed from the crypto.
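One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the probe in ip sla 10 has no source interface, so once the default route moves to the secondary ISP the router may source the echo from 172.16.1.2. The local policy route still pushes the packet toward 10.1.1.1, but the primary ISP may drop it or the reply may never return, so the track object stays down. Installing a default route via 10.1.1.1 makes 10.1.1.2 the source again, which would match the recovery described above. A minimal sketch that pins the source, using the same source-interface option that appears in the working config later in this thread (the frequency value is illustrative):

```
! Sketch only. An IP SLA operation cannot be edited while it is scheduled,
! so remove it and re-create it.
no ip sla 10
!
ip sla 10
 ! Source the probe from the primary ISP interface (assumed fix)
 icmp-echo 192.168.1.1 source-interface FastEthernet1
 ! Illustrative probe interval in seconds
 frequency 30
ip sla schedule 10 life forever start-time now
```

The existing track 10 object and the FAILOVER local policy can stay as they are; only the probe definition changes.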
Is there anything missing from the configuration that would cause this? Or are there any useful debugging commands to see why the SLA seems to fail, even though I can ping the remote host just fine while it says it's unreachable?
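For the debugging part of the question, the commands below are a starting point. This is a sketch that assumes an IOS release using the ip sla syntax shown in the config above; older 12.4 mainline images use ip sla monitor keywords instead, and exact availability may vary by release.

```
! Confirm what the operation is actually configured to send
show ip sla configuration 10
! Per-operation counters and return codes
show ip sla statistics 10 details
show track 10
!
! Trace the probe itself (can be verbose on a busy router)
debug ip sla trace 10
debug ip sla error 10
! Show whether the locally generated echo is being policy routed
debug ip policy
!
! Turn all debugging off when finished
undebug all
```

Comparing the source address seen in the debug output with the one used by the manual ping (10.1.1.2) should show whether the probe and the ping are really the same test.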
Nagaraja Thanthry (Cisco Employee) Wed, 09/15/2010 - 16:16

Hello,


Instead of tracking connectivity to 192.168.1.1, can you track the connectivity to 10.1.1.1 (ISP next hop)? If the ICMP packet gets dropped along the path, then it could create a problem. Setting the tracking to the next hop ensures that as long as the ISP is up, the VPN tunnel through that interface stays up.
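A minimal sketch of that suggestion, assuming a new operation and track object numbered 20 (both numbers are hypothetical and not part of the original config). Because 10.1.1.1 is on the connected subnet of FastEthernet1, this probe does not depend on the default route or on the local policy route-map:

```
! Probe the primary ISP next hop directly
ip sla 20
 icmp-echo 10.1.1.1 source-interface FastEthernet1
ip sla schedule 20 life forever start-time now
!
track 20 ip sla 20 reachability
 delay down 10 up 30
!
! Swap the tracked default route over to the new track object
no ip route 0.0.0.0 0.0.0.0 10.1.1.1 track 10
ip route 0.0.0.0 0.0.0.0 10.1.1.1 track 20
```

The trade-off is that this only detects a failure of the link to the ISP gateway, not a failure further along the path.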


Regards,


NT

Jeffrey Warn Thu, 09/16/2010 - 06:45

Thanks for the response. I originally thought about doing that, but the issue is that the IP address will pretty much always be up. The next hop is an ISP Ethernet handoff, so connectivity to that IP address will be up 99% of the time (unless that ISP link dies completely).


Sorry I didn't illustrate that in my earlier post. Generally, it's the 1811 (10.1.1.2) -> (10.1.1.1) ISP gateway router -> INTERNET -> My ISP Gateway devices -> My ASA cluster (192.168.1.1)


I was mostly curious why the SLA seemed to stop working altogether, which I think would have happened even if I had been tracking the next hop.


Thanks

lapinmort Wed, 12/01/2010 - 10:59

I have a similar problem on a couple of C1812 with IOS 12.4. I have deployed them with dual ISP configurations and route tracking.

Their configurations are similar to this:


! With tracking where ISP1's router is at 10.10.10.1, and ISP2's router is at 20.20.20.1, both have the default AD of 1

ip route 0.0.0.0 0.0.0.0 10.10.10.1 track 100

ip route 0.0.0.0 0.0.0.0 20.20.20.1 track 200


! With a weight (AD) of 200. Without these, IP SLA failed and the tracked routes never came up. Once these were in the routing table the tracked routes would come up, and replace them since they have a lower AD.

ip route 0.0.0.0 0.0.0.0 10.10.10.1 200

ip route 0.0.0.0 0.0.0.0 20.20.20.1 200


Interface Fa0 is set up with IP 10.10.10.2/24

Interface Fa1 is set up with IP 20.20.20.2/24


ip sla 100
 icmp-echo 4.2.2.2 source-interface FastEthernet0
ip sla schedule 100 life forever start-time now
!
ip sla 200
 icmp-echo 4.2.2.2 source-interface FastEthernet1
ip sla schedule 200 life forever start-time now
!
track 100 ip sla 100 reachability
 delay down 5 up 5
!
track 200 ip sla 200 reachability
 delay down 5 up 5


On one of the routers, that configuration works as expected:

Initially you have the following entries in the routing tables:

0.0.0.0/0 [200/0] 10.10.10.1

             * [200/0] 20.20.20.1


Then IP SLA jobs 100 and 200 try pinging 4.2.2.2 using fa0 and fa1, respectively, as the source. Both jobs return success and bring up tracking objects 100 and 200. That, in turn, brings up the tracked routes. Since the tracked default routes have an AD of 1, they replace the default routes with an AD of 200, and the routing table shows:

0.0.0.0/0 [1/0] 10.10.10.1

             * [1/0] 20.20.20.1


On the other router (different location, different ISPs, but I use the same IPs for the sake of simplicity) the configuration is similar, but the behavior is different. The routing table initially shows:

0.0.0.0/0 [200/0] 10.10.10.1

            * [200/0] 20.20.20.1


IP SLA jobs 100 and 200 never come up, because the ping tests to 4.2.2.2 with source fa0 or fa1 fail. But if you change the source to an internal VLAN, pings to 4.2.2.2 work just fine.


Note that there's no ZBF or access-list setup that could block traffic.


Now, let's say I shut down fa0. IP SLA 200 can suddenly ping 4.2.2.2 from fa1 using the route entry 0.0.0.0/0 20.20.20.1 200, and the tracked route 0.0.0.0/0 20.20.20.1 track 200 comes up and replaces 0.0.0.0/0 20.20.20.1 200 in the routing table. If you run a no shut on fa0, nothing changes.

So the routing table looks like this:

0.0.0.0/0 [1/0] 20.20.20.1

I'm not sure what causes this behavior, but I guess it has something to do with the load-balancing algorithm.

The only way I can get fa0 and fa1 to load balance again, is by removing the tracked routes. Or perhaps by explicitly routing some traffic to the other interface using PBR.
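One way to apply PBR here is to the probes themselves rather than to user traffic. With two equal-cost default routes, a router-generated echo sourced from fa0 may leave through fa1 (or the reverse), and the ISP may drop it. The sketch below forces each probe out of the interface it is sourced from. It assumes Fa0 is 10.10.10.2 and Fa1 is 20.20.20.2; the ACL numbers 151 and 152 and the route-map name SLA-PROBES are hypothetical.

```
! Match each probe by its source address
access-list 151 permit icmp host 10.10.10.2 host 4.2.2.2 echo
access-list 152 permit icmp host 20.20.20.2 host 4.2.2.2 echo
!
route-map SLA-PROBES permit 10
 match ip address 151
 set ip next-hop 10.10.10.1
!
route-map SLA-PROBES permit 20
 match ip address 152
 set ip next-hop 20.20.20.1
!
! Local policy applies only to traffic generated by the router itself
ip local policy route-map SLA-PROBES
```

If both probes then succeed, both tracked default routes should be installed with an AD of 1 and load sharing across the two ISPs can resume. Whether this resolves the difference between the two routers is not confirmed.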


Does anyone know why one router would behave differently from the other?


Thanks.


Rado

Jeffrey Warn Wed, 12/01/2010 - 11:14

I've been running this on some client 1811 routers with success. Here is a basic config that should work:



!! I use a delay on the up/down to prevent some flapping issues that could arise.
track 11 ip sla 11 reachability
default-state up
delay down 90 up 120
!! Primary ISP
interface FastEthernet0
ip address 192.168.1.2 255.255.255.0
!! Secondary ISP / Backup
interface FastEthernet1
ip address 172.16.1.2 255.255.255.0
!! Main default route is tracked and primary ISP is preferred
ip route 0.0.0.0 0.0.0.0 192.168.1.1 track 11

!! Backup ISP is a floating static route with an administrative distance of 250, will only be used if SLA fails
ip route 0.0.0.0 0.0.0.0 172.16.1.1 250
!! Note 10.1.1.1 is the endpoint my clients try to connect to (my side)
ip sla 11
icmp-echo 10.1.1.1 source-interface FastEthernet0
frequency 30
ip sla schedule 11 life forever start-time now
!! Access list to permit only icmp for SLA check
access-list 199 permit icmp any host 10.1.1.1 echo
!! Local routing policy to make sure SLA check always goes out primary ISP interface
route-map FAILOVER permit 10
match ip address 199
set ip next-hop 192.168.1.1
!! Ties route map to local routing policy
ip local policy route-map FAILOVER
I prefer to ping something I'm trying to reach remotely, but you could ping the next hop as well (as long as you can be sure the next hop would be the cause of an outage). I found that in most cases the problem exists beyond the next hop (ISP gateway), so an SLA that only pings the next hop would never detect it.
lapinmort Wed, 12/01/2010 - 12:02

That's an interesting configuration. However, it still won't allow me to get two load-balanced, tracked default routes.
