I have an MPLS VPN network in a test lab with several PE routers and one router acting as a VPNv4 route reflector (RR).
I noticed two different behaviours depending on whether or not the RR has a default route in its global routing table.
In fact, if the RR has the default route and I shut down a PE router, the backup PE router takes over only after 180 seconds.
If the RR does not have the default route and I shut down a PE router, the backup PE router takes over after about 20 seconds. It seems that the default route, through the recursive next-hop lookup, keeps the BGP next hop reachable until the BGP hold timer (180 seconds by default) expires and the session goes down. Without the default route, the RR switches to the backup PE as soon as the next hop becomes unreachable, before the BGP session goes down (after about 20 seconds). I think the RR should behave this way even when a default route is present.
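To make the recursive-lookup argument concrete, here is a minimal Python sketch of longest-prefix matching on the RR. It assumes the only routes involved are the PE loopback /32 and an optional default route; the address 10.0.0.1 and the helper name longest_match are hypothetical, not taken from the lab.

```python
import ipaddress
from typing import List, Optional


def longest_match(rib: List[str], address: str) -> Optional[ipaddress.IPv4Network]:
    """Return the most specific RIB prefix that covers address, or None if nothing does."""
    addr = ipaddress.IPv4Address(address)
    covering = [ipaddress.IPv4Network(p) for p in rib if addr in ipaddress.IPv4Network(p)]
    return max(covering, key=lambda net: net.prefixlen, default=None)


if __name__ == "__main__":
    pe_loopback = "10.0.0.1"  # hypothetical BGP next hop (PE loopback)
    with_default = ["10.0.0.1/32", "0.0.0.0/0"]
    without_default = ["10.0.0.1/32"]

    # Simulate the PE failure: the IGP withdraws the PE loopback /32.
    for rib in (with_default, without_default):
        rib.remove("10.0.0.1/32")
        # With a default route the next hop still resolves (to 0.0.0.0/0);
        # without it the lookup fails and the next hop is unreachable.
        print(rib, "->", longest_match(rib, pe_loopback))
```

Once the /32 is gone, the default route still covers the next hop, which is why the RR keeps considering the failed PE reachable until the BGP session itself times out.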
Has anyone had a similar problem? Is my analysis correct, or am I wrong?
Your analysis is correct. The feature behind this behavior is called BGP Next-Hop Tracking (NHT), which improves convergence time when a neighbor is lost: BGP reacts to RIB updates instead of waiting for the BGP scanner, which runs every 60 seconds.
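The difference between polling and event-driven tracking can be sketched as a simplified timing model in Python. It assumes scanner runs at 0, 60, 120, ... seconds and treats the time for the RIB change to reach BGP as a single hypothetical parameter (rib_update_delay); real routers add IGP convergence and other delays that this sketch does not model.

```python
import math

SCAN_INTERVAL = 60  # BGP scanner period in seconds, as stated above


def scanner_detects_at(failure_time: float, interval: float = SCAN_INTERVAL) -> float:
    """Polling model: the failure is noticed at the first scan at or after failure_time."""
    return math.ceil(failure_time / interval) * interval


def nht_detects_at(failure_time: float, rib_update_delay: float) -> float:
    """Event-driven model: BGP is notified as soon as the RIB change arrives.

    rib_update_delay is a hypothetical stand-in for IGP/RIB convergence time.
    """
    return failure_time + rib_update_delay


if __name__ == "__main__":
    for t in (1.0, 30.0, 61.0):
        print(f"failure at {t:>5}s: scanner {scanner_detects_at(t)}s, "
              f"NHT {nht_detects_at(t, rib_update_delay=2.0)}s")
```

In this model the scanner can lag by up to a full interval, while the event-driven path depends only on how quickly the RIB reflects the loss of the next hop.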
The main drawback of this feature shows up when you have an aggregate route or a default route that is still in the RIB after the /32 prefix is lost. For NHT, this means there is still a valid route to reach the neighbor, so there is no reason to converge. To avoid this situation, you need an enhancement of NHT called selective NHT, which lets you tell NHT which routes are considered valid for resolving a next hop:
In an MPLS VPN environment, only /32 prefixes should be accepted:
router bgp xxx
 bgp nexthop route-map CHECK
!
ip prefix-list FILTER seq 5 permit 0.0.0.0/0 le 31
!
route-map CHECK deny 10
 match ip address prefix-list FILTER
route-map CHECK deny 20
 match source-protocol bgp
route-map CHECK permit 100
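With this configuration, only non-BGP /32 prefixes are considered valid routes for next-hop tracking. Sequence 10 denies every prefix with a length from 0 to 31 (that is what 0.0.0.0/0 le 31 matches, including the default route and any aggregate), sequence 20 denies routes learned via BGP, and sequence 100 permits what remains. When the PE loopback /32 disappears, the default route can no longer keep the next hop reachable, so the RR converges to the backup PE without waiting for the BGP session to go down.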
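The route-map logic can be modelled in Python to check the claim. This is a simplified sketch, assuming the route-map is evaluated against the most specific RIB route covering the next hop and that a denied route leaves the next hop unreachable; the Route class, the function names, the address 10.0.0.1 and the use of OSPF as IGP are hypothetical illustrations, not IOS internals.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    source: str  # hypothetical source-protocol label, e.g. "ospf", "static", "bgp"


def check_route_map(route: Route) -> bool:
    """Mimic route-map CHECK: True means the route may be used to resolve a next hop."""
    # seq 10: prefix-list FILTER (0.0.0.0/0 le 31) matches any prefix of length 0-31 -> deny
    if route.prefix.prefixlen <= 31:
        return False
    # seq 20: match source-protocol bgp -> deny
    if route.source == "bgp":
        return False
    # seq 100: permit everything else, i.e. non-BGP /32 routes
    return True


def resolve_next_hop(rib: List[Route], next_hop: str) -> Optional[Route]:
    """Take the most specific covering route, then apply CHECK; None means unreachable."""
    addr = ipaddress.IPv4Address(next_hop)
    covering = [r for r in rib if addr in r.prefix]
    best = max(covering, key=lambda r: r.prefix.prefixlen, default=None)
    if best is None or not check_route_map(best):
        return None  # next hop treated as unreachable, so BGP moves to the backup PE
    return best


if __name__ == "__main__":
    loopback = Route(ipaddress.IPv4Network("10.0.0.1/32"), "ospf")
    default = Route(ipaddress.IPv4Network("0.0.0.0/0"), "static")
    print(resolve_next_hop([loopback, default], "10.0.0.1"))  # resolved via the IGP /32
    print(resolve_next_hop([default], "10.0.0.1"))            # PE down: only the default is left
```

Because a /32 is always the most specific possible match for a single address, requiring a non-BGP /32 means a default or aggregate route can never mask the loss of the PE loopback.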