Static route withdrawal time following next-hop failure detection

Unanswered Question
Mar 14th, 2009

Hi,

I have a router with a static route and a floating static route for a destination subnet via two different physical interfaces.

The requirement is to back up the primary link using a backup path.

But during testing I found that it takes around 25 seconds to withdraw the static route from the routing table. I tested the failure scenario by shutting down the primary path interface on this router.

Even after the interface goes down, it takes around 25-30 seconds to withdraw this static route. But as soon as the static route is withdrawn from the table, the floating static route takes over the traffic as expected.

I need to tune the convergence time to 3 sec. Could anyone help with this please?

Jayanthan

jayanthan Sat, 03/14/2009 - 15:22

Hi Andrew,

Thanks for the reply. I need the simplest design that fulfils my requirements. But if that is not possible with static routing, I will need to go for a dynamic routing protocol.

Please advise whether the static route withdrawal time I mentioned (25-30 secs) is the normal time it takes, or whether I can reduce it with any tuning mechanism.

And if I apply IP SLA in this case, can I reduce the time it takes to withdraw the static route from the table? I understood that the IP SLAs responder increases measurement accuracy, which is not possible with ping or other dedicated probe testing. How can I use IP SLA to tune the convergence time?

Thanks & Rgds,

Jayanthan

To answer your questions:-

"Please advise whether the static route withdrawal time I mentioned (25-30 secs) is the normal time it takes, or whether I can reduce it with any tuning mechanism" - this depends on your current setup. Do your current static routes point to a next-hop IP address or a physical interface? If a next hop, what is the medium (Ethernet, Frame Relay, PPP, ISDN, MPLS, etc.)? The time it takes the device to realise the next hop is no longer available is key to your issue.

"And if I apply IP SLA in this case, can I reduce the time it takes to withdraw the static route from the table? I understood that the IP SLAs responder increases measurement accuracy, which is not possible with ping or other dedicated probe testing. How can I use IP SLA to tune the convergence time?" There are many ways to use IP SLA: at Layer 2/3 (ICMP), Layer 3 (IP) and Layer 4 (TCP/UDP).

You talk of convergence time - with static routes you have no convergence time. Convergence is the process of the network learning about itself and all devices, and in your current setup you do not have this.

jayanthan Sat, 03/14/2009 - 16:08

Hi Andrew,

Thanks for the prompt reply.

To explain the setup: two C7304 routers are directly connected using an OC-12 POS link (OC-12 interface, fibre connector). HDLC is used as the L2 protocol.

These two routers also have IP connectivity through another path, where GigE (fibre connector) interfaces are used on both routers.

I need to use the OC-12 POS link as the primary path and the other as the secondary path. So I have deployed a static route (with both next-hop IP and output interface) over the primary link (OC-12 link) and a floating static route (with both next-hop IP and output interface) over the GigE IP path.
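As a rough illustration of the intended behaviour described above, here is a minimal Python sketch of how a floating static route is expected to take over. It is a toy model, not IOS behaviour: the prefix, next-hop addresses, interface names and the distances 1 and 250 are all hypothetical, and it says nothing about how long the router takes to notice the failure.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class StaticRoute:
    prefix: str
    interface: str
    next_hop: str
    distance: int  # administrative distance; lower is preferred


def best_route(routes: Iterable[StaticRoute],
               up_interfaces: Set[str]) -> Optional[StaticRoute]:
    """Pick the lowest-distance route whose output interface is up."""
    candidates = [r for r in routes if r.interface in up_interfaces]
    return min(candidates, key=lambda r: r.distance, default=None)


# Hypothetical values, chosen only for illustration.
routes = [
    StaticRoute("10.10.0.0/16", "POS1/0", "192.0.2.2", distance=1),
    StaticRoute("10.10.0.0/16", "GigabitEthernet0/1", "198.51.100.2",
                distance=250),
]

primary = best_route(routes, {"POS1/0", "GigabitEthernet0/1"})
backup = best_route(routes, {"GigabitEthernet0/1"})  # POS link is down
print(primary.interface)
print(backup.interface)
```

In this model the floating route is only selected once the primary route stops being a candidate, which matches the observation that traffic moves as soon as the primary static route is withdrawn.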

I simulate the failure by shutting down the POS interface on one of the routers. I used a ping with a 1 sec timeout to monitor the time it takes to recover from the failure and forward the traffic via the secondary IP path.

Once I shut down the interface, the router logs that the interface status went down. From this point the device should realise that the next hop is not available, shouldn't it?

Yes, I think I should use the term 'resilience' rather than 'convergence' in this case.

Thanks and Regards,

Jayanthan

Jayanthan,

OK - I see a simple solution to your network:-

Run HSRP on both routers on the interface facing the LAN. Both routers have a static route in their table to the remote end. On the "primary" router the HSRP configuration tracks the OC-12 interface; if the interface goes down, the "secondary" router sees the "primary" reduce its HSRP priority and takes over, routing over the GigE connection.

The LAN hosts use the HSRP IP address as their default gateway, so the LAN never knows the failover has occurred. You can "tweak" the HSRP timers down to a 1 second hello and a 3 second dead time. You will have failover in 3 seconds - I have deployed this many times and can say in most cases failover happens sooner than 3 seconds.
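One possible reading of why failover often lands under 3 seconds can be sketched with a little arithmetic. The Python below assumes the simplest case, where the active router stops sending hellos altogether and the standby's hold timer restarts on every hello it receives. It does not model the tracked-interface case, where the takeover is triggered by a priority change rather than by silence. The function name is made up for this example.

```python
from typing import Tuple


def failover_window(hello_s: float, hold_s: float) -> Tuple[float, float]:
    """Return (shortest, longest) time from failure to hold-timer expiry.

    Assumes the hold timer restarts on each received hello, and the
    active router fails somewhere between two consecutive hellos.
    """
    if hello_s <= 0 or hold_s <= hello_s:
        raise ValueError("expected 0 < hello_s < hold_s")
    # Failure just before the next hello: almost one hello interval of
    # the hold time has already elapsed.
    shortest = hold_s - hello_s
    # Failure just after a hello: the full hold time must run out.
    longest = hold_s
    return shortest, longest


shortest, longest = failover_window(hello_s=1, hold_s=3)
print(f"detection between {shortest} and {longest} seconds")
```

Under these assumptions a 1 second hello and 3 second dead time give a detection time of roughly 2 to 3 seconds, depending on where in the hello cycle the failure occurs.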

Attached is a simple diagram of how I see your network with HSRP running. HSRP is a really simple solution; however, it will not detect a break in the OC-12 link if your provider has an issue and the interfaces at both ends are up/up but no traffic can be sent or received. In this scenario a dynamic routing protocol would be best.

HTH

jayanthan Sun, 03/15/2009 - 02:16

Hi,

I really appreciate your design, but please see the attached diagram for the requirement. Sorry for not providing it at the start.

The availability required is 99.9999%. The only router shown in the diagram is used to hand off the traffic to the client's LAN.

HSRP is ruled out by the client, as there is a separate, identical set of hardware to provide hardware redundancy (including the client's server). The client's servers can do an Active/Standby switchover during any failure that lasts longer than 4 sec.

So I need to provide a 3 sec resilience time between these two OC-12 links.
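To put the availability figure in context, here is a small Python calculation of the yearly downtime it allows. It assumes the 99.9999 figure is a percentage measured over a 365-day year; the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 365-day year, an assumption


def downtime_budget_seconds(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


budget = downtime_budget_seconds(99.9999)
print(f"{budget:.1f} s of downtime allowed per year")
print(f"{budget / 25:.1f} failovers per year at 25 s each")
print(f"{budget / 3:.1f} failovers per year at 3 s each")
```

On these assumptions the yearly budget is only about half a minute, so a single 25-30 second failover would use up nearly all of it, while a 3 second failover leaves room for several events.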

Thanks & Regards,

Jayanthan
