I have a production VoIP network with a 100 Mb/s Ethernet WAN link (EoS) between a major site and a datacenter where CallManager and all servers are located.
I installed a point-to-point T1 to carry voice traffic to the datacenter in the event the primary WAN link goes down. I've successfully implemented EIGRP to advertise the backup routes and the failover works incredibly well...
I understand that EIGRP sends hellos every 5 seconds on an Ethernet interface and that the hold time is 3x that, or 15 seconds. I was expecting ~15s of downtime before the backup routes were injected into the routing table.
During my initial testing of the failover, calls didn't even drop and from a user's perspective nothing happened...
Can someone help me understand how the router can failover immediately given the hello timeout?
Also - with regard to the Ethernet WAN link: during an outage, the interfaces do not show "down", because they are plugged into the MUX (for lack of a better term), which encapsulates the Ethernet into SONET... What effect, if any, does the interface up/down status have on the routing protocol? Thanks.
I am not sure if I know the answers to the first part of your question. But the last part has a pretty simple answer. The interface up/down status has a major impact on the convergence and failover of routing protocols like EIGRP. When you look at failover and convergence there are really two components: how long does it take to recognize that there is a problem, and then how long does it take to react to the problem. If an interface changes state from protocol up to protocol down then the routing protocol recognizes immediately that there is a problem and starts to react to it. There is no waiting for hello to fail if the interface state changes to down.
As to the first part of your question about how EIGRP was able to failover so quickly without waiting the 15 seconds for hello to fail, can you tell us what kind of error you produced and how you produced it? This might help us understand how EIGRP was able to react so quickly.
In earlier IOS releases, the router had to wait for the hold timer to expire before it installed the backup route in the routing table. In current releases, when EIGRP takes a neighbor relationship down it immediately sends a "goodbye" message, so the peer does not have to wait for the hold timer. That improves convergence time, and the backup routes come into the picture right away.
The goodbye message is a feature designed to improve EIGRP network convergence. The goodbye message is broadcast when an EIGRP routing process is shut down to inform adjacent peers about the impending topology change. This feature allows supporting EIGRP peers to synchronize and recalculate neighbor relationships more efficiently than would occur if the peers discovered the topology change after the hold timer expired.
The goodbye message is supported in the following IOS releases:
12.3(2), 12.3(3)B, and 12.3(2)T and later releases
Can you please confirm whether you have any of these releases running on your router?
Hi Rick & Ankur,
I am running 12.3(11)T. I simulated the WAN failure by physically unplugging the Ethernet interface at the site. This caused the interface to go down, which may have triggered a goodbye message over the remaining T1 link and allowed instant convergence?
During actual outages I've seen, the interfaces themselves stay up...so perhaps this isn't a realistic testing method.
I want to do as much testing as possible and would appreciate any further feedback.
The reason you're getting such fast failover is that you're pulling the cable. Instead of doing this, try removing the IP address from the interface. I don't know if this will trigger an EIGRP goodbye or not. What you want is a way to avoid triggering an EIGRP goodbye while leaving the link up, which might be hard in and of itself.
Hi Russ - Understood. How about an access-list applied to the interface?
I can only assume that with no IP address, the interface would no longer match the network x.x.x.x statement under router eigrp, so EIGRP would stop running on it.
Any other thoughts?
This access list should work to take the neighbor down and leave the interface up:
access-list 100 deny eigrp any any
access-list 100 deny ip any host 224.0.0.10
access-list 100 deny ip host 224.0.0.10 any
access-list 100 permit ip any any
It's actually a bit of overkill, including both the multicast and all EIGRP traffic, but... Better safe than sorry, I suppose.
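For the test, the list would need to be bound inbound on the WAN interface so that the neighbor's hellos are dropped on arrival. A minimal sketch, assuming the Ethernet WAN interface is FastEthernet0/0 (a placeholder name, not taken from this thread):

```
interface FastEthernet0/0
 ! FastEthernet0/0 is a placeholder; substitute the real Ethernet WAN interface
 ! Inbound: drops EIGRP packets received from the neighbor on this link
 ip access-group 100 in
```

Entering "no ip access-group 100 in" on the same interface removes the filter and lets the adjacency re-form. The inbound direction matters here, since an outbound interface ACL on IOS does not filter packets the router generates itself, such as its own hellos.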
Hey Russ - That looks like a very good way to test while not impacting voice/data. I'd like to see what happens to active voice calls during a failover so I'll probably just use deny ip any any...thanks a lot for your help. I'll post back my results, will test soon.
If it doesn't work smoothly, I may set the EIGRP hello interval or hold time to 1s. Instant failover seems like a good trade-off for the extra EIGRP traffic on the high-speed link.
The access list to block incoming traffic worked perfectly. Active calls dropped when it was applied. I added this to the WAN interface configs:
ip hello-interval eigrp 492 1
I tested again and this time active calls stayed up. Since the link is 100 Mb/s, I think I'm going to leave the config. Thanks for the help.
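One thing worth checking with this kind of change: on IOS the hold time is configured separately from the hello interval and is not adjusted automatically when the hello interval changes. A sketch of setting both, assuming a placeholder interface name and an illustrative 3-second hold time (three missed hellos); neither value comes from this thread:

```
interface FastEthernet0/0
 ! FastEthernet0/0 is a placeholder; 492 is the EIGRP AS number used above
 ip hello-interval eigrp 492 1
 ! Assumed value: hold time of 3 seconds, i.e. three 1-second hellos
 ip hold-time eigrp 492 3
```

Each router advertises its own hold time to its neighbor, so the same timers would normally be applied on both ends of the link.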
You're probably taking the interface down, rather than the protocol.... This would cause EIGRP to react within a second, if not less, depending on other configuration factors. If the interface goes down, the connected route is pulled from the routing table, which causes EIGRP to be notified, which then causes EIGRP to drop the neighbors on that link.
If the interface stays up, then it will take the hold time plus the convergence time to flip over to the other link.
EIGRP has always been able to do subsecond convergence on a well-designed network--OSPF and IS-IS are just starting to catch up!!! :-)
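To see which of the two cases applies during a test, the neighbor table and the per-interface timers can be checked from exec mode. These are standard IOS show commands; the exact output format varies by release, so none is reproduced here:

```
show ip eigrp neighbors
show ip eigrp interfaces detail
```

The first lists each neighbor with the hold time remaining, which should keep counting down toward zero when hellos are being blocked but the interface is still up. The second shows the hello interval and hold time configured on each EIGRP interface.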