I am experiencing a strange problem with the GRE tunnels we use to connect a remote site. There are two routers, RT1 & RT2, running HSRP, and each router has its own GRE tunnel over the internet to a router at our network hub. I am running EIGRP over the tunnels. The problem occurs when the tunnels go down due to an internet outage: when they come back up everything is OK except that I cannot ping an NMS server at our hub. I can ping devices in the same subnet as the server, but not the server itself. I then need to shut down the inside interfaces to switch HSRP over to router 2, which can ping the server. If the tunnels go down again, router 2 cannot ping the server either, just like router 1. I have noticed that after roughly 4 hours the routers are able to ping the server again. I have checked the routing tables after each outage & all the correct routes are there. The only traffic going over the active tunnel when it goes down is to & from the server, so I am not sure whether this has anything to do with it.
I am puzzled as to why this is happening. Has anyone out there seen this issue before?
1) - What are the delay/bandwidth metrics for the tunnels?
2) - Try reducing the EIGRP hello and dead timers on the tunnels to, say, 1 hello and 3 dead.
3) - Which is the primary and which is the secondary?
4) - You may have an EIGRP mismatch and an asymmetric route.
5) - Have you tried using tunnel keepalives?
Firstly, thanks for the quick reply.
1, I set the delay on the tunnel interface connecting to RT2 at our hub to 500050, so the tunnel to RT1 is preferred at 500000. I think that's what you are asking?
2, I'll look at that & give it a try, but to be honest EIGRP seems to be working well, with fast convergence when the tunnels come back up.
3, RT1 is primary & RT2 secondary.
4, I have looked at the routing tables & all looks ok.
5, The tunnels are using keepalives.
OK - have you made sure the delay is the SAME at both ends of the tunnel?
Can you supply the output from both devices:-
show ip eigrp int
show ip eigrp nei
show ip eigrp top
show ip route eigrp
The delay was only changed at the hub side, RT2 is still using the default.
I have attached the output you asked for, so a short explanation of what is what is needed:
Tunnel 2 to RT1 172.20.20.13
Tunnel 3 to RT2 172.20.20.17
Tunnel 0 to Hub 172.20.20.14
int Vlan 12 to RT2 10.10.9.2
int Vlan 190 to RT2 192.168.10.34
The server is in subnet 192.168.0.16/29
OK - so from rt1 I see no feasible successor for the route to 192.168.0.16 from anywhere other than tunnel0, which goes to the hub.
Can you supply the same outputs from rt2?
From looking at what you have sent I see 2 issues:-
1) Both router 1 and router 2 have the same cost to 192.168.0.16 in the EIGRP table.
2) Router 2 does not see the route from router 1 as a feasible successor - I am assuming that router 1 is the HSRP master and router 2 is the standby.
What bandwidth have you configured for tun0 on rt1 and tun0 on rt2? What delay is configured on these tunnels?
Ideally you want tunnel 0 on rt1 to be the primary, and tunnel 0 on rt2 to be the secondary, with a feasible successor seen from rt1. Rt1 is the HSRP master with rt2 as the standby, so if tunnel 0 on rt1 goes down, rt2 will advertise the route via its tunnel 0 to rt1, as rt1 is still the HSRP master. If rt1 goes down completely, all routes are still valid.
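For reference, EIGRP only keeps a backup path as a feasible successor when the neighbour's reported distance is strictly lower than the feasible distance of the current best path. That is why two routers with the same cost to 192.168.0.16 will not see each other as feasible successors. A minimal Python sketch of that check follows; the metric values are made up for illustration and are not taken from the attached outputs (the real ones would come from "show ip eigrp top").

```python
def is_feasible_successor(reported_distance: int, feasible_distance: int) -> bool:
    """EIGRP feasibility condition.

    A neighbour qualifies as a feasible successor only if the distance it
    reports for the prefix is strictly lower than our own feasible distance.
    """
    return reported_distance < feasible_distance


# Hypothetical metrics, chosen only to illustrate the tie case.
fd_via_own_tunnel = 297_244_416       # our best metric to the prefix
rd_from_peer_same_cost = 297_244_416  # peer reports an identical cost
rd_from_peer_lower = 297_000_000      # peer reports a lower cost

# Equal cost fails the strict "<" test, so no feasible successor is kept.
print(is_feasible_successor(rd_from_peer_same_cost, fd_via_own_tunnel))
# A strictly lower reported distance passes.
print(is_feasible_successor(rd_from_peer_lower, fd_via_own_tunnel))
```

With equal costs on both routers the check fails, which matches issue 1 above leading to issue 2.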
Yes, RT1 is active & RT2 is the standby. I haven't configured the bandwidth on the tunnels for routers 1 & 2; I only increased the delay on tunnel 3 at the hub, which points to RT2, so traffic prefers the route to RT1. I had thought that load balancing traffic from the hub may have caused this issue, but it still remains.
I will be back at my desk on Monday, as I am on a trip for the next 3 days, so I will increase the delay on the tunnel interface on RT2 then & see what happens. If you have any other ideas to try I will have a look at those also.
Thanks for the replies.
I have added the config on both ends of the tunnel & now I can ping the server when the tunnel comes back up. When the tunnel is down on router 1 I still cannot ping the server, even though there are routes via router 2. I had increased the delay on one of the VLAN interfaces on both routers, but still I cannot ping the server. This is not a problem though, as router 2 has taken over, but I would like to understand why this config change worked as far as the tunnel interface is concerned.
OK cool, we have one issue fixed. Can you post the output of:-
show ip route
show ip route static
show ip eigrp int
show ip eigrp top
from all 3 devices?
The issue still remains. When I arrived this morning the tunnel had come back up 1 hour previously; I can ping everything else but not the server. When I tested yesterday I shut the tunnel interface, but in reality when the tunnel goes down, for whatever reason, the tunnel interface will always stay up, so this was probably not an accurate test. I have attached the configs you requested.
OK here is what I see:-
1) from rt1 192.168.0.16 is visible from tunnel 0 - great
2) from rt2 192.168.0.16 is visible from VLAN190 - great, with a feasible successor of tunnel 0 - great
3) from the hub 192.168.10.0 is visible from tunnel 2 (which I presume is the tunnel to rt1?) with a feasible successor of tunnel 3 (which I presume is the tunnel to rt2?)
From all that it should be OK. But let's check some more things:-
1) On the hub router, what is the delay and bandwidth for tunnel 2 and 3?
2) On rt1, what is the delay and bandwidth for tunnel 0?
3) On rt2, what is the delay and bandwidth for tunnel 0?
The above can be found by typing "show int tun #"
4) On rt1 & rt2, what is the delay and bandwidth of the vlan190 interface?
The above can be found by typing "show int vlan 190"
I have pasted traces to the switch connected to the server & to the server itself. As you can see, the trace to the server never gets past the tunnel. Do you think this may have something to do with the tunnel rather than EIGRP? As I said previously, I can ping the server once the tunnel has been back up for roughly 4 hours, without any intervention on my part.
Type escape sequence to abort.
Tracing the route to 192.168.0.18
1 172.20.20.13 88 msec 88 msec 92 msec
2 172.20.21.2 88 msec 88 msec 88 msec
3 192.168.0.18 88 msec 88 msec *
Type escape sequence to abort.
Tracing the route to 192.168.0.19
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 * *
Hmm, if there was an issue with the tunnel you would not really get an EIGRP neighbour, and it would not pass traffic at all! Are you using loopbacks as the source and destination for the tunnels?
Also, have you "tweaked" the EIGRP timers on the tunnels? By default, on low-speed circuits (NBMA links at T1 and below) the EIGRP hello is 60 seconds, with a hold/dead time of 180 seconds. On other circuits the default hello is 5 seconds and the hold/dead time is 15 seconds. Your posts did not include the output from "show ip eigrp nei". This will show the hold timers - you might have an issue there.
One thing - do you have any other issues with traffic over the tunnels? Also, why are you running RIP over VLAN190?
1, tu2 bw 9kb del 500000 usec
tu3 bw 9kb del 10000000 usec
2, rt1 tu0 bw 9kb del 500000 usec
3, rt2 tu0 bw 9kb del 10000000 usec
4, rt1 vlan190 bw 100000kb del 100 usec
rt2 vlan190 bw 100000kb del 100 usec
The remote network concentrates remote sites using satellite communication. The VSAT modems used for this only support RIP, hence the redistribution on the routers, as I don't want to run RIP over the hub. I have not changed the EIGRP timers on any interface.
I agree that a tunnel issue would prevent any EIGRP updates & not pass any traffic, but everything else is working as it should. Very strange.
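To see how the bandwidth and delay values listed above turn into a path preference, here is a rough Python sketch of the classic EIGRP composite metric, assuming default K values (K1 = K3 = 1, the rest 0). It only covers the single tunnel hop; the metric shown in "show ip eigrp top" also adds the delay of the other interfaces along the path and uses the lowest bandwidth on the path. The function name and labels are just for illustration.

```python
def eigrp_metric(min_bandwidth_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP composite metric with default K values (K1 = K3 = 1).

    metric = 256 * (10^7 / lowest bandwidth in kbit/s + total delay in tens of usec)
    """
    bandwidth_term = 10_000_000 // min_bandwidth_kbps  # integer division
    delay_term = total_delay_usec // 10                # usec -> tens of usec
    return 256 * (bandwidth_term + delay_term)


# Bandwidth (kbit/s) and delay (usec) as reported by "show int tun #".
tunnels = {
    "hub tu2 / rt1 tu0": (9, 500_000),
    "hub tu3 / rt2 tu0": (9, 10_000_000),
}

for name, (bandwidth_kbps, delay_usec) in tunnels.items():
    print(name, eigrp_metric(bandwidth_kbps, delay_usec))
```

With these inputs the tunnel to rt1 comes out with a much lower metric than the tunnel to rt2, so the rt1 tunnel is preferred while it is up.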
OK - what I would try is changing the EIGRP timers, since by default it takes either 15 or 180 seconds to remove the neighbour and its routes from the routing table if the tunnels go down.
I would change the timers to the following:-
ip hello-interval eigrp <
ip hold-time eigrp <
I would also change the BW and delays to ensure the correct paths are chosen in the routing table and as feasible successor. Even though it's OK right now, I would change them to make sure. I would try something like:-
From the HUB - tunnel to rt1
From the HUB - tunnel to rt2
And the same numbers on rt1 and rt2 back to the hub.
I tried adding a static route on rt1 to the server's subnet but I was still unable to ping it; however, when I added a static route to the server's IP address I was able to ping the server. I have since shut down & re-enabled the tunnel several times & I can now ping the server every time. I have now added this route to rt2. Do you know why this seems to have resolved the issue?
I don't have any duplicate IP addresses & if there is a loop why would it only affect one address?
I am going to leave this overnight because the tunnels always seem to go down at night & check when I get in tomorrow morning.
Needing a static route to the server to make it work indicates either an IP address overlap or a loop.
The static route is more specific to the destination in the routing table, so that path will always be taken. The fact that it works when the static route is configured indicates that the issue is from rt1 to the hub, as you did not define a static route from the hub to rt1 - is this assumption correct?
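The "more specific route wins" behaviour is longest-prefix matching. Below is a small Python sketch of the lookup, assuming the server is 192.168.0.19 (the address in the failed trace) and using made-up next-hop labels. It only models prefix length, not administrative distance or anything else the router does.

```python
import ipaddress


def best_route(destination: str, routes: dict[str, str]) -> str:
    """Return the next hop of the longest matching prefix for destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        raise LookupError(f"no route to {destination}")
    # The route with the longest prefix (most specific) is chosen.
    _, next_hop = max(candidates, key=lambda entry: entry[0].prefixlen)
    return next_hop


# Hypothetical routing table on rt1; the labels are illustrative only.
routes = {
    "192.168.0.16/29": "EIGRP route via Tunnel0",
    "192.168.0.19/32": "static host route",
}

print(best_route("192.168.0.19", routes))  # server: matched by the /32
print(best_route("192.168.0.18", routes))  # switch: only the /29 matches
```

Only traffic to the server's own address follows the host route; every other address in the /29 still follows the EIGRP-learned subnet route.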
If there is a loop, a good test would be to make tunnel 3 passive on the hub and tunnel 0 passive on rt2, then remove any configured static routes. See if it works. If it does, re-enable the tunnels, then make tunnel 2 passive on the hub and tunnel 0 passive on rt1 (then rt2 is the best path to the server). If this works, re-enable the other tunnel and re-test. If this fails without a static route, then there is some kind of loop issue.
Then I would suggest you post the relevant routing config from the hub, rt1 & rt2 for review.
I have tried the test you suggested but received all pings. When I initially tried the static route & was then able to ping the server, I removed the static route but found I was still able to ping the server. If there is a loop, do you think enabling the static route would have cleared the loop & left it loop free after removing the route? And that the loop only returns when the tunnel goes down & comes back up?
That is a possibility - I have seen in the past that EIGRP has an issue that can affect routing... it's called "stuck in active".
Clearing a specific EIGRP-learned route from the routing table can also have that effect.
I could not see an SIA condition in the posts, nor a persistent route in the routing tables.
If you performed the test and all is OK, and you currently do not have a static route configured and everything is working, I would review the configs. Perhaps, as you are using loopbacks for the source and destination of the tunnels, I would check that the static routes to the loopbacks are correct.
I never saw SIA either. I am planning on keeping the static routes in place; I would like to keep an eye on it over the next few days to make sure this does not return. The tunnels usually go down early in the morning so I should know tomorrow morning.
I am not using loopbacks for the tunnels, only the outside FastEthernet interface.
I see the tunnels have gone down, but I can still ping the server & the remote VSAT network is stable. I have checked the EIGRP topology & all routes are passive. I looked at RT1 & RT2's logs & have attached the output. It seems to suggest that the IP address in VLAN 190 is not in VLAN 12 & vice versa, which is true. Do you think there is an issue with this?
They are just trying to form neighbours - and it could have something to do with your issue.
Whichever is the transit link between rt1 & rt2 should be in the EIGRP process. If that is VLAN190 then make VLAN12 passive in process 65000, or vice versa. Both VLANs do not need to be in EIGRP unless you want failover.