Solved: GRE Tunnel Flapping While Both Ends Are Pingable

Chuan Liu · 11-26-2009

Hi,

My topology is as below:

C2811 Ethernet port (10.250.0.46) ---> (10.250.0.33) MPLS cloud ---> (10.250.0.9) C877 ADSL port

I have had 30 GRE tunnels running between this C2811 and 30 C877s for the past two years. Only recently has one of them started flapping. I have made no changes in the network, but I am not sure whether the provider has changed anything. Every few hours, the remote site loses connectivity to HQ, and it comes back after about 30 minutes. I have removed RIP from this tunnel and switched to static routes, changed the MTU, upgraded IOS on the C877, and so on. The problem is still there.

What seems strange to me is that when the tunnel is down, I can still telnet between the tunnel source and destination addresses, which rules out routing issues in the MPLS cloud.

What should I check next to troubleshoot this? Any advice is appreciated.

Thanks in advance.

---------------------------------------------

HQ C2811:

interface Tunnel9
bandwidth 512
ip address 10.250.0.142 255.255.255.252
ip mtu 1400

tunnel source 10.250.0.46
tunnel destination 10.250.0.9
!
interface FastEthernet0/1
ip address 10.250.0.46 255.255.255.240

!

! Reach remote site via tunnel
ip route 10.8.56.0 255.255.255.0 Tunnel9

! Reach tunnel remote end via next hop
ip route 10.250.0.9 255.255.255.255 10.250.0.33

Remote C877:

interface Tunnel1
ip address 10.250.0.141 255.255.255.252
ip mtu 1400
tunnel source 10.250.0.9
tunnel destination 10.250.0.46
!

interface Dialer0

(assigned IP address 10.250.0.9)

!

! Reach HQ via tunnel remote end
ip route 0.0.0.0 0.0.0.0 10.250.0.142
! Reach tunnel remote end via Dialer0
ip route 10.250.0.46 255.255.255.255 Dialer0

-----------------------------------------------

Peter Paluch · 11-27-2009

Hello,

One more comment: I went over the configuration of your remote CPE more closely, and I have a remark regarding your tunnel interface configuration.

The Tunnel1 interface on the remote CPE is configured with tunnel source 10.250.0.9. However, this IP address is not configured explicitly on any interface. The assumption is that Dialer0 will be assigned this address via PPP IPCP negotiation (the ip address negotiated command on Dialer0). This works as long as Dialer0 is indeed assigned this address. However, if Dialer0 is ever assigned a different IP address, the tunnel source address no longer exists on your router, and as a result the tunnel interface will go down.

When the tunnel flaps again, apart from running debug ip routing as I suggested in my previous reply, also check which IP address, if any, is assigned to the Dialer0 interface, and whether it is 10.250.0.9.
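The check Peter describes boils down to one question: does any interface currently own the tunnel source address? A minimal Python sketch of that check follows. In practice the interface-to-address mapping would be read from show ip interface brief; the dictionary format, the function name and the address 10.250.0.77 are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def tunnel_source_present(tunnel_source: str,
                          interface_addresses: Dict[str, Optional[str]]) -> bool:
    """Return True if the tunnel source IP is assigned to some interface.

    interface_addresses maps an interface name to its current IPv4 address,
    or None when it has none (e.g. a Dialer whose IPCP negotiation failed).
    """
    src = ipaddress.ip_address(tunnel_source)
    return any(
        addr is not None and ipaddress.ip_address(addr) == src
        for addr in interface_addresses.values()
    )


# Normal case: IPCP assigned the expected address to Dialer0.
print(tunnel_source_present("10.250.0.9", {"Dialer0": "10.250.0.9", "Vlan20": "10.8.56.1"}))  # True
# Failure case: Dialer0 came up with a different (hypothetical) address.
print(tunnel_source_present("10.250.0.9", {"Dialer0": "10.250.0.77", "Vlan20": "10.8.56.1"}))  # False
```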

Best regards,

Peter


spremkumar · 11-26-2009

Hi

As for the tunnel flapping, I strongly suspect an unstable dialer interface caused by the ADSL link.

Have you enabled any logging on your router to check whether your ADSL link or your dialer flaps frequently? That might cause your tunnel to flap as well. Also, can you post the output of show ip route and the full configuration relevant to these sites?
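Whether tunnel drops coincide with dialer drops can be checked by comparing the timestamps of the %LINEPROTO-5-UPDOWN messages in the router's log. The sketch below assumes each log line starts with a "Mon DD HH:MM:SS" timestamp (the real prefix depends on the router's service timestamps settings); the 60-second window, the supplied year and the sample lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed log format: "Mon DD HH:MM:SS: %LINEPROTO-5-UPDOWN: Line protocol on
# Interface <name>, changed state to <up|down>".
LOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}).*%LINEPROTO-5-UPDOWN: "
    r"Line protocol on Interface (?P<intf>\S+), changed state to (?P<state>up|down)"
)


def down_events(lines, interface, year=2009):
    """Timestamps at which `interface` reported line protocol down."""
    events = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m["intf"] == interface and m["state"] == "down":
            # IOS log timestamps carry no year, so one is supplied here.
            events.append(datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S"))
    return events


def unexplained_tunnel_drops(lines, tunnel="Tunnel1", dialer="Dialer0", window_s=60):
    """Tunnel down events with no dialer down event within window_s seconds."""
    window = timedelta(seconds=window_s)
    dialer_downs = down_events(lines, dialer)
    return [
        t for t in down_events(lines, tunnel)
        if not any(abs(t - d) <= window for d in dialer_downs)
    ]


# Hypothetical log excerpt: the first tunnel drop has no matching dialer drop.
sample = [
    "Nov 26 10:15:32: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down",
    "Nov 26 10:45:10: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up",
    "Nov 26 14:02:01: %LINEPROTO-5-UPDOWN: Line protocol on Interface Dialer0, changed state to down",
    "Nov 26 14:02:05: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down",
]
print(unexplained_tunnel_drops(sample))
```

Any timestamp this returns is a tunnel drop that the ADSL/dialer theory does not explain.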

regds

Chuan Liu · 11-27-2009

Hi,

Thanks for the response.

The tunnel drops do not coincide with DSL link disconnects. When the tunnel is down, I can still telnet into the remote router from the HQ router using the tunnel destination address. Logging shows nothing beyond interface up/down messages. During the last outage (about 30 minutes), I telnetted into the router and turned on 'debug ip icmp'. Then I pinged its loopback address from the HQ router. This traffic should go through the tunnel, because a static route points it to the remote tunnel end. The debug showed the ICMP packets arriving on the remote router's tunnel interface, but the replies went out the dialer interface instead of the tunnel interface. So is this a dialer interface issue? Pings from the remote router to the HQ site did not arrive at the HQ router. What command can I use to log more information?
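Which interface the remote router uses for a reply is decided by a longest-prefix-match lookup. The simplified Python model below uses the remote C877's table as posted in this reply (connected subnets plus the two static routes). It shows why traffic addressed to 10.250.0.46 leaves via Dialer0: the /32 static route beats the default route via Tunnel1, which is also what keeps the GRE packets themselves from being routed into the tunnel. Treating routes through a down interface as withdrawn is a simplification, PPP-installed routes are ignored, and the thread does not say which source address the HQ ping used, so this only illustrates the lookup, not the exact fault.

```python
import ipaddress

# Simplified routing table of the remote C877, taken from the posted config:
# connected subnets plus the two static routes. PPP-installed routes are ignored.
ROUTES = [
    ("0.0.0.0/0", "Tunnel1"),        # ip route 0.0.0.0 0.0.0.0 Tunnel1
    ("10.250.0.46/32", "Dialer0"),   # ip route 10.250.0.46 255.255.255.255 Dialer0
    ("10.250.0.140/30", "Tunnel1"),  # connected: Tunnel1 10.250.0.141/30
    ("10.8.56.0/24", "Vlan20"),      # connected: Vlan20 10.8.56.1/24
    ("10.255.4.0/24", "Loopback0"),  # connected: Loopback0 10.255.4.1/24
]


def lookup(destination, routes=ROUTES, down=()):
    """Return the egress interface chosen by longest-prefix match, or None.

    Routes through interfaces listed in `down` are skipped, a rough model of
    routes being withdrawn while that interface's line protocol is down.
    """
    dst = ipaddress.ip_address(destination)
    best = None
    for prefix, intf in routes:
        net = ipaddress.ip_network(prefix)
        if intf in down or dst not in net:
            continue
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, intf)
    return best[1] if best else None


print(lookup("10.250.0.46"))   # Dialer0: the /32 beats the default route
print(lookup("10.250.0.142"))  # Tunnel1: connected /30 on the tunnel
print(lookup("10.1.100.10", down={"Tunnel1"}))  # None: no route left
```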

The configuration is as simple as in my previous post. I have 30 sites with similar configurations, and this site had been working fine for two years. The flapping started only a few days ago. Could this be a hardware issue?

One thing I have noticed is that tunnel outages are more frequent during office hours than after hours. This might be related to traffic load in the provider network.

Best regards.

HQ router:

-----------------------------------------------------------------------

!
interface Loopback0
ip address 10.255.1.1 255.255.255.0
!
interface Tunnel1
ip address 10.250.0.134 255.255.255.252
tunnel source 10.250.0.46
tunnel destination 10.250.0.1
!
interface Tunnel2
ip address 10.250.0.138 255.255.255.252
tunnel source 10.250.0.46
tunnel destination 10.250.0.2
!
interface Tunnel3
ip address 10.250.0.166 255.255.255.252
tunnel source 10.250.0.46
tunnel destination 10.250.0.3
!
.
.
.
.
.
!
interface Tunnel9
bandwidth 512
ip address 10.250.0.142 255.255.255.252
ip mtu 1400
tunnel source 10.250.0.46
tunnel destination 10.250.0.9
!
interface FastEthernet0/1
ip address 10.250.0.46 255.255.255.240
!
ip route 10.8.56.0 255.255.255.0 Tunnel9
ip route 10.250.0.9 255.255.255.255 10.250.0.33
ip route 10.255.4.1 255.255.255.255 tunnel9

Remote router:

--------------------------------------------------------------------

interface Loopback0
ip address 10.255.4.1 255.255.255.0
!
interface Tunnel1
ip address 10.250.0.141 255.255.255.252
ip mtu 1446
tunnel source 10.250.0.9
tunnel destination 10.250.0.46
!
interface ATM0
no ip address
no atm ilmi-keepalive
dsl operating-mode auto
!
interface ATM0.1 point-to-point
pvc 0/100
tx-ring-limit 3
encapsulation aal5mux ppp dialer
dialer pool-member 1
!
!
interface FastEthernet0
description End-users
switchport access vlan 20
!
interface FastEthernet1
description End-users
switchport access vlan 20
!
interface FastEthernet2
description End-users
switchport access vlan 20
!
interface FastEthernet3
description End-users
switchport access vlan 20
!
interface Vlan20
description Data Vlan
ip address 10.8.56.1 255.255.255.0
ip helper-address 10.1.100.10
ip helper-address 10.1.114.10
!
interface Dialer0
ip address negotiated
no ip redirects
no ip unreachables
ip mtu 1470
ip virtual-reassembly
encapsulation ppp
no ip route-cache cef
no ip route-cache
no ip mroute-cache
dialer pool 1
dialer-group 1
no cdp enable
ppp pap sent-username xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ppp ipcp dns request
!
ip route 0.0.0.0 0.0.0.0 Tunnel1
ip route 10.250.0.46 255.255.255.255 Dialer0
!
dialer-list 1 protocol ip permit
!
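The MTU values in this configuration are consistent with GRE's encapsulation overhead: plain GRE over IPv4 adds a 20-byte outer IP header plus a 4-byte base GRE header (4 more bytes for each optional checksum, key or sequence field), so Dialer0's ip mtu 1470 leaves 1446 bytes for the tunnel, which is exactly the ip mtu set on Tunnel1. A small Python sketch of that arithmetic; it ignores IPsec (not used here), and the 1500-byte HQ Ethernet MTU is an assumed default, not shown in the posted config. Note also that the two tunnel ends use different values (1446 here, 1400 on HQ Tunnel9).

```python
# Encapsulation overhead of plain GRE over IPv4.
OUTER_IPV4_HEADER = 20  # new outer IP header, bytes
GRE_BASE_HEADER = 4     # flags/version + protocol type, bytes
GRE_OPTION_FIELD = 4    # each optional field (checksum, key, sequence), bytes


def max_tunnel_ip_mtu(transport_ip_mtu: int, gre_options: int = 0) -> int:
    """Largest tunnel 'ip mtu' whose GRE packets still fit the transport MTU."""
    overhead = OUTER_IPV4_HEADER + GRE_BASE_HEADER + gre_options * GRE_OPTION_FIELD
    return transport_ip_mtu - overhead


print(max_tunnel_ip_mtu(1470))  # remote: Dialer0 ip mtu 1470 -> 1446 (Tunnel1)
print(max_tunnel_ip_mtu(1500))  # HQ: assumed Ethernet MTU 1500 -> 1476
```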

Peter Paluch · 11-27-2009

Hello,

Let's assume for now that we are not dealing with any IOS bug.

If I am not mistaken, you are not using GRE keepalives. In that case, the tunnel interface will be up/up if both of these conditions are met:

The source interface (tunnel source) must be up/up and have an IP address assigned
The destination IP (tunnel destination) must be resolvable in your routing table.

One of these conditions must have failed, and as a result the tunnel interface was brought down.
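Without keepalives, the two conditions above amount to a purely local check. A minimal Python model makes that explicit; the class and field names are hypothetical, and this is an illustration of the rule, not how IOS implements it.

```python
from dataclasses import dataclass


@dataclass
class TunnelConditions:
    source_up: bool             # tunnel source interface is up/up
    source_has_ip: bool         # ...and currently owns the tunnel source address
    destination_routable: bool  # tunnel destination resolves in the routing table


def line_protocol_up(c: TunnelConditions) -> bool:
    """Line protocol of a GRE tunnel with keepalives disabled (simplified model).

    Nothing here depends on whether packets actually reach the far end,
    so one-way loss in the provider cloud leaves the tunnel up/up.
    """
    return c.source_up and c.source_has_ip and c.destination_routable


print(line_protocol_up(TunnelConditions(True, True, True)))   # True
print(line_protocol_up(TunnelConditions(True, False, True)))  # False: source IP missing
```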

Can you afford to run debug ip routing on the router whose tunnel interface flaps? This command logs every change to the routing table, which may help narrow down what is going on.

Best regards,

Peter

Chuan Liu · 12-01-2009

Hi Peter,

You are right that this was not a hardware issue: the replacement C877 behaved the same. I disabled the keepalive because there is no alternative route to the destination anyway, and because, for debugging purposes, it lets me see packets still coming in and going out through the tunnel.
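With GRE keepalives enabled (keepalive [period [retries]] on the tunnel interface), each keepalive must be decapsulated by the far end and routed back, so it only returns if the path works in both directions; the one-way loss described in this thread would likely have shown up as the tunnel line protocol going down. The sketch below models that as a consecutive-miss counter. It is a simplification of the IOS logic, and the 10-second period and 3 retries are assumed to match the usual IOS defaults.

```python
def keepalive_states(replies, retries=3):
    """Simulate a GRE tunnel's line protocol with keepalives enabled.

    replies[i] is True if the keepalive sent in period i was answered.
    Simplified rule: the line protocol goes down once `retries` consecutive
    keepalives go unanswered and comes back up when an answer arrives.
    """
    states, misses = [], 0
    for answered in replies:
        misses = 0 if answered else misses + 1
        states.append("down" if misses >= retries else "up")
    return states


def detection_time_s(period_s=10, retries=3):
    """Approximate time before a dead path marks the tunnel down."""
    return period_s * retries


# Path works, then one-way loss starts (no answers), then recovers.
print(keepalive_states([True, False, False, False, False, True]))
print(detection_time_s())
```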

For testing purposes, I created GRE tunnels between this remote site and some other sites. The same behavior was observed.

I also tried an IPsec tunnel between these two endpoints. The behavior was exactly the same as with the GRE tunnel: every one or two hours, user traffic from the remote site to HQ was dropped in the cloud, even though the tunnel itself looked fine. When I pinged or telnetted through the IPsec tunnel from the HQ site, I could see both the encrypt and decrypt counters increasing on the remote router, but only the encrypt counter increasing on the HQ router. The same happened with the GRE tunnel: user traffic through the tunnel was dropped in one direction somewhere in the provider cloud.
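The counter observation above localizes the loss by simple subtraction: every packet HQ encrypts should show up as a decrypt on the remote router, and vice versa. A minimal sketch, assuming the counter increases are read from show crypto ipsec sa (#pkts encrypt / #pkts decrypt) on both routers around the same test and that no other traffic uses the SA meanwhile; the dictionary format and the numbers are hypothetical.

```python
def lossy_directions(hq, remote):
    """Infer which direction loses tunnel packets during a test.

    hq and remote are dicts with the *increase* in the 'encrypt' and
    'decrypt' packet counters observed on each router over the same test.
    """
    lossy = []
    if hq["encrypt"] > remote["decrypt"]:
        lossy.append("HQ -> remote")
    if remote["encrypt"] > hq["decrypt"]:
        lossy.append("remote -> HQ")
    return lossy


# Hypothetical 5-packet ping from HQ matching the symptom described above.
print(lossy_directions(hq={"encrypt": 5, "decrypt": 0},
                       remote={"encrypt": 5, "decrypt": 5}))
```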

The provider said they do not filter this traffic. What they provide for us is an MPLS VPN.

The customer could not afford any more disconnections. As a last resort, I asked the provider to advertise the LAN subnets of this remote site. I removed the tunnel configuration, and now use a default route on the remote router pointing to the provider and static routes on the HQ router pointing to the provider. This has been up and running for 14 hours without any drops.

In conclusion, the issue was one-way packet loss in the tunnels from the remote router to the HQ router. No direct solution was found; the workaround is to not use tunnels.

This was a really strange experience. I still have 29 sites using tunnels with no such problems.

Thanks for all the help you provided.


Chuan Liu · 11-29-2009

Thanks Peter.

The address 10.250.0.9 is always correctly assigned by the provider. This is a remote site router, and I can telnet into it from the HQ router using this address even when the tunnel is down. That means the tunnel source and destination are reachable at all times. 'debug ip packet detail' on the remote router shows traffic coming in on the tunnel interface and the responses going out the tunnel as well, but the responses never arrive at the HQ tunnel interface. It seems that GRE traffic in the remote-to-HQ direction is dropped somewhere in the cloud. The tunnel can take up to an hour to come back by itself, but if I reload the router, it comes back immediately.

This leads me to suspect the hardware (I have already changed the IOS). I am going to replace the router and see what happens. I will post the results later.

Thanks again for your support.

Larry

Peter Paluch · 11-29-2009

Hello Larry,

Yes, you are correct: if you can telnet from the HQ router to the remote router's IP address even when the tunnel interface is down, then IP reachability should not be an issue.

I am somewhat reluctant to accept that this might be a hardware issue. The tunnel interface itself is a software interface, and its status depends on meeting the criteria I listed in my previous post. Nevertheless, give it a try and please let us know.

Still, I would be pretty interested in the debug ip routing output (not the 'debug ip packet').

Oh, and this is interesting:

'debug ip packet detail' on the remote router shows traffic comes in the tunnel interface and response goes out the tunnel as well. But the response did not arrive the HQ tunnel interface.

Let's make one thing clear: is your tunnel interface actually down (interface down, line protocol down), and is that what you are trying to solve? Or is your tunnel interface up (interface up, line protocol up), and you see GRE packets leaving your remote router that never arrive at the HQ router?

Best regards,

Peter

tomasnohejl · 04-09-2020

Hi All,

I know this thread is quite old, but I found it via Google while solving a similar problem.

I have a point-to-point GRE tunnel with IPsec protection (with BGP peering running over it) that sometimes goes up/down, even though IP connectivity between the two endpoints is OK. So why is the tunnel flapping?

I discovered this useful doc - https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/118361-technote-gre-00.html

And there is very handy information there:

=======

GRE Tunnels with Tunnel Protection

In Cisco IOS Software Releases 15.4(3)M/15.4(3)S and later, the GRE tunnel line protocol state will follow the IPsec Security Association (SA) state, so the line protocol will remain down until the IPsec session is fully established. This was committed with Cisco bug ID CSCum34057 (initial attempt with Cisco bug ID CSCuj29996 and then backed out with Cisco bug ID CSCuj99287).

=======

Unfortunately, I don't have access to those bugIDs, but this information helped me to understand the flapping of my GRE tunnel.

That GRE tunnel is a backup tunnel, so it carries only BGP traffic. When the IPsec SA lifetime expires, the tunnel goes DOWN. Because there is no regular traffic in the tunnel (only BGP, which sends a keepalive once per keepalive interval), the IPsec SA is not renegotiated immediately, but only when a BGP packet needs to be sent through the tunnel. At that moment the tunnel comes UP again, because the IPsec SA is established. This repeats every time the IPsec SA lifetime expires.
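The sequence described above can be modeled as a simple timeline. This sketch assumes the SA is established at time 0, lasts a fixed lifetime, and is renegotiated instantly by the next packet that needs the tunnel; real IPsec rekeying is more involved, and the 100-second lifetime and packet times are made-up illustration values (real lifetimes are much longer).

```python
from bisect import bisect_left


def down_intervals(sa_lifetime_s, packet_times, sa_up_at=0):
    """Return (start, end) intervals during which the tunnel is down.

    packet_times: sorted timestamps (seconds) of packets that need the tunnel.
    The SA expires sa_lifetime_s after it comes up; the tunnel stays down
    until the next packet at or after expiry triggers an (instant) rekey.
    An end of None means no further traffic, so the tunnel stays down.
    """
    if sa_lifetime_s <= 0:
        raise ValueError("SA lifetime must be positive")
    intervals = []
    up = sa_up_at
    while True:
        expiry = up + sa_lifetime_s
        i = bisect_left(packet_times, expiry)  # first packet at or after expiry
        if i == len(packet_times):
            intervals.append((expiry, None))
            return intervals
        intervals.append((expiry, packet_times[i]))
        up = packet_times[i]


# Sparse, BGP-keepalive-like traffic every 60 s starting at t=10.
print(down_intervals(100, [10, 70, 130, 190, 250, 310]))
```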

I hope this information helps someone else.

Regards

Tomas