File Transfers Dropping over Site-to-Site VPN Tunnels (GRE over IPsec)

Unanswered Question
Sep 17th, 2010

We have an issue on our network: when we transfer files from site to site using Windows, the transfer drops and we get the error message "network not available". It doesn't happen all the time, but 9 times out of 10 it does! Some tunnels work better than others. We currently have 4 sites.


I have posted on another forum. We have tried many things but no one has been able to resolve the issue so far. Please check my other post for an update.


http://www.networking-forum.com/viewtopic.php?f=35&t=19354&p=120181#p120181

Marcin Latosiewicz Fri, 09/17/2010 - 05:07
User Badges:
  • Cisco Employee,

James,


I had a quick look at your post on the other forums.


I would start by checking the accelerator statistics, specifically to see whether any "ppq full" was reported.


I didn't go over the captures. You mention a RST coming. Where is it coming from, and how long does it take for the RST to pop up?


More importantly:

1) When did the problem start?

2) Are there protocols (HTTP, FTP) which are not affected?

3) Are there any firewalls or TCP acceleration technology in the topology?


Marcin

jwhite100 Fri, 09/17/2010 - 05:45

Thank you for your quick response.


How do I check the accelerator statistics? I have not heard of this before.


For protocols we are mainly using SMB and FTP. I haven't seen so many errors with FTP but I have kicked one off now just to test.


The problem has been there since day one, when we switched from Microsoft ISA Server to Cisco for our site-to-site VPNs.


We don't have any firewalls and, as far as I know, we don't have any TCP acceleration technology.


For the packet capture: the RST occurs right at the end when the packet drops. First you get a load of TCP dups.

joyride_us2 Fri, 09/17/2010 - 05:11

Try decreasing the keepalive timer on the IPsec tunnel.

jwhite100 Fri, 09/17/2010 - 07:07

I don't have a keepalive timer set. Which setting should I start with?

Lei Tian Fri, 09/17/2010 - 05:30
User Badges:
  • Cisco Employee,

Hi James,


Put 'crypto ipsec df-bit clear' on both ends, and see if that helps.


HTH,

Lei Tian

jwhite100 Fri, 09/17/2010 - 07:10

Hi Lei,


We did have a routing policy set before which had no effect. The command used was:


ip policy route-map CLEAR-DF-BIT


Does the 'crypto ipsec df-bit clear' have the same effect?

Lei Tian Fri, 09/17/2010 - 07:20
User Badges:
  • Cisco Employee,

Hi James,


I don't know how your route-map is configured or where the PBR is applied. 'crypto ipsec df-bit clear' clears the DF bit before packets are encrypted, whereas the PBR clears the DF bit when packets enter the interface it is applied to.


HTH,

Lei Tian

jwhite100 Fri, 09/17/2010 - 09:57

I just applied "crypto ipsec df-bit clear" across all router tunnels. After initial tests it doesn't seem to be working. Transfers to 2 sites dropped straight away. Transfers to one site always go through OK. This is the site with the slowest link. I don't know if this has anything to do with it. The link doesn't get used as much either.

Lei Tian Fri, 09/17/2010 - 10:05
User Badges:
  • Cisco Employee,

Hi James,


I am a little bit confused. In the initial post you said it fails 9 times out of 10. Now you say transfers to 1 site always work. Can you post a simple diagram and the config for the router in question? That may make it easier for people to help you with the troubleshooting.


HTH,

Lei Tian

jwhite100 Sat, 09/18/2010 - 07:33

This is what my colleague posted on the other forum. The Tampa and Reading sites are not included. Tampa is the only tunnel which works fine at the moment. We only have one member of staff there at the moment, so it doesn't get that heavily used. This is what makes me think that it has something to do with bandwidth. The command you gave me is on there now. I don't have time to get all the configs at the moment; on Monday when I am back at work I will repost the most up-to-date configs for you.




Scenario
Cisco 2901 with VPN hardware module (10Mb leased line) to Cisco 7201 without a module, so encryption is done in software (1Gb transit). Transfers fail pretty quickly :-( Both routers have low CPU history.


Cisco 2901: we have two tunnels to the same network via 2 different routers on two different entry points.
vol-gateway#sh run inter tunn 1
Building configuration...


Current configuration : 273 bytes
!
interface Tunnel1
 description Volume to Blue Square House
 ip address 10.0.0.5 255.255.255.252
 ip mtu 1400
 ip tcp adjust-mss 1300
 tunnel source 212.*.*.157
 tunnel destination 95.*.*.1
 tunnel path-mtu-discovery
 tunnel protection ipsec profile VolumeVPN
!
end


vol-gateway#sh run inter tunn 2
Building configuration...


Current configuration : 288 bytes
!
interface Tunnel2
 description Volume to Blue Square 3
 ip address 10.0.0.9 255.255.255.252
 ip mtu 1400
 ip tcp adjust-mss 1300
 ip ospf cost 2000
 tunnel source 212.*.*.157
 tunnel destination 95.*.*.2
 tunnel path-mtu-discovery
 tunnel protection ipsec profile VolumeVPN
!
end


----------------------------

Cisco 7201 - Blue Square House

bsh-r1#show run inter tunn 0
Building configuration...


Current configuration : 287 bytes
!
interface Tunnel0
 description Blue Square House to Volume
 bandwidth 10000
 ip address 10.0.0.6 255.255.255.252
 ip mtu 1400
 ip tcp adjust-mss 1300
 tunnel source 95.*.*.1
 tunnel destination 212.*.*.157
 tunnel path-mtu-discovery
 tunnel protection ipsec profile VolumeVPN
end



Cisco 7201 - Blue Square 3
bs3-r1#sh run inter tunn 0
Building configuration...


Current configuration : 303 bytes
!
interface Tunnel0
 description Blue Square 3 to Volume
 bandwidth 10000
 ip address 10.0.0.10 255.255.255.252
 ip mtu 1400
 ip tcp adjust-mss 1300
 ip ospf cost 2000
 tunnel source 95.*.*.2
 tunnel destination 212.*.*.157
 tunnel path-mtu-discovery
 tunnel protection ipsec profile VolumeVPN
end
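
As a side note on the configs above, the headroom that "ip mtu 1400" and "ip tcp adjust-mss 1300" leave for TCP can be checked with simple arithmetic. This is only a minimal sketch in Python, assuming plain 20-byte IPv4 and 20-byte TCP headers with no options; the constant and function names are made up for illustration.

```python
# Values copied from the tunnel interface configs in this post.
TUNNEL_IP_MTU = 1400   # "ip mtu 1400"
ADJUSTED_MSS = 1300    # "ip tcp adjust-mss 1300"

# Assumption: IPv4 and TCP headers without options.
IPV4_HEADER = 20
TCP_HEADER = 20


def largest_tcp_packet(mss: int) -> int:
    """Size of the IP packet that carries one full-sized TCP segment."""
    return mss + IPV4_HEADER + TCP_HEADER


packet = largest_tcp_packet(ADJUSTED_MSS)
headroom = TUNNEL_IP_MTU - packet

print(f"Largest TCP packet entering the tunnel: {packet} bytes")
print(f"Headroom below the tunnel ip mtu: {headroom} bytes")
```

If those assumptions hold, a TCP session that honours the clamped MSS should fit under the tunnel MTU with room to spare, so the MSS setting on its own would not obviously explain the drops. Non-TCP traffic is not covered by this calculation.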

Lei Tian Sat, 09/18/2010 - 09:24
User Badges:
  • Cisco Employee,

Hi James,


Bandwidth could be the issue; you can confirm that by sending a small amount of data to the sites and seeing if it works.


Besides bandwidth, you might also want to make sure there is no asymmetric routing. I see that on Volume you are modifying the OSPF cost to prefer bs3-r1 as the primary path. Depending on your setup, if Square House uses bsh-r1 as the primary path to reach Volume, then that is asymmetric routing, which will cause problems with IPsec.


HTH,

Lei Tian


edit: ignore that, this shouldn't cause this problem.

jwhite100 Sat, 09/18/2010 - 13:12

We ended up shutting down the second route to the Blue Square Data center. My colleague did a Tracert to one of the servers at Blue Square and it was taking a longer route, but I am not sure exactly what was happening. The previous network manager had the bandwidth command set on the tunnels which I understand doesn't actually set the bandwidth - it is only used by EIGRP to find the best route.


Is there some way you can limit the bandwidth on tunnels? Would QoS help? We are thinking of implementing this.


The other day I sent 2GB from Tampa to Volume (the hub site) and it went through fine. I don't think the size of the file matters, because sometimes the file transfers drop immediately whereas other times they get to about halfway. Using FTP is more reliable than using SMB in Windows, so we have advised staff to use this as a temporary measure.


This is the bandwidth of the sites:


Volume (the hub)           10 Mb

Vee                        7 Mb

Blue Square Data Center    10 Gb (I think)

Tampa                      1 Mb


Bluesquare is the worst for dropping. Tampa works fine and Vee is intermittent.


I installed Cisco Configuration Professional on Friday and was monitoring the tunnels on the hub router. The Blue Square monitor seemed to be going up and down all the time, whereas Vee was running steady at halfway. The Tampa monitor shot to the max when I did the transfer and stayed steady till the transfer was done. The protocols monitor was showing IPsec as taking 1/3 of the bitrate, I think. I hope this information helps.


Thanks

Lei Tian Sat, 09/18/2010 - 13:26
User Badges:
  • Cisco Employee,

Hi James,


That information definitely helps. Now, my question is: what is the traffic direction? Is it from Blue Square to Volume? Does the application have the ability to adjust its transmit rate based on congestion of the link?


When you say "Blue square monitor seemed to be going up and down", do you mean the tunnel interface on blue square going up and down?


Regards,

Lei Tian

jwhite100 Sat, 09/18/2010 - 14:37

Hello Lei,


Yes, I mean the tunnel interface monitor. At Blue Square we have our web servers and various other servers. The developers deliver a lot of files to this site, so the traffic is mainly from Volume to Blue Square. For my testing I was dragging and dropping some ISO files from my C drive to all the sites. The 2GB transfer I did the other day was from Tampa to Volume. What I meant by going up and down is that Blue Square seems to be in constant use, whereas when I wasn't transferring from Tampa the activity was minimal, and Vee was about halfway. When I was transferring to Blue Square the monitor would shoot to the top for a few seconds then drop, and at the same point my transfer would fail.


What do you mean by "Does the application have the ability to adjust transmit rate based on congestion of the link"? Which application are you referring to? Cisco CDP or Windows itself? I don't know how to use CDP well yet; I have only just started using it.


Thanks for your help

paulbrazier Mon, 09/20/2010 - 02:32

Good morning all,


I am working with James on this issue and thought I would share some more information with you.


Attached is a diagram of the site tunnels that our previous network manager created. This should help you understand the layout.


We have 4 sites:


1. Primary hub - Wokingham, UK: 10Mb fibre leased line

2. Reading, UK (VEE): 7Mb fibre leased line

3. Tampa, US: 1Mb serviced office

4. Maidenhead, UK: 1Gb Ethernet transit provided. The site has 2 x 7201s on different external networks, which participate in BGP with our own RIPE address space.


We have been left to manage a fairly complex setup with fairly minimal Cisco knowledge :-)


------------------------------------------

New findings
As my latest step I have disabled the link to Blue Square House (the secondary tunnel to Maidenhead), so we are linking from Wokingham to that site using Blue Square 3.


Now, because BSH is the active BGP router (i.e. announcing our address space), we will be routing through it to get to the BS3 internal IP to kick off the tunnel. This seems wrong to me?


However, none of the other sites are this complex (i.e. 2 routers and BGP), and we still have issues there.


I believe James is looking into keepalives as this is something we do not use.


Any other info required please shout.


Many thanks for all your help so far.


Regards


Paul

Marcin Latosiewicz Mon, 09/20/2010 - 06:21
User Badges:
  • Cisco Employee,

Paul, James,


I have not been following on both of the threads fully, so forgive me if I'm asking for something already discussed/checked.


I believe:

- The keepalives meant are ISAKMP keepalives, and they were not configured as far as I understand? "show run | i keep" will show you.

If they are not enabled, the tunnel itself will basically not drop (even though connectivity MIGHT be impacted between the IKE sites).

- If it's GRE keepalives you're looking for, they are not supported with tunnel protection.


My earlier suggestion was to check "show crypto engine accel stati" during the issue to see if there is something wrong there.


I would advise a simple and effective test during a scheduled window: disable encryption on both sides and check if the issue persists.

If the issue does not persist without crypto (tunnel protection), it's either a crypto accelerator problem or a hidden MTU problem.


What is the maximum ICMP packet size you can ping through the GRE tunnel with the DF bit set?

Remember that ip policy will not impact locally generated packets. (unless it's "ip local policy ...")


Marcin
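
For anyone repeating the "maximum ping size with DF set" test, the search for the largest working payload can be done as a binary search rather than by trial and error. The following is only a sketch in Python: max_df_payload and fake_probe are hypothetical names, and the probe here is a stand-in that pretends the path carries IP packets of up to 1400 bytes (the "ip mtu" value on the tunnels) instead of sending real pings.

```python
from typing import Callable


def max_df_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Return the largest payload size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic: if a size fails, every larger size fails too.
    Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # mid works, look for something larger
        else:
            high = mid - 1            # mid fails, the answer is smaller
    return low


# Stand-in probe (assumption): 20-byte IPv4 header + 8-byte ICMP header,
# and a path that passes unfragmented IP packets of up to 1400 bytes.
ICMP_OVERHEAD = 28


def fake_probe(payload: int) -> bool:
    return payload + ICMP_OVERHEAD <= 1400


result = max_df_payload(fake_probe)
print(f"Largest payload that passes with DF set: {result} bytes")
```

In a real test, the probe would wrap an actual ping with the DF bit set and return whether a reply came back; the search logic stays the same.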

Lei Tian Mon, 09/20/2010 - 07:38
User Badges:
  • Cisco Employee,

Hi,


I have looked at the trace file. Was it captured between Volume and Blue Square? The traffic direction is from 172.16.8.1 to 172.16.33.100; is 172.16.8.1 in Volume?


Based on the trace, I can see some packets being dropped from 172.16.8.1 to 172.16.33.100, which causes a lot of retransmissions. You should first identify where the packets are being dropped. They could be dropped by the router interface, the VPN card, or the provider. Once we know where the packets are being dropped, it will be easier to find the solution.


HTH,

Lei Tian

paulbrazier Mon, 09/20/2010 - 08:25

Hi Marcin,


VOL-GATEWAY#show run | i keep
crypto isakmp keepalive 3600
no keepalive
no keepalive



I believe there are a few ways to set up GRE IPsec-protected tunnels. The guy who did ours used tunnel protection; however, in a test environment using the Cisco Configuration Professional wizard (sorry) it seemed to use crypto maps?


VOL-GATEWAY#show crypto engine accel stati


Device:   Onboard VPN
Location: Onboard: 0
        :Statistics for encryption device since the last clear
         of counters 1475003 seconds ago
              131135714 packets in                   131135704 packets out
            83412267287 bytes in                   83024152649 bytes out
                     88 paks/sec in                         88 paks/sec out
                    452 Kbits/sec in                       450 Kbits/sec out
               69454566 packets decrypted             61681138 packets encrypted
            47919849488 bytes before decrypt       35486928075 bytes encrypted
            45333650429 bytes decrypted            37690502220 bytes after encrypt
                      0 packets decompressed                 0 packets compressed
                      0 bytes before decomp                  0 bytes before comp
                      0 bytes after decomp                   0 bytes after comp
                      0 packets bypass decompr               0 packets bypass compres
                      0 bytes bypass decompres               0 bytes bypass compressi
                      0 packets not decompress               0 packets not compressed
                      0 bytes not decompressed               0 bytes not compressed
                  1.0:1 compression ratio                1.0:1 overall
                Last 5 minutes:
                  17819 packets in                       17819 packets out
                     59 paks/sec in                         59 paks/sec out
                 236369 bits/sec in                     237824 bits/sec out
                2335953 bytes decrypted                6137127 bytes encrypted
                  63133 Kbits/sec decrypted             165868 Kbits/sec encrypted
                  1.0:1 compression ratio                1.0:1 overall
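
A quick reading of the counters above: the totals for "packets in" and "packets out" can be compared to see how many packets entered the crypto engine without leaving it. This is just a sketch in Python using the figures as posted; the variable names are illustrative.

```python
# Figures copied from the "show crypto engine accel stati" output above.
packets_in = 131_135_714
packets_out = 131_135_704
seconds_since_clear = 1_475_003

unaccounted = packets_in - packets_out          # entered the engine, did not leave
days_since_clear = seconds_since_clear / 86_400  # 86,400 seconds per day

print(f"{unaccounted} packets unaccounted for over {days_since_clear:.1f} days")
```

On these numbers the difference is 10 packets in roughly 17 days, which on its face does not look like heavy loss inside the engine, although this output alone cannot say what happened to those packets. The output as posted also does not include a "ppq full" counter.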

----------------------------

On one of the links I have removed tunnel protection and copied files over. It did seem a little better, however it still failed!


C:\Users\paulb>ping -f -l 1372 172.16.33.100


Pinging 172.16.33.100 with 1372 bytes of data:
Reply from 172.16.33.100: bytes=1372 time=8ms TTL=125
Reply from 172.16.33.100: bytes=1372 time=8ms TTL=125
Reply from 172.16.33.100: bytes=1372 time=8ms TTL=125
Reply from 172.16.33.100: bytes=1372 time=8ms TTL=125


Ping statistics for 172.16.33.100:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 8ms, Maximum = 8ms, Average = 8ms


C:\Users\paulb>ping -f -l 1373 172.16.33.100


Pinging 172.16.33.100 with 1373 bytes of data:
Reply from 10.1.255.1: Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.


Ping statistics for 172.16.33.100:
    Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),


Thanks for your post!
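
To relate the ping results above to the tunnel settings, the ICMP payload size can be converted to the full IP packet size. A minimal sketch in Python, assuming a 20-byte IPv4 header with no options and the standard 8-byte ICMP echo header; ip_packet_size is a name made up for this example.

```python
# Assumption: IPv4 header without options, standard ICMP echo header.
IPV4_HEADER = 20
ICMP_HEADER = 8


def ip_packet_size(icmp_payload: int) -> int:
    """Total IP packet size for a ping with the given payload ("-l" value)."""
    return icmp_payload + ICMP_HEADER + IPV4_HEADER


largest_ok = ip_packet_size(1372)   # last size that got replies
first_fail = ip_packet_size(1373)   # first size rejected with DF set

print(f"Largest working packet: {largest_ok} bytes")
print(f"First failing packet:   {first_fail} bytes")
```

Under those assumptions the largest working packet is exactly 1400 bytes, which matches the "ip mtu 1400" configured on the tunnel interfaces, and the "needs to be fragmented" reply from 10.1.255.1 shows at least that hop reporting the limit back to the sender.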

paulbrazier Mon, 09/20/2010 - 08:33

Hi Lei


Correct, the trace was from Volume 172.16.0.0/20 to Blue Square 172.16.33.0/24.


Do I just try and position my laptop on different areas of the network to check where the packets are dropped?


We have an ISP-managed router in Volume, and we buy transit at Blue Square with our own routers, so it will be easier to monitor that end.


I have not done much packet tracing on a switched network, so any pointers are welcome.


Thanks


Paul

Lei Tian Tue, 09/21/2010 - 03:55
User Badges:
  • Cisco Employee,

Hi,


Can the provider help to test the link?


Here is a SPAN configuration guide for the 3560; other Cisco switches use similar syntax. You can use SPAN to trace packets on your switch. The initial trace file you provided is missing a lot of packets. I am not sure if it is a problem with the sniffer's filter, but a lot of packets are not seen on either end.


Regards,

Lei Tian

paulbrazier Tue, 09/21/2010 - 04:59

Thanks Lei,


FYI, the filter was applied after the capture, on the TCP stream, so I believe that packets were being dropped.


I shall look a little more into SPAN; I have heard it talked about before.


Thanks


Paul

jwhite100 Mon, 09/20/2010 - 11:21

I've just done a bit of testing. It seems that at present transfers are only going in one direction. Please have a look at my findings:




Transfer     Date        Time        Size     Result  Notes
To BSQ       20/09/2010  5pm         121MB    Failed
To BSQ       20/09/2010  5.10pm      121MB    Failed
From BSQ     20/09/2010  5pm         58MB     OK
From BSQ     20/09/2010  5.10pm      388MB    OK
To Vee       20/09/2010  5.30pm      121MB    Failed
From Vee     20/09/2010  5.36pm      258MB    OK
To Tampa     20/09/2010  05/01/1900  121MB    Failed
From Tampa   17/09/2010  4pm         2.5GB    OK
From Tampa   20/09/2010  6pm         16MB     OK
From Tampa   20/09/2010  6.10pm      130MB
To Tampa     20/09/2010  6.117MB              OK
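
The direction pattern in these results is easier to see when tallied. The sketch below, in Python, transcribes only the direction, site and result of each row as labelled in the post; the row with no recorded result is kept as None and left out of the count.

```python
from collections import Counter

# (direction, site, result) for each row above; None = no result recorded.
results = [
    ("to", "BSQ", "Failed"),
    ("to", "BSQ", "Failed"),
    ("from", "BSQ", "OK"),
    ("from", "BSQ", "OK"),
    ("to", "Vee", "Failed"),
    ("from", "Vee", "OK"),
    ("to", "Tampa", "Failed"),
    ("from", "Tampa", "OK"),
    ("from", "Tampa", "OK"),
    ("from", "Tampa", None),
    ("to", "Tampa", "OK"),
]

# Count outcomes per direction, skipping rows without a result.
tally = Counter(
    (direction, result)
    for direction, _site, result in results
    if result is not None
)

for (direction, result), count in sorted(tally.items()):
    print(f"{direction:>4}  {result:<6}  {count}")
```

Counted this way, every recorded "From" transfer succeeded, while four of the five "To" transfers failed; the one "To" transfer that succeeded was also the smallest listed.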
