MTU vs tcp adjust-mss

Unanswered Question
Mar 25th, 2008

My server is having trouble exchanging mail with another SMTP server. After much troubleshooting I ran across an article stating that this might be caused by a "black-hole" router and MTU/packet-size issues. Sounds weird, but I want to give it a try. My questions are:

1) Should I use "ip mtu xxx" or "ip tcp adjust-mss"? What are the differences between these two commands?

2) Should I apply this to the WAN or LAN of my Internet router?

Thanks,

Diego

pciaccio Tue, 03/25/2008 - 08:20

The difference between ip mtu and ip tcp adjust-mss is that the MTU expands the IP packet size to the specific size you specify, while tcp adjust-mss sizes the Layer 4 segment to the size you specify. If you adjust the MSS, then Layer 3 will add on the standard IP header, and Layer 2 will add the frame header on top of that. You need to size the segment to a value such that, once the packet and frame headers are tacked on, the total frame is 1500 bytes. If you have control of your devices end to end, then you can get away with setting the MTU size, assuming you can also control any fragmentation and/or firewall parameters that may restrict you. However, if you do not have control, or the traffic goes out over the Internet, then I would suggest sizing the segment, and your traffic will have no issues getting from end to end. If you size the MTU, you may have fragmentation issues, and if you pass through a device that does not allow fragmentation, or congestion is too high, then you can tempt fate and possibly lose data. Your best bet is to use MSS... Good luck...

Paolo Bevilacqua Wed, 03/26/2008 - 06:02

The only problem with the explanation above is that it is not correct. Setting ip mtu will never "expand the IP packet", and neither does MSS.

What ip mtu does is set the maximum size of packets that can be sent on an interface. Packets sized bigger than that will be fragmented.

TCP adjust-mss does something else that is a bit complicated to explain in a few lines, but basically it forces the router to rewrite the advertised maximum TCP segment size to a smaller value than the one computers use by default. This is typically needed in the presence of PPPoE links.
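To make that concrete with a sketch (the interface name is hypothetical, and the 1492/1452 figures are the usual PPPoE numbers, not values from this thread): PPPoE adds an 8-byte header, so the link can carry only 1492 bytes of IP, and the MSS is typically set 40 bytes lower still to leave room for the IP and TCP headers:

interface Dialer1
 ip mtu 1492
 ip tcp adjust-mss 1452

With this in place, the router rewrites the MSS option in TCP SYNs passing through, so end hosts never try to send segments that would outgrow the PPPoE link.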

mbroberson1 Thu, 03/27/2008 - 12:49

Is the MSS to be set on the inside or the outside Ethernet interface of the router?

Thanks

Joseph W. Doherty Wed, 03/26/2008 - 10:09

You might want to try the following to verify whether you have a black hole router:

yourRouter#ping

Protocol [ip]:

Target IP address: IP Addr of far SMTP server

Repeat count [5]: 1

Datagram size [100]:

Timeout in seconds [2]:

Extended commands [n]: y

Source address or interface:

Type of service [0]:

Set DF bit in IP header? [no]:

Validate reply data? [no]:

Data pattern [0xABCD]:

Loose, Strict, Record, Timestamp, Verbose[none]:

Sweep range of sizes [n]: y

Sweep min size [36]: 1450

Sweep max size [18024]: 1550

Sweep interval [1]:

Type escape sequence to abort.

Sending 101, [1450..1550]-byte ICMP Echos to 10.5.1.1, timeout is 1 seconds:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

If you see any "." instead of "!", packets are being dropped, which shouldn't happen; they should be fragmented if larger than the MTU (note the 1550 sweep maximum, above).

From most hosts, you can try various sizes. E.g. a Windows host:

C:\windows>ping 10.5.1.1 -l 1550

Pinging 10.5.1.1 with 1550 bytes of data:

Reply from 10.5.1.1: bytes=1550 time=1ms TTL=255

Reply from 10.5.1.1: bytes=1550 time=1ms TTL=255

Reply from 10.5.1.1: bytes=1550 time=1ms TTL=255

Reply from 10.5.1.1: bytes=1550 time=1ms TTL=255

Ping statistics for 10.5.1.1:

Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 1ms, Maximum = 1ms, Average = 1ms

C:\windows>ping 10.5.1.1 -l 1550 -f

Pinging 10.5.1.1 with 1550 bytes of data:

Packet needs to be fragmented but DF set.

Packet needs to be fragmented but DF set.

Packet needs to be fragmented but DF set.

Packet needs to be fragmented but DF set.

Ping statistics for 10.5.1.1:

Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

C:\windows>ping 10.5.1.1 -l 1500

Pinging 10.5.1.1 with 1500 bytes of data:

Reply from 10.5.1.1: bytes=1500 time=1ms TTL=255

Reply from 10.5.1.1: bytes=1500 time=1ms TTL=255

Reply from 10.5.1.1: bytes=1500 time=1ms TTL=255

Reply from 10.5.1.1: bytes=1500 time=1ms TTL=255

Ping statistics for 10.5.1.1:

Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 1ms, Maximum = 1ms, Average = 1ms

C:\windows>ping 10.5.1.1 -l 1500 -f

Pinging 10.5.1.1 with 1500 bytes of data:

Packet needs to be fragmented but DF set.

Packet needs to be fragmented but DF set.

Packet needs to be fragmented but DF set.

Packet needs to be fragmented but DF set.

Ping statistics for 10.5.1.1:

Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

If you find the maximum MTU is smaller than 1500, you can set the MTU on either interface (preferably the WAN interface) to the maximum discovered size, and set ip tcp adjust-mss to 40 bytes less. NB: this will also impact all traffic transiting this router.
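As a sketch of that suggestion (the interface name and the 1400 figure are made up for illustration; use whatever maximum size your sweep actually discovered):

interface Serial0/0
 ip mtu 1400
 ip tcp adjust-mss 1360

The 1360 is simply the MTU minus 20 bytes of IP header and 20 bytes of TCP header.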

Does your WAN consist of GRE/IPSec tunnels? The additional encapsulation overhead can exceed the interface MTU, especially for applications that use large, efficient packets: printing, FTP, mail, remote desktop, Active Directory synchronization, etc. (my experience anyway).

After fiddling over the years, I've settled on ip tcp adjust-mss 1354, applied on the LAN side of my routers. I don't use any MTU commands on other interfaces (or tunnel interfaces), just the mss command... this has solved my issues regardless of application or WAN technology (ATM, broadband, frame relay, PPP, etc.).

The MSS command actually initiates an ICMP conversation between the router and transmitting devices on the LAN side, so you can't have a firewall blocking ICMP along the path. You'll be better off getting the routers out of the fragmentation business... do it at the source with this method. You may run into a device that does not respond to the MSS ICMP messages, but I've never experienced one.
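For what it's worth, that LAN-side placement would look something like this (the interface name is hypothetical; 1354 is the value I settled on, as above):

interface FastEthernet0/0
 description LAN
 ip tcp adjust-mss 1354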

Joseph W. Doherty Wed, 03/26/2008 - 13:35

I agree that ip tcp adjust-mss can avoid TCP fragmentation processing, but like Paolo, my understanding of this command is that it overwrites the MSS during the TCP handshake, i.e. ICMP isn't involved. Perhaps you're thinking of the ICMP interaction that supports PMTUD?

PS:

Also, using ip tcp adjust-mss alone could risk silent drops of large non-TCP packets.

Yeah, you're probably right on both counts there... but I noticed that when putting MTU commands on my tunnel interfaces, I would often get an error saying that the value was greater than the default of 1394, and my rule of thumb for MSS was 40 bytes less than MTU, so I came up with 1354 and everything started working. That 40 bytes is for a 20-byte IP and a 20-byte TCP header, but admittedly there's no accounting for the Layer 2 encapsulation there (usually 8 bytes). Life is too short... if anybody knows where this default tunnel MTU value of 1394 comes from, lemme know... I've never found a reference to it anywhere.

mbroberson1 Thu, 03/27/2008 - 13:06

Alan,

Do you put the "ip tcp adjust-mss" command on the inside or outside interface of the routers ethernet interface?

Thanks

mbroberson1 Thu, 03/27/2008 - 13:32

Thanks for the reply. I have been experiencing what I think is the same issue on one of our site-to-site VPNs... how does this config for my interfaces look to you?

interface GigabitEthernet0/0
 description outside$ETH-LAN$
 ip address 210.209.230.5 255.255.255.0
 ip nat outside
 ip virtual-reassembly
 duplex auto
 speed auto
 media-type rj45
 crypto map FMC_CMAP
 crypto ipsec df-bit clear
!
interface GigabitEthernet0/1
 description inside$ETH-LAN$
 ip address 10.10.10.10 255.255.255.0
 ip nat inside
 ip virtual-reassembly
 ip tcp adjust-mss 1300
 no ip mroute-cache
 duplex auto
 speed auto
 media-type rj45

Richard Burts Fri, 03/28/2008 - 04:51

Brandon

The article by Ivan that you posted the link to was interesting and helpful. As he explains in his article, ip tcp adjust-mss can be placed on the inside interface, the outside interface, or both. Like Alan, I tend to place it on the inside interface and have had good success with it there.

I believe that your config looks fine as it applies the adjust-mss on the inside interface. The value of 1300 is a safe value and perhaps a bit conservative. I have frequently found that I could set the value up to 1375 or 1380 (for GRE with IPSec tunnels). As Ivan explains in his article the optimum value will vary depending on some of the parameters chosen in the IPSec configuration. So 1300 will pretty much always work. Sometimes a larger value will also work - you might need to experiment a bit to find the value at which you start to experience problems.

HTH

Rick

mbroberson1 Fri, 03/28/2008 - 05:13

Rick,

Thank you for the reply; I do appreciate it. I do have a question regarding Ivan's article. Under the section "Network Implications", specifically "Listing 2", "Clear the don't fragment bit for UDP traffic" — is that config example to be applied to the inside or the outside interface? Just as an FYI, Cisco says that when you have a router performing VPN/IPSec encapsulation, you want to place the adjust-mss command on the inside interface, since when traffic hits the next (outside) interface, extra overhead is added due to encryption... any thoughts or opinions on this as well?

HTH,

Brandon

Richard Burts Fri, 03/28/2008 - 05:37

Brandon

Listing 2 in the article by Ivan is using Policy Based Routing to manipulate the DF bit. PBR is applied on the interface on which the packet arrives, so it would be applied on the inside interface.
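For readers without the article handy, a PBR configuration along those lines typically looks like the sketch below (the names and ACL number are illustrative, not Ivan's exact listing). It matches UDP arriving on the inside interface and clears the DF bit so the router is free to fragment:

access-list 101 permit udp any any
!
route-map CLEAR-DF permit 10
 match ip address 101
 set ip df 0
!
interface GigabitEthernet0/1
 ip policy route-map CLEAR-DF

Note that clearing DF deliberately defeats PMTUD for that traffic; that is the trade-off being made.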

I am not sure that I understand the logic of "when it hits the next interface being the outside extra overhead is added due to encryption". Certainly going through the outside interface will add extra header and extra overhead of processing. But I do not see how that would impact tcp adjust-mss. If there is some recommendation to place the adjust-mss on the inside I can be comfortable with that. But I do not see how there is any relationship between placement of the adjust-mss and the processing of encryption. With adjust-mss the router is going to look for TCP packets with the SYN bit and will inject a value into the MSS field. I do not see how the processing of encryption impacts that one way or the other.

HTH

Rick

mbroberson1 Fri, 03/28/2008 - 05:41

Rick,

Thanks for your reply. Your explanation here is a thousand times better than the one the Cisco engineer gave me, whom I could hardly understand to begin with. I agree that I don't see any relationship.

HTH,

Brandon

mbroberson1 Fri, 03/28/2008 - 06:17

Rick,

Attached is a screen shot from a Wireshark capture. You will notice frame #719 as having 1460 bytes on the wire. I had previously set the MSS to 1420 on the inside interface, before changing it to 1300, and before running a ping test of "ping -f -l 1415 x.x.x.x" from the local host that was having issues sending data. At a size of 1415 I got clean replies from the end system (the remote side of the VPN tunnel). I went down to 1300 just to be safe, per Cisco's recommendation, but I will probably change it to 1375 or something closer to your suggestion. With a datagram size of 1460 in the capture and the MSS at 1420 on the router interface, but clean replies at a ping size of 1415, it clearly looked like the issue was fragmentation, since the 1420 MSS is higher than the 1415-byte ping size. Any thoughts?

HTH,

Brandon

Joseph W. Doherty Fri, 03/28/2008 - 05:51

Some additional thoughts . . .

Inside interfaces work fine for ip tcp adjust-mss placement, but keep in mind that all TCP traffic is impacted. If the router is also doing local LAN-to-LAN routing, that traffic is affected too; it works fine, but throughput might be slightly reduced.

With an outside interface, be careful of what the outside interface is. If the outside interface is a tunnel interface, perfect, but if the outside interface is the tunnel's physical interface, you'll likely not obtain the results you desire since the MSS of interest is now encapsulated.

Ip tcp adjust-mss is a great feature in that it avoids initial MTU fragmentation processing, at least for TCP, but it should not be relied upon exclusively. What you want to ensure is that packets can still transit when larger than the MTU (either through correct functioning of PMTUD or through physical fragmentation). The reason is that ip tcp adjust-mss only works, I believe, during the initial TCP handshake, and it's possible that the MTU along a path changes dynamically while a flow is active.

For instance, you have a branch with a dedicated WAN link and a backup VPN link. The main link fails, traffic now flows across the VPN link but with the reduced MTU which ip tcp adjust-mss won't change for existing flows (again assuming I'm correct it only functions during TCP handshake). Of course, you could also use ip tcp adjust-mss across the dedicated WAN link.

ip mtu can be used to help ensure an ICMP message is sent to the host, to avoid black-hole routers or devices that might hide fragmentation.

mohammedmahmoud Wed, 03/26/2008 - 13:39

Hi,

Kindly check the attached document. I've tried to collect all the points about MTU, TCP MSS, and PMTUD in a single document. I feel it is still incomplete, but I believe it will at least guide you in.

BR,

Mohammed Mahmoud.

Paolo Bevilacqua Wed, 03/26/2008 - 14:08

Mohammed, did you recently complete the CCIE?

Congratulations! I'll review your document as soon as time permits.

mohammedmahmoud Wed, 03/26/2008 - 14:16

Hi Paolo,

Thank you very much. I completed my CCIE last November; that's why I wasn't active on the forum during most of 2007. I await your criticism and additions to my document.

BR,

Mohammed Mahmoud.

DIEGO ALONSO Wed, 03/26/2008 - 16:44

I do have GRE/IPSec tunnels, but the problem is sending email to one particular domain, not traffic between private hosts. The traffic in this case is not carried via GRE/IPSec at any time; it is delivered to the destination via a standard T1 circuit to our ISP.

There is ONE domain to which delivering mail constantly fails. Since we deliver mail to hundreds of other domains, I don't think it's our email server. Of course, the other guys get email from hundreds of servers without problems, so the problem isn't on their end either. Since the problem seems to be "in the middle", I ran across the MTU thing while researching.

Right now I have used a registry tweak to lower the MTU on my email server to 1350. If that doesn't work, I will do the MSS thing on the LAN side of my router like you suggested.

Thanks,

Diego

Wow, sorry for the diversion...but I do suggest following up with the MSS stuff as discussed in this thread...a must for GRE/IPSec. Your problem smacks of DNS...like the DNS servers you use cannot find the MX entry of the destination domain...but I'm sure you've checked all the obvious stuff. We just went round and round with a similar issue and actually moved the damn Exchange server to another site...! Some fix.

jedavis Fri, 04/11/2008 - 08:08

I too have been struggling with this problem recently. I have 2 sites at which I manage the network. In the middle of this connection is a VPN that I don't control. I noticed that there was a lot of fragmentation going on, and that Windows hosts were not setting the DF bit thus allowing fragmentation.

After a bit of experimentation, here is what I found:

packet <= 1400 bytes: no fragmentation required.

packet >= 1401 bytes and <= 1476 bytes: packet is silently dropped.

packet >= 1477 bytes: "ICMP frag needed but DF bit set" is returned.

As a work-around, I set tcp adjust-mss on the remote site router interface (facing the VPN, not the LAN) to 1360. However, I think the correct thing to do is fix the problem with PMTUD. I have been working with the person responsible for that equipment, though I don't have access to the configurations.

Due to the fact that I am receiving the unreachables at packet sizes > 1476, I would infer that GRE is being used in conjunction with IPSec transport mode. But what does this behavior tell me? Is tunnel path-mtu-discovery not configured on the GRE tunnel?
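For reference, tunnel PMTUD is enabled per tunnel interface; a minimal sketch (the tunnel number is hypothetical):

interface Tunnel0
 tunnel path-mtu-discovery

When enabled, the router sets DF on the encapsulated packets and lowers the tunnel's IP MTU itself in response to "fragmentation needed" messages, rather than relying on a static value.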

mbroberson1 Fri, 04/11/2008 - 10:03

Use "ip tcp adjust-mss"; try something like 1375 first. If this works, slowly increase the size until it breaks, then back off by 20 or so to allow some cushion. Apply this closest to the source traffic; I would first try the inside LAN interface of the router.

jedavis Fri, 04/11/2008 - 12:53

Thanks for the reply, but I think you misunderstood the question. I already have adjust-mss working; I have it set on the outside interface, not the LAN interface. I agree with josephdoherty on that count. My point is that I view the adjust-mss measure as just a work-around for a misconfigured network. I believe the correct solution is to fix the network so that the individual components respond correctly to oversized packets.

Some other things confuse me about PMTUD. The Cisco documentation I have read seems to suggest that end hosts typically store path-MTU information as a host route. However, I have yet to find anywhere I can display this information on Windows hosts. I certainly can display the routing table, but I don't see host routes using "route print". Does anyone know how to display discovered path-MTU values on Windows hosts?

It also seems to suggest that IOS stores the Path MTU not as a host route but as an interface-wide parameter, but that it can't be displayed unless you use "debug tunnel". Is this true or am I misunderstanding the documentation? It certainly seems possible that different hosts that can be reached through a tunnel interface could have different path MTUs.

One final bit of fog floating around my brain on this issue: while PMTUD is done by TCP only, fragmentation is done at the IP layer. Does the path-MTU discovery done by TCP affect other protocols? I mean, if TCP has determined that the path MTU for a particular host is, say, 1400, would the IP stack allow a UDP packet to go out larger than this?

Richard Burts Fri, 04/11/2008 - 13:08

Jeff

I can understand your wanting to get the network correctly configured so that tcp adjust-mss is not needed. But there is a fundamental problem about that. Much of our traffic goes through networks that we do not control. And many of those networks do things that break PMTUD (especially networks that block the ICMP error message that is essential to the functioning of PMTUD). So you can get all the devices in your network correctly configured and working correctly, but you are still at the mercy of devices in other networks which will sometimes break PMTUD.

To answer the other part of your question - there is no connection between what the IP stack of a host does with PMTUD for TCP and what it does for UDP. It may have determined the size to be 1400 for TCP but it will happily send out UDP packets larger than this.

HTH

Rick

mbroberson1 Mon, 04/14/2008 - 04:10

Rick,

Great response! Before I read this reply, I was thinking about the networks beyond our control: those with lower MTU values, different circuit types (technologies), and such.

This seems to be a fairly interesting topic to many. I guess that can happen when things aren't so clear in the documentation, or when the docs conflict.

Thanks,

Brandon
