MPLS MTU problem

djimmy1979 · 09-26-2007

Hey Guys,

We currently have several remote sites that connect back to our core (a 7206VXR) via point-to-point T1s. One site connects back to us via MetroE. All sites are set up under one VRF. The T1 sites can ping each other with no problem, even with the datagram size bumped up to over 1000. The problem I'm running into is with the MetroE site: if I ping it, or ping from it, with the datagram size at 250 or higher, the success rate drops to about 30%. This in turn seems to be causing some issues for our clients. Any advice on this would be great. In our setup, all the interfaces that connect to the remote sites are in the same VRF. We then run EIGRP to route the subnets needed, and everything works great except for a few applications whose problems seem to be related to some combination of MetroE, MPLS, and MTU size. I'm just not sure where to start.

Thanks in Advance

william.caban · 09-26-2007

Try pings with the "DF" bit on to discover at what MTU size you are having the problem.

(From the router you may use the extended commands. From a Windows machine use the "-f" flag or from *nix use the "-M do" flag.)

With this you should be able to rule out if there are MTU issues or not.

-W
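The DF-bit search suggested above can be automated. The following is only a minimal sketch in Python: `largest_passing_size` and `fake_probe` are hypothetical names, and the probe is a stand-in for whatever actually sends the DF-bit ping (router extended ping, Windows "-f", or *nix "-M do"). It assumes a clean cutoff, where every size up to the limit passes and every size above it fails. With intermittent loss, the probe should send several packets and report failure only if all of them are lost.

```python
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool],
                         low: int = 100, high: int = 1500) -> int:
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes a clean boundary: all sizes up to the limit pass, all above fail.
    Returns low - 1 if even the smallest size fails.
    """
    if not probe(low):
        return low - 1
    # Invariant: probe(low) is True and the answer lies in [low, high].
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def fake_probe(size: int) -> bool:
    """Hypothetical stand-in for a DF-bit ping: pretends the path limit is 1492."""
    return size <= 1492


print(largest_passing_size(fake_probe))  # 1492
```

Keep in mind that tools count size differently: the IOS "Datagram size" is the whole IP packet, while the *nix "-s" value is the ICMP payload only, so the two numbers differ by the 28 bytes of IP and ICMP headers.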

djimmy1979 · 09-26-2007

OK, I tried this 3 times and upped the datagram size by 100 each time. At 300 I started getting packet loss. What I don't understand is that when I set "Set DF bit in IP header" to yes, I don't notice any difference from when it's set to no.

TX-OPT-RTR#ping
Protocol [ip]:
Target IP address: 192.168.0.254
Repeat count [5]: 10
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 192.168.0.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 32/32/32 ms

TX-OPT-RTR#ping
Protocol [ip]:
Target IP address: 192.168.0.254
Repeat count [5]: 10
Datagram size [100]: 200
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 10, 200-byte ICMP Echos to 192.168.0.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 32/32/36 ms

TX-OPT-RTR#ping
Protocol [ip]:
Target IP address: 192.168.0.254
Repeat count [5]: 10
Datagram size [100]: 300
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 10, 300-byte ICMP Echos to 192.168.0.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!.!.!.
Success rate is 70 percent (7/10), round-trip min/avg/max = 32/33/36 ms

william.caban · 09-26-2007

Is this from the PE or a CE?

If it is from the PE, then you should look for the problem outside MPLS, because the ping is using the global IPv4 routes and not the VPNv4 routes. (One possibility is an IPv4 MTU problem affecting MPLS on a specific path.)

You may try the same test sourced from the VRF, using "ping vrf <vrf-name> ip <address>".

-W

djimmy1979 · 09-26-2007

OK, I really start losing packets around 275. This was pinging from the core. 275 seems awfully low; it almost makes me think something other than MTU is causing this.

core03-atl#$1CAN-Canoate ip 192.168.0.254 size 250 df-bit repeat 10
Type escape sequence to abort.
Sending 10, 250-byte ICMP Echos to 192.168.0.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 1/2/4 ms

core03-atl#$1CAN-Canoate ip 192.168.0.254 size 251 df-bit repeat 10
Type escape sequence to abort.
Sending 10, 251-byte ICMP Echos to 192.168.0.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!.!!!!!!!
Success rate is 90 percent (9/10), round-trip min/avg/max = 1/1/4 ms

william.caban · 09-26-2007

I just noticed you mentioned that you are seeing this problem on your ME segment. Make sure that in your Metro Ethernet network, whatever VLAN you are using has an MTU > 1500 defined, and take into account any additional stacked VLANs you may have under it.

For example:

!
vlan
...
mtu 1508
...
!
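The arithmetic behind a value like 1508 can be sketched as follows. This is a rough illustration only, with a hypothetical helper name: it assumes a 1500-byte customer IP packet, 4 bytes per MPLS label, and 4 bytes per 802.1Q tag. The thread does not say how many labels are in use (two is a common case for an MPLS VPN), and whether VLAN tags count against the configured MTU depends on the platform.

```python
MPLS_LABEL_BYTES = 4  # size of one MPLS label stack entry
DOT1Q_TAG_BYTES = 4   # size of one 802.1Q VLAN tag


def required_mtu(ip_mtu: int = 1500, mpls_labels: int = 2,
                 extra_vlan_tags: int = 0) -> int:
    """Return the MTU needed to carry a full-size IP packet plus encapsulation.

    extra_vlan_tags should only count tags that the platform includes in its
    MTU calculation (an assumption to check per device).
    """
    return (ip_mtu
            + mpls_labels * MPLS_LABEL_BYTES
            + extra_vlan_tags * DOT1Q_TAG_BYTES)


print(required_mtu())                   # 1508: 1500 + two labels
print(required_mtu(extra_vlan_tags=1))  # 1512: one stacked tag on top
```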

swaroop.potdar · 09-26-2007

To find the supported MTU size across your MPLS/Metro cloud, you can try the DF-bit pings as discussed here.

But intermittent loss does not mean that you have exceeded the MTU limit across the core. When you exceed the MTU limit with the DF bit set, you should see all the packets being lost. So you can start from a datagram size of 1400 and keep incrementing it to 1500 and beyond, and note the point at which all the packets are lost.

That would be your MTU limit. The ping packet loss you are currently experiencing could be due to ICMP rate limiting by your SP in the core.

So, to conclude: verify the MTU first, and then it can be determined what exactly the problem is.

HTH-Cheers,

Swaroop
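The distinction above (total loss past a size means an MTU limit, partial loss points elsewhere) can be checked mechanically from the ping summaries. The following is a minimal sketch with hypothetical function names; it parses the "Success rate" line that IOS prints and applies that rule. The sample numbers are the ones posted earlier in this thread.

```python
import re
from typing import Dict, Tuple

SUCCESS_RE = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")


def parse_success(output: str) -> Tuple[int, int]:
    """Return (received, sent) from an IOS ping summary line."""
    match = SUCCESS_RE.search(output)
    if match is None:
        raise ValueError("no 'Success rate' line found")
    return int(match.group(2)), int(match.group(3))


def classify(results: Dict[int, Tuple[int, int]]) -> str:
    """Classify DF-bit ping results, keyed by datagram size.

    "mtu-limit":    every size from some point upward lost all packets.
    "intermittent": some packets lost, but no clean cutoff.
    "clean":        nothing lost.
    """
    sizes = sorted(results)
    dead = [s for s in sizes if results[s][0] == 0]
    if dead and all(results[s][0] == 0 for s in sizes if s >= dead[0]):
        return "mtu-limit"
    if any(results[s][0] < results[s][1] for s in sizes):
        return "intermittent"
    return "clean"


# Results from the extended pings posted in this thread.
thread_results = {100: (10, 10), 200: (10, 10), 300: (7, 10)}
print(classify(thread_results))  # intermittent
```

An "intermittent" result like this one is what would be expected from rate limiting or another source of loss rather than from a hard MTU ceiling.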

djimmy1979 · 09-26-2007

I appreciate all the help, guys. Once I get to the bottom of it, I will post the resolution.