DMVPN Issues

Patrick McHenry
Level 3

We have been having DMVPN issues since we started implementing it. The build was originally set up with EIGRP. We were having a lot of problems - missing routes, neighbors going up and down - and we thought it might be easier to change all the remote routers and the headends to OSPF, just like the rest of the network. We thought this would simplify our issues and help us narrow them down. As it turned out, the headends needed an upgrade because of a bug. The DMVPN is still not working perfectly, but it is better. My question is this: we are planning on rolling out 250 881s to remote workers. With OSPF, do you think this might become a problem? At this point we are including them in the same area as the rest of our network, area 0. Might it be better for us to create a new area for these remote workers? Or maybe go back to EIGRP for the remote workers?

Any thoughts?

Thanks, Pat.

Accepted Solutions

I've used OSPF extensively in my previous job, but never in a DMVPN configuration. In my current job I use DMVPN a lot, but only with EIGRP. I decided to look around a bit and came across this, which should be applicable to you:

http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/DMVPN_2_Phase2.html#wp38089

As far as OSPF goes, Cisco recommends no more than 50 routers per area and no more than 3 areas per router (that would be your ABR in this case). The areas would also have to be stubby or totally stubby. I'm not so sure about those router / area limits, though. What I would do is graph the CPU / memory utilization of the ABR over SNMP and watch how it rises as you add spokes. Obviously a 6500 would handle more adjacencies than a 2900 or a 1900 series router.

Another thing to consider is the encrypted throughput capability of your hub router, assuming you will be encrypting the traffic. Those limits are not published by Cisco, and I had to contact my Cisco SE to get that information. Unfortunately I won't be back at work for a few days and don't have those limits in front of me, and I'm not even sure where I wrote them down, to be honest. If I come across them I can email them to you. I believe 881s can do around 8 Mbps of encrypted throughput, which is pretty decent for such a small router.

The official documentation says you need to configure all OSPF routers as a broadcast network type and also force the hub to be the only one allowed to become the DR for that broadcast segment.
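As a rough sketch of what that recommendation looks like on the tunnel interfaces (the interface name Tunnel0 and the priority value 255 are assumptions for illustration, not taken from the linked guide):

```
! Hub tunnel: highest priority so it wins the DR election
interface Tunnel0
 ip ospf network broadcast
 ip ospf priority 255
!
! Every spoke tunnel: priority 0 so it can never become DR or BDR
interface Tunnel0
 ip ospf network broadcast
 ip ospf priority 0
```

The priority 0 line on the spokes is the part that is easy to forget, and it has to be present on every single spoke.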

All those types of issues are avoided in EIGRP and you don't have to worry about them. As an example, if you forget to set the OSPF priority on one of your spokes, that spoke can be elected BDR and take over as DR when your hub fails, which will cause a lot of problems for you and could be tricky to troubleshoot. For EIGRP, configure all spokes as stubs and advertise a summary route (either a default route or an aggregate of your corporate IP structure) on your hub's DMVPN interface and you're done.
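A minimal sketch of that EIGRP layout in classic-mode syntax. The AS number 100 and the 10.0.0.0/8 aggregate are hypothetical; the tunnel subnet is the one shown elsewhere in this thread:

```
! Spoke: advertise only connected and summary routes, never act as transit
router eigrp 100
 network 172.20.68.0 0.0.3.255
 eigrp stub connected summary
!
! Hub: send the spokes one aggregate instead of the full table
interface Tunnel0
 ip summary-address eigrp 100 10.0.0.0 255.0.0.0
```

To send a default route instead of an aggregate, the summary would be 0.0.0.0 0.0.0.0.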

I hope all that makes sense!

14 Replies

Patrick McHenry
Level 3

Just had a thought - slightly off topic but related to DMVPN. I know that the MTU should match on both sides of the neighborship, but should the MTU be set on the tunnel interfaces on both sides or on the physical interfaces on both sides? Or both?

Thanks, Pat.

Giuseppe Larosa
Hall of Fame

Hello Pat,

I would put the remote workers in a separate DMVPN cloud.

You could use a totally stubby area to minimize the database exchange; however, all devices in the same area will still see each other in the database.

For better stability you could divide the 250 routers into 4 subsets and associate each of them with a different DMVPN cloud / totally stubby area.

The reasoning is that most traffic from/to remote workers should be to/from HQ, so having them in a separate DMVPN cloud should not be a problem.
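A sketch of how two of those clouds could map to totally stubby areas on one hub. The area numbers and the 172.20.72.0/22 and 172.20.76.0/22 tunnel subnets are hypothetical, and the remaining clouds would follow the same pattern:

```
! Hub / ABR: no-summary makes each area totally stubby
router ospf 1
 area 11 stub no-summary
 area 12 stub no-summary
 network 172.20.72.0 0.0.3.255 area 11
 network 172.20.76.0 0.0.3.255 area 12
!
! Spoke in cloud 1: only the stub flag has to match the ABR
router ospf 1
 area 11 stub
 network 172.20.72.0 0.0.3.255 area 11
```

The no-summary keyword is only needed on the ABR; the spokes then receive just a default route from it.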

About MTU: it has to match at the tunnel interface level in order to build routing protocol neighborships. Matching the MTU on the external physical interfaces is also desirable.

Hope to help

Giuseppe

I've pasted show output from the physical interface and the Tunnel interface to show the difference in the timers. Also, the Tunnel interface is point-to-multipoint, which I think is the way we want it for DMVPN, but the physical interface is broadcast.

Could these differences be the cause of this debug output? I see this throughout the day on the headends. That being said, all the remote routers are neighboring at present. But it won't last, which makes me think it is some kind of synchronization problem.

debug ip ospf events:

"EXSTART to DOWN, Neighbor Down: Too many retransmissions
EXSTART to DOWN, Neighbor Down: Too many retransmissions"

Show Output:

MK-1001-VPN1#sh ip ospf int
GigabitEthernet0/0/0 is up, line protocol is up
  Internet Address 172.20.64.102/24, Area 0, Attached via Network Statement
  Process ID 1, Router ID 172.20.64.102, Network Type BROADCAST, Cost: 1
  Topology-MTID    Cost    Disabled    Shutdown      Topology Name
        0           1         no          no            Base
  Transmit Delay is 1 sec, State DROTHER, Priority 1
  Designated Router (ID) 172.20.64.103, Interface address 172.20.64.103
  Backup Designated router (ID) 1.1.1.12, Interface address 172.20.64.2
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    oob-resync timeout 40
    Hello due in 00:00:01
  Supports Link-local Signaling (LLS)
  Cisco NSF helper support enabled
  IETF NSF helper support enabled
  Can be protected by per-prefix Loop-Free FastReroute
  Can be used for per-prefix Loop-Free FastReroute repair paths
  Index 2/2, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 0, maximum is 10
  Last flood scan time is 0 msec, maximum is 1 msec
  Neighbor Count is 3, Adjacent neighbor count is 2
    Adjacent with neighbor 1.1.1.12  (Backup Designated Router)
    Adjacent with neighbor 172.20.64.103  (Designated Router)
  Suppress hello for 0 neighbor(s)
Tunnel0 is up, line protocol is up
  Internet Address 172.20.68.1/22, Area 0, Attached via Network Statement
  Process ID 1, Router ID 172.20.64.102, Network Type POINT_TO_MULTIPOINT, Cost: 50
  Topology-MTID    Cost    Disabled    Shutdown      Topology Name
        0           50        no          no            Base
  Transmit Delay is 1 sec, State POINT_TO_MULTIPOINT
  Timer intervals configured, Hello 30, Dead 120, Wait 120, Retransmit 5
    oob-resync timeout 120
    Hello due in 00:00:04

Thanks, Pat

Hello Pat,

the different OSPF network types should not be an issue, as the OSPF network type is set per L3 interface and you can have different OSPF network types in the same area.

The OSPF network type regulates how OSPF behaves on a specific L3 interface in several respects:

- the hello/dead interval timers

- whether multicast hello discovery is used (depending on the network type this can be disabled)

- whether a DR/BDR is elected

With point-to-multipoint you use modified hello/dead intervals, you still use multicast-based discovery (so no need for a manual neighbor command), and there is no DR/BDR election.

The mGRE tunnel interface is an independent L3 interface that can have its own OSPF network type.
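For reference, a minimal sketch of the tunnel-side settings this implies. Tunnel0 matches the output posted in this thread; the NHRP multicast line is an assumption about how the hub is set up and is shown only because multicast hellos depend on it:

```
! Hub tunnel
interface Tunnel0
 ip nhrp map multicast dynamic
 ip ospf network point-to-multipoint
!
! Spoke tunnel: same network type, so hello 30 / dead 120 match the hub
interface Tunnel0
 ip ospf network point-to-multipoint
```

Running show ip ospf interface Tunnel0 on both ends should then report the same network type and timer values.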

If I remember correctly you have just 11-15 spoke routers, and in a previous thread we tried to analyze why a few of them were not able to reach the FULL state and got stuck in the EXSTART state.

The fact that they can reach the two-way state is proof that multicast-based discovery works.

From the configuration of the tunnels and of the physical interfaces there are no MTU mismatches, which is the typical cause of getting stuck in EXSTART.

An issue at the NHRP level is unlikely, as each spoke registers with both hubs / NHRP servers at regular intervals. (My guess had been that multicast communication was possible but unicast communication was not, but the spokes have a static NHRP entry for the hub router and should register with the hub regularly in NHRP.)

Hope to help

Giuseppe

Giuseppe, I agree, I don't think the problem is NHRP.

How come I see a difference between these two outputs in regard to the MTU?

MK-1001-VPN1#sh int tunn0
Tunnel0 is up, line protocol is up
  Hardware is Tunnel
  Internet address is 172.20.68.1/22
  MTU 17912 bytes, BW 2000 Kbit/sec, DLY 10000 usec,
     reliability 255/255, txload 5/255, rxload 13/255


MK-1001-VPN1#sh run int tunn0
Building configuration...

Current configuration : 521 bytes
!
interface Tunnel0
bandwidth 2000
ip address 172.20.68.1 255.255.252.0
no ip redirects
ip mtu 1400

Thanks, Pat.

Hello Pat,

the most specific command wins, so the MTU for IPv4 is 1400. The show interface output shows the MTU of the tunnel interface itself, but that value does not apply to IPv4 packets.

To change that value you should use the mtu command instead of ip mtu.
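The two values can be compared side by side with the following commands. The first reports the interface MTU (17912 in the output above) and the second reports the IP MTU that actually applies to IPv4 packets:

```
show interfaces Tunnel0 | include MTU
show ip interface Tunnel0 | include MTU
```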

Hope to help

Giuseppe

Giuseppe,

Do you think this could be the source of the remote routers losing their neighborship? The debug suggests to me that it is a synchronization issue. They seem to work for a while and then go down. I should say that we did upgrade the two headend routers last week and things have improved, but some of the remote routers will drop after a while and then come back.

Should I change the mtu on the physical and tunnel interfaces to mtu 1400 on both the headend routers and the remote routers?

Thanks, Pat.

Hello Pat,

>> Should I change the mtu on the physical and tunnel interfaces to mtu 1400 on both the headend routers and the remote routers?

No, you shouldn't. This would cause an impairment, as the external interface has to be able to handle the whole packet including all the overhead introduced by DMVPN: the GRE header (24 bytes) + IPsec (depending on the transform set in use, for example ESP with HMAC).

So the external interface should have a higher MTU; the standard 1500 bytes should be fine. As for the tunnel interfaces, ip mtu covers all the traffic of interest, as it applies to all IPv4 traffic.

One thing that could be attempted is to reduce ip mtu 1400 to ip mtu 1300 on the tunnel interfaces to check whether the behaviour improves.

Depending on the transform set in use, GRE + IPsec can add more than 100 bytes; this is the reason to try a lower value. OSPF database packets try to use packets as big as the internal MTU.
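To make the overhead concrete, here is a worked tally under one assumed combination: ESP in tunnel mode with AES-CBC and HMAC-SHA1-96, and mGRE with a tunnel key. Other transform sets give different totals, so treat the numbers as an illustration rather than the values for this network:

```
! Assumed: ESP tunnel mode, AES-CBC, HMAC-SHA1-96, mGRE with a tunnel key
!   GRE delivery IP header            20
!   GRE header (4) + tunnel key (4)    8
!   IPsec outer IP header             20
!   ESP header (SPI + sequence)        8
!   ESP IV for AES-CBC                16
!   ESP pad length + next header       2
!   ESP padding                    0..15
!   ESP ICV (HMAC-SHA1-96)            12
!   -------------------------------------
!   total                        86..101
!   with NAT-T (UDP 4500)        94..109
```

On a 1500-byte path that leaves roughly 1390 to 1415 bytes for the inner packet, so 1400 sits close to the edge while 1300 leaves comfortable margin.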

You should have a dual DMVPN with two separate clouds so the suggestion is to make the change only in one hub/cloud including the spokes.

Hope to help

Giuseppe

So, you are saying to set one headend hub tunnel to MTU 1300 and keep the other headend hub tunnel at 1400, then set one tunnel interface on the spokes to 1300 (the one that faces the 1300 headend) and keep the other tunnel interface at 1400 (the one that faces the 1400 headend)?

So one cloud will be set to MTU 1300 and one cloud will be set to MTU 1400?

Thanks, Pat.

Hello Pat,

yes, it should be a temporary scenario to see whether the reduced MTU approach works, without impacting both DMVPN clouds.
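A sketch of that temporary split. The tunnel numbering on the spoke is hypothetical; the point is only that each cloud uses one value on both ends:

```
! Hub 1 (test cloud)
interface Tunnel0
 ip mtu 1300
!
! Hub 2 (left unchanged)
interface Tunnel0
 ip mtu 1400
!
! Spoke: one tunnel per cloud, each matching the hub it faces
interface Tunnel0
 ip mtu 1300
interface Tunnel1
 ip mtu 1400
```

Because OSPF compares the interface MTU during database exchange, the hub and the spokes of the test cloud should be changed in the same maintenance window.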

Hope to help

Giuseppe

richardbergen
Level 1

Hey Patrick, I use multi-DMVPN tunnels without any issues and stumbled across this post by accident. Here are some points I would recommend:

- if you have single-user remote worker sites (as in not remote offices with multiple users), I would recommend using the AnyConnect client instead of deploying a router. As long as you have the right feature set on your hub / VPN router, it could potentially be free, without having to buy 250 routers for all the remote users. Cha-ching.

- I did not change the physical MTU on my tunnel interfaces and have had no issues. If you check the IP MTU on the interface, you will notice the router has already taken the additional overhead into account (newer routers do this; 881s do it automatically with newer-ish IOS). You should not need to manually override the IP MTU unless you are doing further encapsulation that the router is not taking into account. Just make sure the IP MTU is the same on hub and spoke for OSPF to work. Depending on the IOS version you are running, the router may or may not automatically take the overhead into account, and that could be causing your issues. I have seen some 831s that would not do this.

- GRE + IPSec (transport mode) = 60 bytes of overhead, which is why the interface below is automatically set to 1440 by the router.

http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml#t16

e.g.

#sh ip int t20
Tunnel20 is up, line protocol is up
  Internet address is x.x.x.x/xx
  Broadcast address is 255.255.255.255
  Address determined by setup command
  MTU is 1440 bytes

- depending on what you are doing and how much you are encapsulating, you may have to do a little math yourself to figure out the proper MTU (google IPSec overhead, GRE overhead, etc. and add up the bytes).

- to check the IP MTU you should be using, send pings of increasing size with the DF (don't fragment) bit set until you reach a size that no longer gets a response. The last size that responded is your maximum IP MTU.

- for NHRP issues, check the hub to make sure the spoke has registered correctly (show ip nhrp x.x.x.x).

- if it says that the spoke has not registered, you may want to play with the ip nhrp registration timeout values on the spokes.

- configure the ip tcp adjust-mss command under your inbound interface (I think). It automatically changes the TCP segment size negotiated between the hosts communicating through the router, eliminating potential fragmentation problems between hosts.

- your best bet is to run debugs on both routers and try to capture the error in progress. This will take some troubleshooting time (i.e. debugging NHRP, debugging IPsec, debugging EIGRP / OSPF, etc.), narrowing down the issue and ruling out one technology at a time. Set up an access-list on the remote router's internet interface so only you can SSH into the device, and run the debugs one at a time, saving the output to the buffer. Increase the buffer memory to a size large enough to give you some data after you notice a problem has occurred.

- as for OSPF, I definitely would NOT put them in area 0. As someone else recommended, you can try to build a single area or split it into a few areas (what I would recommend). That many routers will probably cause a lot of SPF calculations and could potentially affect the smaller 881s (great little router, by the way), depending on the stability of the spokes.

- EIGRP is way easier; you don't have to worry about your OSPF interface types (i.e. p2m, p2m non-broadcast, p2m broadcast, p2p, broadcast, non-broadcast, etc. - you get the idea). That can get confusing pretty quickly. If you are not really comfortable with OSPF then I would just stick with EIGRP. Only certain OSPF interface types are compatible with each other, and it will require some research on your part to determine the type of interface that is right for you.

- whatever protocol you choose, depending on the reliability of the remote clients, your (default) interface routing protocol hello timers might be set too low. I would recommend setting 30-second hello timers. You get more stable neighbors but slower convergence times. I prefer stability. I've seen issues like this happen when people are on wireless (fixed wireless or cellular) links or even just crappy ADSL / cable connections that have (infrequent but bad) packet loss. This will cause a lot of neighbor flapping and therefore routing changes. If using EIGRP, make sure the remote routers are configured as stubs and you are advertising a summary EIGRP route to the spokes. This prevents them from being queried and also from being used as transit routers.

- you can also set up a continuous ping to their public IPs and measure packet loss to see how "clean" their connections are and why some of them are dropping (PingPlotter is a good tool for this).

- OSPF scales well, but takes A LOT of planning and thinking, especially if you have a large network. EIGRP can scale well too if you properly implement summarization / stub routers. Either case requires very good IP subnetting / planning.
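Several of the points above translate into a handful of commands. The first group is a sketch of the exec-level checks; x.x.x.x stands for the peer address, as elsewhere in this post, and the ping sizes are just starting points to step up or down from:

```
! DF-bit ping sweep: raise or lower the size until replies stop
ping x.x.x.x size 1400 df-bit
ping x.x.x.x size 1450 df-bit
ping x.x.x.x size 1500 df-bit
! On the hub: has this spoke registered?
show ip nhrp x.x.x.x
```

The second group is a possible spoke-side configuration. All values are assumptions for illustration: the MSS is the tunnel IP MTU minus 40 bytes of IP and TCP headers, the registration timeout and buffer size are arbitrary, and the EIGRP timer lines only apply if EIGRP (AS 100 assumed) is the chosen protocol. Placing adjust-mss on the tunnel interface is one common choice, not the only one:

```
interface Tunnel0
 ip mtu 1400
 ip tcp adjust-mss 1360
 ip nhrp registration timeout 60
 ip hello-interval eigrp 100 30
 ip hold-time eigrp 100 90
!
! Larger log buffer so debug output survives until you look at it
logging buffered 1000000 debugging
```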

just my 2 cents! or in this case probably about 25 cents or something.

Thanks for the schooling. We are using these 881s for a single user. The advantage of the 881s is that the remote users are able to use a 7965 phone instead of a softphone. That seems to be it. I am concerned about the single area issue and I think it will bite us as we roll more out. We haven't decided whether we will go with EIGRP or another OSPF area.

You mentioned before to be careful with the network type for OSPF. We are using point-to-multipoint between the headends and remote routers. That is all we need to do, correct? Even if we decide to create another area?

Thanks, Pat.

You might want to look at a recent Cisco webinar to look at some new math and different approaches - "A Closer Look: Comparing Benefits of EIGRP and OSPF"  - https://learningnetwork.cisco.com/thread/46365
