OTV long convergence time

p.hruby
Level 1

Hello,

I have OTV configured at two sites, with two edge devices at each site.

During a failover test I reloaded one edge device at one site. While it was reloading, I entered "show otv vlan" on the second edge device at the same site and got the following output:

 

d1-OTV# sh otv vlan 


OTV Extended VLANs and Edge Device State Information (* - AED)

Legend: 
(NA) - Non AED, (VD) - Vlan Disabled, (OD) - Overlay Down
(DH) - Delete Holddown, (HW) - HW: State Down 
 (NFC) - Not Forward Capable 

VLAN   Auth. Edge Device                     Vlan State                 Overlay
----   -----------------------------------   ----------------------     -------
  24                                         inactive(240 s left)       Overlay1
  25                                         inactive(240 s left)       Overlay1

 

I had to wait nearly 4 minutes for the expected output:

 

d1-OTV# sh otv vlan 


OTV Extended VLANs and Edge Device State Information (* - AED)

Legend: 
(NA) - Non AED, (VD) - Vlan Disabled, (OD) - Overlay Down
(DH) - Delete Holddown, (HW) - HW: State Down 
 (NFC) - Not Forward Capable 

VLAN   Auth. Edge Device                     Vlan State                 Overlay
----   -----------------------------------   ----------------------     -------
  24*  d1-OTV                                active                     Overlay1
  25*  d1-OTV                                active                     Overlay1

 

Why does it take such a long time? What is this long hold time used for? Can I tune it somehow? A four-minute outage is not acceptable for me.

 

Before the reload the output looked like this:

 


d1-OTV# sh otv vlan


OTV Extended VLANs and Edge Device State Information (* - AED)

Legend: 
(NA) - Non AED, (VD) - Vlan Disabled, (OD) - Overlay Down
(DH) - Delete Holddown, (HW) - HW: State Down 
 (NFC) - Not Forward Capable 

VLAN   Auth. Edge Device                     Vlan State                 Overlay
----   -----------------------------------   ----------------------     -------
  24   d2-OTV                                inactive(NA)               Overlay1
  25*  d1-OTV                                active                     Overlay1
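
For anyone who wants to watch this hold-down from a script rather than by re-running the command by hand, here is a minimal sketch in Python. It assumes the "show otv vlan" text has already been captured somehow (how you collect it is up to you), and the regular expression is fitted only to the row shapes pasted in this post, so other NX-OS releases may need adjustments. The names ROW, SAMPLE and parse_otv_vlan are hypothetical, not part of any Cisco tool.

```python
import re

# Row shapes taken from the outputs in this post (an assumption, not a spec):
#   "  24          inactive(240 s left)    Overlay1"
#   "  25*  d1-OTV active                  Overlay1"
#   "  24   d2-OTV inactive(NA)            Overlay1"
ROW = re.compile(
    r"^\s*(?P<vlan>\d+)(?P<aed>\*)?\s+"
    r"(?:(?P<device>\S+)\s+)?"
    r"(?P<state>active|inactive)(?:\((?P<detail>[^)]*)\))?\s+"
    r"(?P<overlay>\S+)\s*$"
)

SAMPLE = """\
VLAN   Auth. Edge Device                     Vlan State                 Overlay
----   -----------------------------------   ----------------------     -------
  24                                         inactive(240 s left)       Overlay1
  25*  d1-OTV                                active                     Overlay1
  24   d2-OTV                                inactive(NA)               Overlay1
"""


def parse_otv_vlan(text):
    """Return one dict per VLAN row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        detail = m.group("detail") or ""
        left = re.fullmatch(r"(\d+) s left", detail)
        rows.append({
            "vlan": int(m.group("vlan")),
            "is_local_aed": m.group("aed") is not None,  # the "*" marker
            "device": m.group("device"),                 # None while in hold-down
            "state": m.group("state"),
            "seconds_left": int(left.group(1)) if left else None,
            "overlay": m.group("overlay"),
        })
    return rows


if __name__ == "__main__":
    for row in parse_otv_vlan(SAMPLE):
        print(row)
```

Polling this and alerting when seconds_left is set makes it easier to time the outage precisely during a failover test.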

 

 

Petr

1 Accepted Solution

Accepted Solutions

The 240-second timer is a standard value used to ensure that all the internal component states have correctly converged for the OTV features. We cannot bypass or modify that timer currently, but it's possible that a knob to tune it may become available in the future.

In your case, migrating the VLAN from the L2 trunk to OTV will be subject to a 4-minute outage.

I notice there are other queries in the thread that I would like to address:

1. The 240-second timer does not apply when there is a failover between AEDs in the local site. The convergence time may be sub-second in several different scenarios. If you see anything different, it would be good to open a TAC case and pursue it.

2. When a failed AED comes back online, it starts a countdown from 240 seconds to 0 before taking over the AED role for the odd or even VLANs. During this period the other local AED at the site is still forwarding traffic for both odd and even VLANs.
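
The behaviour in point 2 can be sketched as a toy model in Python. This is only an illustration of the description above, not how NX-OS implements it: the function name forwarding_aed and its parameters are hypothetical, and which device owns the odd versus the even VLANs is passed in as an assumption (returning_parity), since that depends on the site.

```python
HOLD_DOWN_S = 240  # hold-down value quoted in this thread


def forwarding_aed(vlan, survivor, returning, seconds_since_return,
                   returning_parity):
    """Return the edge device that forwards `vlan` in this simplified model.

    returning_parity is 0 if the returning device will own the even VLANs,
    1 if it will own the odd VLANs (an input to the model, not discovered).
    """
    owned_by_returning = (vlan % 2 == returning_parity)
    if owned_by_returning and seconds_since_return >= HOLD_DOWN_S:
        # Hold-down expired: the returning device takes its VLANs back.
        return returning
    # Otherwise the surviving AED keeps forwarding, for odd and even alike.
    return survivor


if __name__ == "__main__":
    # Example using the device names from the original post, assuming
    # d2-OTV is the reloaded device and owns the even VLANs.
    for t in (0, 120, 240):
        print(t, forwarding_aed(24, "d1-OTV", "d2-OTV", t, returning_parity=0))
```

In this model no VLAN is ever without a forwarder while the returning device counts down, which is the point being made in item 2.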

Let me know if there are questions.

-Raj


7 Replies 7

Adam A
Level 1

I have noticed the same thing and was wondering whether anyone has a solution or workaround to speed up the convergence time. The data centers have a 10 ms RTT between them, so 240 seconds seems a long time for them to wait before forwarding traffic.

I have the same problem during an exercise to move my DCI (Data Centre Interconnect) from a trunk (with Spanning-Tree) to an OTV architecture.  I have links in place for the OTV architecture, and the OTV is already carrying some VLANs.  I also have links in place to carry the old trunked/STP VLANs.

My procedure is to move one VLAN at a time: I remove it from the old trunk and then extend it on the OTV. But it takes four minutes (as described in the original post) for the OTV to start forwarding. Four minutes is not acceptable, and I don't have the opportunity to take the Data Centres down to make the change.

(To be totally truthful, the OTV links are not totally independent - they will be tunneled through a VLAN on the old trunk links until the migration is complete.)

Kevin Dorrell

Luxembourg

 

 

 

 


Thank you Raj, that allows me to think a bit more clearly about the problem, and also allows me to justify the 4 minute downtime to my boss ;-)

Kevin

Hi Raj,

 

Has the convergence time been reduced to less than 240 seconds now?

ngtransge
Level 1

Hello,

 

Did you find a solution to your problem, and did you continue testing convergence times?

I have exactly the same problem, but with longer failure times. In my case, when the reloaded AED device boots and becomes active, it takes about 5-10 minutes until the connection between the data centers stabilizes. During those 5-10 minutes I observed excessive ping loss.

 

For failure detection, you can try enabling the OTV Fast Convergence feature, which allows failure detection in about 5 seconds.

 

Thanks in advance,

San

 

 

ranjit123
Level 3

Dear All,

We also have the same issue: the failover time is nearly 2 minutes. Is there any workaround to decrease the timers?

Regards,

Ranjit

 
