Every 3 weeks or so, we have an issue with remote sites connecting to our head office. It occurred again this morning, which led me to capture logs from WAN router melrtrw001, core router melcore001, and the non-Cisco accelerator sitting between the two routers.
Jul 24 23:22:12.041: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.83 on Vlan254 from FULL to DOWN, Neighbor Down: Too many retransmissions
Jul 24 23:23:12.048: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.83 on Vlan254 from DOWN to DOWN, Neighbor Down: Ignore timer expired
Jul 24 23:24:46.851: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.83 on Vlan254 from EXSTART to DOWN, Neighbor Down: Dead timer expired
Jul 24 23:27:50.297: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.83 on Vlan254 from LOADING to FULL, Loading Done
Jul 25 09:22:50 AEST: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.84 on GigabitEthernet0/0 from 2WAY to DOWN, Neighbor Down: Dead timer expired
Jul 25 09:23:24 AEST: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.252.2 on GigabitEthernet0/0 from 2WAY to DOWN, Neighbor Down: Dead timer expired
Jul 25 09:25:23 AEST: %OSPF-5-ADJCHG: Process 1, Nbr 10.3.254.5 on GigabitEthernet0/0 from 2WAY to DOWN, Neighbor Down: Dead timer expired
Jul 25 09:26:01 AEST: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.252.2 on GigabitEthernet0/0 from 2WAY to DOWN, Neighbor Down: Dead timer expired
Jul 25 09:26:39 AEST: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.84 on GigabitEthernet0/0 from 2WAY to DOWN, Neighbor Down: Dead timer expired
Jul 25 09:27:49 AEST: %OSPF-5-ADJCHG: Process 1, Nbr 126.96.36.199 on GigabitEthernet0/0 from LOADING to FULL, Loading Done
Jul 25 09:27:50 AEST: %OSPF-5-ADJCHG: Process 1, Nbr 188.8.131.52 on GigabitEthernet0/0 from LOADING to FULL, Loading Done
An OSPF flap due to too many retransmissions happens when DBD packets (OSPF updates) from one router cannot reach the other router. Most of the time this is caused by an MTU issue; a low MTU on any L2 device in the path can also cause the problem. To check this, ping the other side's IP address with size 1500 and the DF bit set. But if that were the case, the adjacency would flap continuously, because with a low MTU in the path the DBD packets would never reach the other side.
In your case it flaps every 3 weeks or so. This could also be due to a layer 1 issue in the path that causes packet drops, so that DBD packets do not reach the other side. Check the interface statistics for CRC and input errors on both sides. During the issue, run a ping of 1000 packets with size 1500 and check whether any packets are dropped.
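A side note on the ping test suggested above: on Cisco IOS the ping `size` value is the whole IP datagram, whereas on a Linux host `ping -s` takes only the ICMP payload, so the payload has to be 28 bytes smaller than the MTU being tested. Here is a minimal Python sketch of that arithmetic, assuming IPv4 without IP options. The function name is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Return the ICMP payload size that produces an IP packet of exactly `mtu` bytes."""
    if mtu <= IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small to carry an ICMP echo request")
    return mtu - IPV4_HEADER - ICMP_HEADER


# Payload needed to test a 1500-byte path MTU from a Linux host.
print(icmp_payload_for_mtu(1500))
```

From a Linux host this would correspond to something like `ping -M do -s 1472 <neighbor-ip>`, where `-M do` sets the DF bit. If those pings fail while smaller ones succeed, something in the path has an MTU below 1500.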
Your analysis is good but I am afraid it does not fully apply in this case: if you read Rama's post carefully you will notice that his OSPF routers complain about the retransmissions in the Full state and revert from the Full state into the Down state. The DBD packets and MTU issues would have already manifested themselves in the ExStart and Exchange states, and the routers would never make it to the Full state.
In fact, all OSPF reliable packets are retransmitted if not acknowledged: DBDs, LSRs, and LSUs. The issue with too many retransmissions can occur in any state that allows reliable packets to be exchanged between OSPF neighbors, i.e., ExStart, Exchange, Loading, and Full.
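To see at a glance which state each adjacency was in when it dropped, the `%OSPF-5-ADJCHG` lines can be tallied by previous state and reason. The following is a rough Python sketch, not a complete syslog parser: the regular expression is written only against the message format visible in the logs above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the tail of an ADJCHG message as it appears in the logs in this thread.
ADJCHG = re.compile(
    r"%OSPF-5-ADJCHG: Process (?P<proc>\d+), Nbr (?P<nbr>\S+) on (?P<intf>\S+) "
    r"from (?P<old>\S+) to (?P<new>\S+), (?P<reason>.+)$"
)


def parse_adjchg(line):
    """Return the fields of one ADJCHG line as a dict, or None if it does not match."""
    match = ADJCHG.search(line)
    return match.groupdict() if match else None


def summarize_drops(lines):
    """Count (previous state, reason) pairs for transitions that end in DOWN."""
    counts = Counter()
    for line in lines:
        event = parse_adjchg(line)
        if event and event["new"] == "DOWN":
            counts[(event["old"], event["reason"].strip())] += 1
    return counts


sample = [
    "Jul 24 23:22:12.041: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.83 on Vlan254 "
    "from FULL to DOWN, Neighbor Down: Too many retransmissions",
    "Jul 24 23:24:46.851: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.83 on Vlan254 "
    "from EXSTART to DOWN, Neighbor Down: Dead timer expired",
    "Jul 24 23:27:50.297: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.10.83 on Vlan254 "
    "from LOADING to FULL, Loading Done",
]

for (old_state, reason), count in summarize_drops(sample).items():
    print(old_state, "|", reason, "|", count)
```

Run against the full capture, a drop from FULL with "Too many retransmissions" points at lost LSUs or acknowledgements on an established adjacency, while drops from 2WAY or EXSTART with "Dead timer expired" point at lost hellos.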
However, as this issue occurs infrequently and over WAN links, I would indeed suggest keeping an eye on these circuits, both their utilization and reliability. Perhaps the service provider can provide some kind of testing of the circuits to see if they are operating within negotiated parameters.
I do not suppose that there is STP involved over those WAN circuits, but if there is, I would also check whether a topology change occurs around the same time the OSPF adjacencies are torn down. STP topology changes can disrupt connectivity for up to 50 seconds. That could easily bring OSPF neighbors down.
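For reference, the 50-second figure follows from the classic 802.1D default timers, and it exceeds the default OSPF dead interval on broadcast interfaces. A small Python sketch of the comparison, assuming default timers throughout (the timers actually configured on these devices are not given in the thread):

```python
# Assumed defaults, not values taken from the devices in this thread.
STP_MAX_AGE = 20         # seconds, 802.1D default
STP_FORWARD_DELAY = 15   # seconds, 802.1D default
OSPF_DEAD_INTERVAL = 40  # seconds, default on broadcast interfaces (4 x 10 s hello)


def stp_worst_case_outage(max_age: int, forward_delay: int) -> int:
    """Max age expiry plus the listening and learning phases."""
    return max_age + 2 * forward_delay


outage = stp_worst_case_outage(STP_MAX_AGE, STP_FORWARD_DELAY)
print(outage, outage > OSPF_DEAD_INTERVAL)
```

With these defaults the worst-case outage is longer than the dead interval, so a single reconvergence would be enough to expire the dead timer. Rapid STP converges much faster, so this only applies if legacy STP is actually running in the path.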
Where exactly does the WAN accelerator reside in the topology?
What I'm wondering is whether that accelerator is delaying OSPF packets or causing them to be lost.