Solved: delayed hellos

paul amaral · ‎02-23-2012

I'm hoping someone can help me understand an issue I can't seem to figure out.

I have two 6500 routers connected together; call them router A and router B. There are four links doing EIGRP unequal-cost load balancing between routers A and B. Three links terminate on serial interfaces and one on a FastEthernet interface. Note that these are wireless links that terminate on the serial/Ethernet interfaces of the 6500s.

The link I'm having trouble with is the FastEthernet link, which occasionally gets delayed hellos and sometimes misses 3 hellos in a row, at which point the link goes down. Note that I changed the bandwidth value on the FastEthernet link to 44210 Kbit (the wireless link is only this fast) from the default of 100000 Kbit, but the delay is at the default of 100 usec.

On router A the FastEthernet link shows a very high SRTT, but on router B the SRTT is normal. I would understand this if there were something wrong with the link, but then I would expect to see a high SRTT on both sides. Also, when I ping from router A to router B I see no timeouts or high latency. So I'm not sure how router A can see such a high SRTT when the link is fine.

Router A:

IP-EIGRP neighbors for process 1
H   Address   Interface   Hold    Uptime     SRTT   RTO    Q     Seq
                          (sec)              (ms)          Cnt   Num
3   xxx.2     Fa1/20      12      00:43:19   572    3432   0     27246571
1   xxx.134   Se8/1/0     13      1d05h      53     318    0     27246574
2   xxx.86    Se8/1/1     14      6d02h      53     318    0     27246572
0   xxx.226   Se9/1/0     12      4w3d       55     330    0     27246573

Router B:

IP-EIGRP neighbors for process 1
H   Address   Interface   Hold    Uptime     SRTT   RTO    Q     Seq
                          (sec)              (ms)          Cnt   Num
3   xxx.1     Fa1/5       13      00:45:23   10     200    0     28096706
1   xxx.133   Se8/1/0     11      1d05h      10     200    0     28096709
2   xxx.85    Se8/1/1     11      6d02h      29     200    0     28096708
0   xxx.225   Se8/0/0     13      4w3d       53     318    0     28096707
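A side note on the two neighbor tables above: the RTO column appears to follow the SRTT column in a simple way. The sketch below checks the posted numbers against the rule RTO = max(200 ms, 6 x SRTT). That rule is an assumption inferred from the figures in this thread, not something quoted from IOS documentation, and the names `rto_from_srtt` and `posted` are made up for illustration.

```python
# Assumption: RTO = max(200 ms, 6 * SRTT). Inferred from the tables posted
# in this thread; not taken from IOS documentation.
RTO_FLOOR_MS = 200
RTO_MULTIPLIER = 6


def rto_from_srtt(srtt_ms: int) -> int:
    """Return the RTO (ms) that the assumed rule predicts for a given SRTT (ms)."""
    return max(RTO_FLOOR_MS, RTO_MULTIPLIER * srtt_ms)


# (SRTT, RTO) pairs copied from the neighbor tables for router A and router B.
posted = {
    "A Fa1/20": (572, 3432),
    "A Se8/1/0": (53, 318),
    "A Se8/1/1": (53, 318),
    "A Se9/1/0": (55, 330),
    "B Fa1/5": (10, 200),
    "B Se8/1/0": (10, 200),
    "B Se8/1/1": (29, 200),
    "B Se8/0/0": (53, 318),
}

# Collect any row where the assumed rule disagrees with the posted RTO.
mismatches = {
    name: pair for name, pair in posted.items() if rto_from_srtt(pair[0]) != pair[1]
}
print(mismatches)  # {}
```

If the rule holds, the 3432 ms RTO on Fa1/20 is just a consequence of the 572 ms SRTT, so the SRTT is the only number on router A that needs explaining.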

Pinging shows no issues; occasionally the max reaches 300+ ms.

Type escape sequence to abort.

Sending 1000, 1500-byte ICMP Echos to xxx.xxx.xxx.2, timeout is 2 seconds:

Success rate is 100 percent (1000/1000), round-trip min/avg/max = 4/9/152 ms

When I debug this on router A, I do see that hellos from router B sometimes do not arrive on time. However, I'm not sure how that can be when pings show the link is fine.

Feb 22 17:27:29: EIGRP: Received HELLO on FastEthernet1/20 nbr .2

Feb 22 17:27:29: EIGRP: Sending HELLO on FastEthernet1/20

Feb 22 17:27:33: EIGRP: Sending HELLO on FastEthernet1/20

Feb 22 17:27:33: EIGRP: Received HELLO on FastEthernet1/20 nbr .2

Feb 22 17:27:37: EIGRP: Received HELLO on FastEthernet1/20 nbr .2

Feb 22 17:27:38: EIGRP: Sending HELLO on FastEthernet1/20

Feb 22 17:27:42: EIGRP: Received HELLO on FastEthernet1/20 nbr .2

Feb 22 17:27:43: EIGRP: Sending HELLO on FastEthernet1/20

I'm seeing the same on router B:

Feb 22 17:35:15: EIGRP: Sending HELLO on FastEthernet1/5

Feb 22 17:35:20: EIGRP: Received HELLO on FastEthernet1/5 nbr .1

Feb 22 17:35:20: EIGRP: Sending HELLO on FastEthernet1/5

Feb 22 17:35:25: EIGRP: Sending HELLO on FastEthernet1/5
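One way to quantify "delayed" in debug output like the above is to measure the gaps between consecutive received hellos. The following is a minimal sketch, assuming the log lines look exactly like the ones posted. The function name `received_hello_gaps`, the 5-second hello interval constant, and the year 2012 (taken from the post date, since the debug timestamps carry no year) are assumptions for illustration.

```python
import re
from datetime import datetime

HELLO_INTERVAL_S = 5  # assumed default EIGRP hello interval mentioned in the thread

# Matches lines such as:
# "Feb 22 17:27:29: EIGRP: Received HELLO on FastEthernet1/20 nbr .2"
HELLO = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}): EIGRP: (Received|Sending) HELLO"
)


def received_hello_gaps(log_text, year=2012):
    """Return the gaps, in whole seconds, between consecutive received hellos."""
    stamps = []
    for line in log_text.splitlines():
        match = HELLO.match(line.strip())
        if match and match.group(2) == "Received":
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [
        int((later - earlier).total_seconds())
        for earlier, later in zip(stamps, stamps[1:])
    ]


# Router A debug output copied from the post.
ROUTER_A_DEBUG = """\
Feb 22 17:27:29: EIGRP: Received HELLO on FastEthernet1/20 nbr .2
Feb 22 17:27:29: EIGRP: Sending HELLO on FastEthernet1/20
Feb 22 17:27:33: EIGRP: Sending HELLO on FastEthernet1/20
Feb 22 17:27:33: EIGRP: Received HELLO on FastEthernet1/20 nbr .2
Feb 22 17:27:37: EIGRP: Received HELLO on FastEthernet1/20 nbr .2
Feb 22 17:27:38: EIGRP: Sending HELLO on FastEthernet1/20
Feb 22 17:27:42: EIGRP: Received HELLO on FastEthernet1/20 nbr .2
Feb 22 17:27:43: EIGRP: Sending HELLO on FastEthernet1/20
"""

gaps = received_hello_gaps(ROUTER_A_DEBUG)
late = [gap for gap in gaps if gap > HELLO_INTERVAL_S]
print(gaps)  # [4, 4, 5]
print(late)  # []
```

This short excerpt happens to contain no late hellos. Run over a longer capture, any gap well above the hello interval would mark a delayed or lost hello. The debug timestamps have one-second resolution, so gaps are only accurate to about a second.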

On router A, which has the high SRTT, I see the occasional

"%DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 209.213.71.2 (FastEthernet1/20) is down: Interface Goodbye received"

On router B it shows that the dead timer expired.

So how can the SRTT be high on one end of the link but not the other? How can the hello packets be delayed if pinging reveals no issue?

I know that increasing the EIGRP hello timer will probably fix the issue, but with the default timer of 5 seconds I shouldn't see a problem when the ping test reveals none. I'm not sure what else I can do to debug or trace this. Also, if I run OSPF over the same interfaces I see the same delayed OSPF hellos, although they cause fewer drops since the hellos are sent every 10 seconds and the dead timer is 40 seconds.
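The timer comparison above can be made concrete with a little arithmetic. This is a rough sketch: the 15-second EIGRP hold time is an assumption (it is the usual default with 5-second hellos and is consistent with the Hold column values of 11 to 14 in the neighbor tables), and `hello_tolerance` is a name made up for illustration.

```python
def hello_tolerance(hello_s: int, hold_s: int) -> tuple[int, int]:
    """Return (hello intervals covered by the hold timer, slack in seconds).

    The slack is how late the next hello can arrive, past its expected time,
    before the hold/dead timer started by the previous hello runs out.
    """
    intervals = hold_s // hello_s
    slack_s = hold_s - hello_s
    return intervals, slack_s


# EIGRP: 5 s hello from the post; 15 s hold is an assumed default.
eigrp = hello_tolerance(hello_s=5, hold_s=15)
# OSPF: 10 s hello and 40 s dead timer, as stated in the post.
ospf = hello_tolerance(hello_s=10, hold_s=40)

print(eigrp)  # (3, 10)
print(ospf)   # (4, 30)
```

Under these assumptions, an EIGRP hello would have to be about 10 seconds late before the neighbor drops, and an OSPF hello about 30 seconds late, which fits OSPF showing the same symptom less often.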

Here's more info:

Router A:

IP-EIGRP interfaces for process 1
                    Xmit Queue    Mean   Pacing Time   Multicast    Pending
Interface   Peers   Un/Reliable   SRTT   Un/Reliable   Flow Timer   Routes
Fa1/20      1       0/0           572    0/1           2856         0
Next xmit serial <none>
Un/reliable mcasts: 0/1 Un/reliable ucasts: 2/2
Mcast exceptions: 0 CR packets: 0 ACKs suppressed: 1
Retransmissions sent: 0 Out-of-sequence rcvd: 0
Authentication mode is not set

Router B:

IP-EIGRP interfaces for process 777
                    Xmit Queue    Mean   Pacing Time   Multicast    Pending
Interface   Peers   Un/Reliable   SRTT   Un/Reliable   Flow Timer   Routes

IP-EIGRP interfaces for process 1
                    Xmit Queue    Mean   Pacing Time   Multicast    Pending
Interface   Peers   Un/Reliable   SRTT   Un/Reliable   Flow Timer   Routes
Fa1/5       1       0/0           10     0/1           50           0
Next xmit serial <none>
Un/reliable mcasts: 0/1 Un/reliable ucasts: 1/4
Mcast exceptions: 1 CR packets: 1 ACKs suppressed: 1
Retransmissions sent: 1 Out-of-sequence rcvd: 0
Authentication mode is not set

TIA, Paul

Peter Paluch · ‎02-23-2012

Paul,

If unicast pings appear to be delivered correctly, could you try to configure both your EIGRP peers A and B on the Fa1/20 and Fa1/5 interfaces statically using the neighbor command to force them to use unicast EIGRP communication? It is a blind shot and a workaround for some yet unknown problem, yet it would at least show us if we're headed in the correct direction.

There may be issues with delivering multicast/broadcast traffic over a wireless link.

Best regards,

Peter

paul amaral · ‎02-27-2012

Peter,

Thanks very much for the reply. Last week I read your response and tried what you suggested. Thanks for reminding me that the neighbor statement changes hellos to unicast. It probably would have taken me a little longer to remember that on my own.

Now that I have manually configured the neighbors with the EIGRP neighbor statement, the adjacency between routers A and B has been up for over 2 days.

IP-EIGRP neighbors for process 1
H   Address   Interface   Hold    Uptime   SRTT   RTO   Q     Seq
                          (sec)            (ms)         Cnt   Num
3   x.x.x.2   Fa1/20      11      2d19h    3      200   0     27246779

As a result, the SRTT and RTO have also decreased to normal.

The weird thing is that the other links are still fine using multicast. The only difference between the wireless links between routers A and B is that the link I just switched to unicast terminates on a FastEthernet interface, while the other 2 links terminate on serial interfaces. The FastEthernet link does have a serial-to-Ethernet converter on either side, whereas the other links are all serial with no converters. The serial-to-Ethernet converter is a passive device with no options, and I can't see why it would cause multicast issues.

Does anyone have a theory as to why I'm having multicast issues on that link?

Thanks, Paul

paul amaral · ‎02-27-2012

I was just able to look at the wireless equipment, which I did not set up, and I'm convinced it's causing the issue.

Here are some default settings:

Aggregation: A part of the 802.11n standard (or draft-standard). It allows sending multiple frames per single access to the medium by combining frames together into one larger frame. It creates the larger frame by combining smaller frames with the same physical source and destination end points and traffic class (i.e. QoS) into one large frame with a common MAC header.

Frames – determines the number of frames combined into the new larger frame.
Bytes – determines the size (in bytes) of the larger frame.

Default: 32 frames of 50000 bytes

Although I don't know whether there is a size difference between unicast and multicast hellos, I think it's aggregating the multicast packets before sending them out. There is a multicast packet pass-through option that is checked, but it doesn't seem to fix the issue. I will email the manufacturer and post an update.

FYI, I'm using Ubiquiti wireless equipment with airOS 5.

Peter Paluch · ‎02-27-2012

Hello Paul,

Hmmm... I do not think that frame aggregation is causing these problems, or at least I do not see how it could cause them. On the other end of the wireless link, the frames get de-aggregated again. I have only a superficial understanding of wireless technologies, but I have a feeling that sending broadcast/multicast frames needs to be performed in a slightly different way. Nevertheless, the idea of contacting the vendor of the wireless equipment is very good, and I am very interested in what the vendor has to say here.

By the way, how heavily is the lossy wireless link utilized?

Best regards,

Peter

paul amaral · ‎02-27-2012

Peter,

We heard back from our wireless vendor, Ubiquiti, and they confirmed there is a problem with multicast packets in the current firmware, which they plan to fix in a new release at the end of March. They didn't go into detail on what the issue was, but after testing this and taking your suggestion there is no doubt an issue exists with multicast. My theory is that the radio tries to aggregate like packets, up to 32 packets or 50000 bytes, and at times this takes longer than the 5-second EIGRP hello timer before the packets reach the other side. That would explain why I sometimes see double responses come in in the debug output above and, on rare occasions, the dead timer expiring. Although they offer a multicast option in their firmware, I don't think it's working as intended. I know a lot of wireless equipment offers aggregation, and I have other vendors' equipment with that setting working with no issues, but it can be tricky for time-sensitive traffic, and hopefully someone finds this info helpful. The utilization on the link is 10%, so that is not the issue.

Thanks again, Paul

Peter Paluch · ‎02-27-2012

Hello Paul,

Wonderful, thanks so much for letting us know! Still, I can't imagine an access point waiting more than 5 seconds to collect 32 packets in order to coalesce them. Waiting for them should take no more than a fraction of a second; anything beyond that is far too long. But this is just a hypothesis on my part.

I hope your vendor corrects the issue, but as there is no need to revert to multicast EIGRP operation on a point-to-point link, there's no rush.

Best regards,

Peter