Guys, could anyone steer me on this point?
I have done some testing of apps in the past, and the one thing that stands out is the TCP retransmission process. The key parameters are max-data-retransmissions and the RTO (retransmission timeout) timer, which together set the lifetime of a TCP session that is no longer getting ACKs.
I have attached a document (still verifying some of the stuff in it) that I wrote to look into the process and the options.
So, the MS and SunOS stacks generally put a lower bound of 500 ms on the RTO, because if this were not done, 5 TCP retransmissions would happen very quickly and a network convergence event with standard IGP timers would break the sessions.
Please correct me if I am wrong so far.
But even with the RTO lower-bounded at 500 ms and max-data-retransmissions generally set to 5 in the OS's TCP stack, this could yield retransmission times as follows:
On a local LAN:
Original packet sent - no ACK received - waits for the lower-bounded RTO of 500 ms before retransmitting
Retran 1 (RTO now 1000 ms) - no ACK received within 1000 ms
Retran 2 (RTO now 2000 ms) - no ACK received within 2000 ms
Retran 3 (RTO now 4000 ms) - no ACK received within 4000 ms
Retran 4 (RTO now 8000 ms) - no ACK received within 8000 ms
Retran 5 (RTO now 16000 ms) - we don't care here whether it is ACKed or not, but if no ACK arrives within this RTO, a TCP RST is sent
Retran 5 is sent at 15.5 seconds (the original RTO of 500 ms plus all the cumulative retransmission RTOs), and if it is not ACKed the session is torn down at 31.5 seconds (15.5 s plus the final 16 s RTO).
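The cumulative timing in the list above can be reproduced with a few lines of Python. This is a minimal sketch assuming an idealised stack: the RTO starts at the 500 ms floor and doubles exactly on every retransmission, with no jitter and no upper cap on the RTO (real stacks add both). `retransmission_schedule` is a hypothetical helper name, not an OS API.

```python
def retransmission_schedule(initial_rto_ms=500, max_retrans=5):
    """Return (send_times_ms, give_up_ms) for idealised exponential backoff.

    Assumes the RTO starts at initial_rto_ms and doubles after every
    retransmission, with no jitter and no upper cap.
    """
    send_times = []
    elapsed = 0
    rto = initial_rto_ms
    for _ in range(max_retrans):
        elapsed += rto           # wait one full RTO with no ACK, then retransmit
        send_times.append(elapsed)
        rto *= 2                 # back off: double the RTO for the next wait
    give_up = elapsed + rto      # the final RTO expires without an ACK
    return send_times, give_up


times, give_up = retransmission_schedule()
for n, t in enumerate(times, start=1):
    print(f"Retran {n} sent at {t / 1000:.1f} s")
print(f"Session torn down at {give_up / 1000:.1f} s")
```

With the defaults this gives retransmissions at 0.5, 1.5, 3.5, 7.5 and 15.5 s and a give-up at 31.5 s. Dropping the floor to a hypothetical 10 ms RTO (the sort of value a LAN round-trip time alone might produce) shows why the floor matters: `retransmission_schedule(10)` puts retran 5 at 310 ms and gives up at 630 ms, far too early for any IGP to reconverge.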
So, depending on where in the hello cycle the failure happens, EIGRP (default hello 5 s, hold time 15 s) could take anywhere between 10 and 15 seconds to declare a neighbor down, and during that time traffic from a TCP session, along with all of its retransmissions, could be black-holed. With retran 5 going out at around 15.5 to 16 seconds, that is very close!
OSPF with its 10 s hello and 40 s dead timers is just out of the loop at this stage: it would take between 30 and 40 seconds to declare the neighbor down.
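The overlap between the IGP hold timers and the TCP timeline can also be checked numerically. This sketch assumes default timers (EIGRP hello 5 s / hold 15 s, OSPF hello 10 s / dead 40 s), the idealised 500 ms-floor schedule from this post, a silent link failure at the moment the original packet is sent, and that traffic is black-holed until the neighbor is declared down (ignoring SPF/DUAL computation time). `detection_window` and the constant names are hypothetical.

```python
# Idealised TCP timeline from this post, in seconds after the original send
# (failure assumed to happen at that same instant).
RETRAN_SEND_S = [0.5, 1.5, 3.5, 7.5, 15.5]
TCP_GIVE_UP_S = 15.5 + 16.0  # retran 5 plus the final 16 s RTO

# Default IGP timers: (hello, hold/dead) in seconds.
IGP_TIMERS = {"EIGRP": (5, 15), "OSPF": (10, 40)}


def detection_window(hello_s, hold_s):
    """Range of delays between a silent failure and neighbor-down.

    The hold/dead timer restarts on each hello received. If the link dies
    just after a hello, the full hold time elapses; if it dies just before
    the next hello, one hello interval of the hold time is already used up.
    """
    return hold_s - hello_s, hold_s


for igp, (hello, hold) in IGP_TIMERS.items():
    earliest, latest = detection_window(hello, hold)
    # Sent before the earliest possible neighbor-down: black-holed regardless.
    doomed = [t for t in RETRAN_SEND_S if t < earliest]
    # Sent inside the window: outcome depends on the hello phase.
    at_risk = [t for t in RETRAN_SEND_S if earliest <= t <= latest]
    # True if TCP could give up before the neighbor is declared down.
    tcp_first = TCP_GIVE_UP_S <= latest
    print(f"{igp}: neighbor down after {earliest}-{latest} s; "
          f"black-holed {doomed}, at risk {at_risk}; "
          f"TCP may give up first: {tcp_first}")
```

For EIGRP, the first four retransmissions are lost and retran 5 (15.5 s) lands just after the latest possible neighbor-down (15 s). For OSPF, all five are lost, and the 31.5 s give-up falls inside the 30 to 40 s detection window.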
So the question is: what should we be doing to keep both convergence and the TCP layer stable? Should we be reducing timers?
Also, please note that the retransmission figures above are clinical (idealised) results; in practice there would be some randomisation in these figures.
An extract from the doc shows this:
Packet 1 sent - standard RTO timer set.
P1 55 20:48:21.1039 192.168.69.100 192.168.69.2 TELNET Telnet Data ...
1st retransmission sent 554.3 ms after the original packet
R1 56 20:48:21.6582 192.168.69.100 192.168.69.2 TELNET Telnet Data ...
2nd retransmission sent 1s 207.0 ms after the 1st retransmission and 1s 761.3 ms after the original packet
R2 57 20:48:22.8652 192.168.69.100 192.168.69.2 TELNET Telnet Data ...
3rd retransmission sent 2s 429.6 ms after the 2nd retransmission and 4s 190.9 ms after the original packet
R3 58 20:48:25.2948 192.168.69.100 192.168.69.2 TELNET Telnet Data ...
4th retransmission sent 4s 828.2 ms after the 3rd retransmission and 9s 19.1 ms after the original packet
R4 59 20:48:30.1230 192.168.69.100 192.168.69.2 TELNET Telnet Data ...
5th retransmission sent 9s 656.3 ms after the 4th retransmission and 18s 675.4 ms after the original packet
R5 60 20:48:39.7793 192.168.69.100 192.168.69.2 TELNET Telnet Data ...
In theory, what should happen now is that the RTO gets set to double the last one; with the last gap at about 9.66 seconds, that would be roughly 19.3 seconds.
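The per-retransmission gaps and cumulative times in the extract can be recomputed directly from the capture timestamps, which is a quick way to check the hand arithmetic. A minimal sketch using only the standard library; `CAPTURE`, `parse` and `gaps_ms` are illustrative names, and the projected next RTO assumes the stack simply keeps doubling, which the capture itself does not show.

```python
from datetime import datetime

# Timestamps copied from the capture extract (P1 = original, R1..R5 = retransmissions).
CAPTURE = [
    ("P1", "20:48:21.1039"),
    ("R1", "20:48:21.6582"),
    ("R2", "20:48:22.8652"),
    ("R3", "20:48:25.2948"),
    ("R4", "20:48:30.1230"),
    ("R5", "20:48:39.7793"),
]


def parse(ts):
    # %f accepts 1-6 fractional digits, so ".1039" is read as 103900 microseconds.
    return datetime.strptime(ts, "%H:%M:%S.%f")


def gaps_ms(capture):
    """Return (label, gap from previous frame, time since original), both in ms."""
    times = [parse(ts) for _, ts in capture]
    rows = []
    for i in range(1, len(times)):
        gap = (times[i] - times[i - 1]).total_seconds() * 1000
        total = (times[i] - times[0]).total_seconds() * 1000
        rows.append((capture[i][0], round(gap, 1), round(total, 1)))
    return rows


rows = gaps_ms(CAPTURE)
for label, gap, total in rows:
    print(f"{label}: +{gap:.1f} ms (cumulative {total:.1f} ms)")

# Assumption: the stack keeps doubling, so the next RTO is about twice the last gap.
print(f"Projected next RTO: about {2 * rows[-1][1] / 1000:.1f} s")
```

The successive gaps grow by a factor of roughly 2 each time (554.3, 1207.0, 2429.6, 4828.2 and 9656.3 ms), which matches the doubling backoff; the 554.3 ms first gap sits just above the 500 ms floor.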