Re: TCP's Role in IGP Convergence

kfarrington · ‎05-23-2007

Guys, could anyone steer me on this point?

I have done some testing of apps in the past, and the one thing to note is the TCP retransmission process. The key is that max-data-retransmissions and the RTO timer together set the lifecycle of the TCP session.

I have attached a document (still verifying some of the stuff in it) that I wrote to look into the process and options.

So, the MS and SunOS systems generally set a lower bound of 500ms on the RTO. If this was not done, 5 TCP retransmissions would happen very quickly and a network convergence event would break the sessions with standard IGP timers.

Please correct me if I am wrong so far.

But even with the lower-bounded RTO of 500ms and max-data-retransmissions generally set to 5 in the OS's TCP stack, this could yield retransmission times as follows :-

On a local LAN:

Original Packet sent. - No ack received - waits for lower bounded RTO of 500ms before retran

Retran 1 (RTO now 1000) - No ack received within 1000ms

Retran 2 (RTO now 2000) - No ack received within 2000ms

Retran 3 (RTO now 4000) - No ack received within 4000ms

Retran 4 (RTO now 8000) - No ack received within 8000ms

Retran 5 (RTO now 16000) - don't care here whether it is received or not, but if no ack is received within the RTO, a TCP RST is sent

Retran 5 is sent at 15.5 seconds (the original RTO of 500ms plus the cumulative RTOs of retrans 1 to 4).
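The schedule above can be reproduced with a few lines of Python. This is only a sketch, assuming a clean doubling of the RTO from the 500ms floor and five data retransmissions as described in this post; `retransmission_schedule` and its parameter names are made up for illustration, and real stacks add RTT-based variation on top.

```python
def retransmission_schedule(initial_rto_ms=500, max_retransmissions=5):
    """Return (send_times_ms, give_up_ms) for a simple doubling RTO.

    send_times_ms[i] is when retransmission i+1 leaves, measured from the
    original packet. give_up_ms is when the final RTO expires unanswered.
    Assumption: pure doubling, no randomisation, no upper bound on the RTO.
    """
    send_times = []
    elapsed = 0
    rto = initial_rto_ms
    for _ in range(max_retransmissions):
        elapsed += rto           # wait out the current RTO with no ack
        send_times.append(elapsed)
        rto *= 2                 # back off: double the RTO for the next wait
    give_up = elapsed + rto      # last RTO expires, session is torn down
    return send_times, give_up


send_times, give_up = retransmission_schedule()
print(send_times)  # [500, 1500, 3500, 7500, 15500]
print(give_up)     # 31500
```

So under these assumptions the fifth retransmission leaves at 15.5 seconds, and the session would only be given up about 31.5 seconds after the original packet.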

So depending on where the hello cycle is, EIGRP could take between 10 and 15 seconds to down a neighbor, and in that time traffic from a TCP session and all of its retransmissions could black hole. If TCP retran 5 goes out at around 16 seconds, it is very close!

OSPF with its 10s hello and 40s dead timers is simply out of the running at this stage.
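To put numbers on that comparison, here is a rough sketch. It assumes EIGRP timers of hello 5s / hold 15s (which is what the 10 to 15 second window implies) and OSPF hello 10s / dead 40s, and it only looks at failure detection, not at propagation or path recalculation. `detection_window` and `LAST_RETRAN_S` are illustrative names, not anything from a router or OS.

```python
def detection_window(hello_s, dead_s):
    """Best and worst case for declaring a silent neighbor down.

    The dead/hold timer restarts on every hello received. A failure just
    before the next hello is due is detected after about dead_s - hello_s;
    a failure right after a hello takes the full dead_s.
    """
    return dead_s - hello_s, dead_s


LAST_RETRAN_S = 15.5  # fifth retransmission, from the idealised schedule

# Assumed timers: (hello, hold/dead) in seconds.
IGP_TIMERS = {"EIGRP": (5, 15), "OSPF": (10, 40)}

results = {}
for name, (hello, dead) in IGP_TIMERS.items():
    best, worst = detection_window(hello, dead)
    # True only if even the slowest detection beats the last retransmission.
    results[name] = worst < LAST_RETRAN_S
    print(f"{name}: neighbor down after {best}-{worst}s, "
          f"detected before last retran in the worst case: {results[name]}")
```

With these assumed timers EIGRP squeaks in with about half a second to spare on detection alone, while OSPF at 10/40 does not come close.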

So the question is : what should we be doing to keep convergence and the TCP layer stable? Should we be reducing timers?

Also, please note that the retrans stats above are clinical results and there would be some randomisation to these figures.

An extract from the doc shows this:

Packet 1 Sent - Standard RTO timer set.

P1 55 20:48:21.1039 192.168.69.100 192.168.69.2 TELNET Telnet Data ...

1st retransmission sent 554.3 ms later

R1 56 20:48:21.6582 192.168.69.100 192.168.69.2 TELNET Telnet Data ...

2nd retransmission sent 1s 207.0 ms later from 1st retransmission and 1s 761.3 ms from orig packet

R2 57 20:48:22.8652 192.168.69.100 192.168.69.2 TELNET Telnet Data ...

3rd retransmission sent 2s 429.6 ms later from 2nd retransmission and 4s 190.9 ms from orig packet

R3 58 20:48:25.2948 192.168.69.100 192.168.69.2 TELNET Telnet Data ...

4th retransmission sent 4s 828.2 ms later from 3rd retransmission and 9s 19.1 ms from orig packet

R4 59 20:48:30.1230 192.168.69.100 192.168.69.2 TELNET Telnet Data ...

5th retransmission sent 9s 656.3 ms later from 4th retransmission and 18s 675.4 ms from orig packet

R5 60 20:48:39.7793 192.168.69.100 192.168.69.2 TELNET Telnet Data ...

In theory, what should happen now is that a new RTO of double the last RTO gets set, which could be up to 18 seconds.
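The gaps in the capture extract can be recomputed directly from the timestamps, which is a handy cross-check when reading a trace. A small sketch: the timestamps are copied from the extract, and everything else (the variable names, rounding to 0.1 ms) is just my choice for illustration.

```python
from datetime import datetime

# Timestamps of P1 and R1..R5 as they appear in the capture extract.
STAMPS = [
    "20:48:21.1039",
    "20:48:21.6582",
    "20:48:22.8652",
    "20:48:25.2948",
    "20:48:30.1230",
    "20:48:39.7793",
]

times = [datetime.strptime(s, "%H:%M:%S.%f") for s in STAMPS]

# Gap between each packet and the one before it, in milliseconds.
gaps_ms = [round((b - a).total_seconds() * 1000, 1)
           for a, b in zip(times, times[1:])]

# Time since the original packet for each retransmission, in milliseconds.
from_orig_ms = [round((t - times[0]).total_seconds() * 1000, 1)
                for t in times[1:]]

for n, (gap, total) in enumerate(zip(gaps_ms, from_orig_ms), start=1):
    print(f"R{n}: {gap} ms after previous packet, {total} ms after original")
```

Each gap comes out at roughly double the previous one, which is what you would expect from exponential backoff with a bit of variation on top.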

Thx

Ken

mheusinger · ‎05-24-2007

Hi Ken,

Generally, network convergence time is made up of several components:

1) failure detection

2) propagation time of failure information

3) path recalculation timers

Any of the three can be the major component.

As for 1): with static routing in place the time could even be infinite if no link down event occurs. If "active testing" is involved, i.e. some keepalive mechanism, then the interval and dead timers determine the contribution to convergence. Today IGPs like OSPF, and also spanning tree (RSTP), can be tuned to have hello/dead timers in the range of a few seconds (sometimes even subsecond).

One should be careful though, as trying to convert a Toyota station wagon into a top fuel dragster by just fueling it with nitro might break the thing ... ;-)

In other words: there are limitations with respect to CPU load, memory and bandwidth, which need to be taken into account.

As for 2): this depends on IGP or EGP update timers. Generally link-state protocols are faster, and BGP might be the slowest. There are also throttling timers involved (to save the Toyota ;-)

All the timers can usually be adjusted though to speed up the process.

As for 3): There are also throttling timers for path recalculation, and the available CPU resources are of some importance too. In addition, the complexity of the recalculation matters. For example, RIP will just insert an update into the IP routing table, whereas OSPF recalculates the topology in some cases.

Final statement: the default timers are usually chosen conservatively, so there is room for improvement in most cases. To lower convergence time across a complete network you need detailed knowledge at the protocol and device feature level as well as of network specifics (like average CPU load in a router).
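As a rough way to reason about the three components together, one can add them up and compare the total against whatever the applications tolerate. The sketch below does exactly that; the class name and all the numbers are placeholders for illustration (a 15s worst-case EIGRP hold time and made-up propagation and recalculation figures), not measurements from any real network.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceEstimate:
    """Hypothetical per-component convergence estimate, in seconds."""
    detection_s: float       # 1) failure detection
    propagation_s: float     # 2) propagation of failure information
    recalculation_s: float   # 3) path recalculation

    def total_s(self) -> float:
        return self.detection_s + self.propagation_s + self.recalculation_s

    def fits(self, budget_s: float) -> bool:
        """True if the whole convergence completes inside the budget."""
        return self.total_s() < budget_s


# Placeholder values only: worst-case hold timer plus assumed small delays.
estimate = ConvergenceEstimate(detection_s=15.0,
                               propagation_s=0.5,
                               recalculation_s=0.2)

TCP_BUDGET_S = 15.5  # last retransmission in Ken's idealised schedule

print(f"total {estimate.total_s():.1f}s, "
      f"within TCP budget: {estimate.fits(TCP_BUDGET_S)}")
```

The point of the exercise is that detection alone can eat almost the whole budget, so even small propagation and recalculation delays push the total over it.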

Regards, Martin

For further reading:

Caveats in Testing Routing Protocol Convergence

http://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_8-4/testing_routing.html

High Availability White Papers

http://www.cisco.com/en/US/products/ps6550/prod_white_papers_list.html

kfarrington · ‎05-24-2007

Hi Martin, Good to speak to you again.

OK, so the network must converge within a time that takes the TCP sessions' max-data-retransmissions and lower-bounded RTO into account?

I.e., the two variables together give you a "finite" maximum time a TCP session can live.

This is an important question to understand.

So, if your TCP session sends its 5th retransmission 17s after the original packet was sent out, the parameter you have to set within your network is an absolute convergence time of below 17 seconds?

Would you agree with that statement?

Many thx, as always to all,

Ken

mheusinger · ‎05-24-2007

Hi Ken,

afaik yes.

The point in today's networks is however not only TCP. VoIP and video are more crucial, as outages in the range of a second will be noticeable (though mostly tolerable), whereas 15 second breaks in a telephone conversation are usually unacceptable.

Thus the goal today is closer to 1 sec. convergence time than 10 sec.

Or generally speaking: your users and applications will define acceptable or unacceptable convergence times. TCP session timeouts are only one piece of the puzzle.

Regards, Martin

kfarrington · ‎05-24-2007

This is good stuff to chat about, mate.

Be interesting to see what others' views are, especially Russ White's.

Especially, as you say, in VoIP networks. Say you had a dual path using EIGRP and one neighbor becomes unreachable: for between 10 and 15 seconds, traffic "could" be black holing. TCP may or may not survive; VoIP, well, unacceptable as you say.

So following on from this, I think I need to do a lot of work tuning the network for a VoIP rollout.

Thoughts ?

Regards,

Ken

mheusinger · ‎05-24-2007

Ken, nice to chat with you!

Thoughts? Thoughts!

Well, I would recommend you read the Solution Reference Network Design Guides:

www.cisco.com/go/srnd

Might be a lot of stuff at first, but at least it is good stuff!

On top of that, what is also really important from a project management perspective when rolling out VoIP:

"Cisco Undertakes the Largest IP Telephony Deployment in Industry History"

http://www.cisco.com/web/about/ciscoitatwork/case_studies/ipcommunications_dl4.html

Many very good ideas and gotchas, especially for "techies" tending to forget to ask users what they want ;-)

Regards, Martin

P.S.: Being an instructor I would obviously also recommend attending Cisco VoIP related training classes, which cover many aspects in theory and hands-on labs.