I am running a number of sites with 4 circuits between them, using per-packet load balancing.
They work well as far as balance goes, but I have had issues with file transfer throughput being very low. During testing I can actually get better throughput by shutting down 2 of the circuits.
There are no errors anyplace and all circuits are exactly the same as far as speed and latency.
After getting packet captures from both client machines, I see a huge number of retransmissions. No packets are actually lost in the capture, so these are not really valid retransmissions.
From what I can tell it is due to the packets arriving out of order. This causes many ACKs to be sent for the same sequence number, which the far end decides is loss, so it retransmits. From reading the RFC, it appears that 3 duplicate ACKs is the magic number that triggers a retransmission.
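Here is a rough sketch of what I believe is happening (plain Python, purely illustrative, not any real stack; sequence numbers simplified to whole segments):

```python
# Illustrative only: how reordering alone can trigger fast retransmit.
DUP_ACK_THRESHOLD = 3  # the "magic number" from the RFC (RFC 2581)

def receiver_acks(arrival_order):
    """Return the cumulative ACK the receiver sends for each arrival."""
    expected = 1
    buffered = set()
    acks = []
    for seg in arrival_order:
        if seg == expected:
            expected += 1
            while expected in buffered:   # drain any buffered segments
                buffered.remove(expected)
                expected += 1
        else:
            buffered.add(seg)             # out of order: buffer it
        acks.append(expected)             # ACK always names the next expected seg
    return acks

# Segment 2 takes a slower circuit and arrives after segments 3, 4, and 5.
acks = receiver_acks([1, 3, 4, 5, 2])
print(acks)  # [2, 2, 2, 2, 6] -> three duplicate ACKs for segment 2

dup_count = acks.count(2) - 1
if dup_count >= DUP_ACK_THRESHOLD:
    print("sender fast-retransmits segment 2 even though nothing was lost")
```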
I know I can use multilink PPP to solve the out-of-order issue, but it raises the knowledge level needed to support the network. I have a few NOC guys who will not understand that circuits can be down but still appear to ping.
Any suggestions? I have been looking to see if there is a parameter in the TCP stack that would affect this, but the behavior appears to be very fundamental to TCP.
It does sound like it is due to packets arriving out of order. Normally the TCP implementation should be able to handle this and reorder the segments. However, as you have observed, there are some implementations that simply don't do that very well.
It sounds like you have considerable jitter between the various paths. Maybe one of the paths is congested and the others are not. That could really cause a lot of out-of-order delivery.
If your paths are multihop, the problem could be made worse because the per-packet load balancing knows only about the local router - it does not know what is two hops down the line.
However, you do say that you are considering PPP multilink, so I guess you are just one hop away (unless you are multilinking down tunnels, of course).
I guess it is going to depend on window size, segment size, and latency jitter. It sounds like you may actually be better off with per-destination.
Per-packet load balancing = out-of-order arrival = poor TCP (and other protocol) performance.
Your case is once again proving the equation above. Why can't you use regular load balancing?
I wish I could use per-session load balancing, but the main reason many of these sites were upgraded was a new requirement to transfer large amounts of data. The user would be limited to a single circuit if I did that, since all the traffic goes between 2 machines.
They will never be able to use more than 2 links' worth of bandwidth because of the latency and their use of CIFS to transfer the files. At least if I get this fixed I can say it's not the network, whereas now it truly is a network issue causing their delays.
The limitation on CIFS transfer speed isn't usually CIFS so much as the default TCP receive window size in XP and earlier. You can modify the default in the registry. Or, if you're running on a 100 Mbps host connection, move to gig. (Windows increases the default for gig connections.)
Vista TCP stacks advertise a large TCP receive window. We've seen them pull data 3x faster than an XP client across high-BDP links, when both are running registry defaults.
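If you want to experiment with that registry change, here is a minimal sketch (Python via the stdlib winreg module, assuming W2K/XP; TcpWindowSize and Tcp1323Opts are the relevant values, admin rights and a reboot are required, and back up the key first):

```python
import winreg

# Sketch only: raise the default TCP receive window on W2K/XP.
# Takes effect after a reboot and applies to every connection on the host.
KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY, 0, winreg.KEY_SET_VALUE) as k:
    # Anything over 65535 bytes requires window scaling (RFC 1323),
    # so enable that too (3 = window scaling + timestamps).
    winreg.SetValueEx(k, "TcpWindowSize", 0, winreg.REG_DWORD, 256960)
    winreg.SetValueEx(k, "Tcp1323Opts", 0, winreg.REG_DWORD, 3)
```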
Adding a comment:
As you know, TCP applications require special treatment.
This is typical TCP synchronization behaviour: the window size increases, gets interrupted at some point, drops back, and restarts synchronizing.
I suspect that per-packet load balancing affects some TCP sessions, so they restart synchronization with their peers, resulting in the transmission behaviour you've seen.
Can you guys explain to me the difference between per-packet load balancing and per-session load balancing? An example might be helpful. Thank you so much for your help.
With per-packet, the router spreads packets over links having the same routing metric in strict round-robin fashion. Due to the varying sizes of packets and the different queue conditions on the individual links, packets are easily delivered out of sequence, with the consequences described in this thread.
With per-destination or CEF load balancing, the router computes "flows" or "sessions" based on the source and destination IP addresses, and associates each flow with a link for the duration of a caching period. This way packet arrival order is preserved and everyone is happy.
The default is per-destination, and for good reasons.
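To make the difference concrete, a toy sketch (plain Python, illustrative only; a real CEF implementation hashes on more fields than this):

```python
import zlib
from itertools import count

LINKS = ["link0", "link1", "link2", "link3"]
_rr = count()

def per_packet(pkt):
    """Strict round-robin: ignores which flow the packet belongs to."""
    return LINKS[next(_rr) % len(LINKS)]

def per_flow(pkt):
    """Hash the flow identifiers so every packet of one conversation
    is pinned to the same link, preserving arrival order."""
    key = f"{pkt['src']}:{pkt['dst']}".encode()
    return LINKS[zlib.crc32(key) % len(LINKS)]

flow = {"src": "10.1.1.5", "dst": "10.2.2.9"}
print([per_packet(flow) for _ in range(4)])  # 4 packets, 4 different links
print([per_flow(flow) for _ in range(4)])    # 4 packets, same link every time
```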
Thank you Bevilacqua,
So you are saying it is better to use per-destination balancing with TCP applications rather than per-packet, right?
I'm saying: always use the CEF default (which is not simply per-destination, but takes more data into account to diversify flows).
Multilink PPP is definitely the best way to solve this issue. I seriously don't think it should increase the knowledge level needed to support the network all that much.
I'd recommend getting a good piece of network management software, such as SolarWinds Orion, that lets you add all your physical T1s as well as your multilink bundle. That way the NOC can see exactly what goes down very easily. Just put in the management IP of the router during configuration, and it tells you if a circuit on that router goes down. You don't have to bother with pinging each interface.
Works very well for us in these situations.
Windows XP (and earlier) stacks do have a registry parameter that controls the dup-ACK counter: TcpMaxDupAcks (W2K/XP). I don't know about other, non-Windows OSs.
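For reference, a sketch of that change (Python via winreg; Microsoft documents the range as 1-3 with a default of 2, it is global to the whole stack, and it needs a reboot):

```python
import winreg

# Sketch only: raise the dup-ACK threshold for fast retransmit on W2K/XP,
# so that mild reordering is less likely to trigger a spurious retransmit.
# Documented range is 1-3 (default 2); back up the key before changing it.
KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY, 0, winreg.KEY_SET_VALUE) as k:
    winreg.SetValueEx(k, "TcpMaxDupAcks", 0, winreg.REG_DWORD, 3)
```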
But out-of-order arrival doesn't cause duplicate acks, it causes window resets.
Experience shows that trying to fix the 'application' (let's call TCP the application) is pointless. Fix the network instead.
Could you clarify "But out-of-order arrival doesn't cause duplicate acks, it causes window resets" with reference to RFC 2001 and RFC 2581, which discuss the generation of duplicate ACKs when dealing with out-of-order TCP packets? Perhaps what you have in mind as a window reset is Fast Recovery, which reduces the send window (and thus the transmission rate) and is initiated by duplicate ACKs?
I agree with the recommended solutions that avoid the re-ordering issue, but re-ordering really isn't a broken network, since IP doesn't guarantee sequencing. TCP is designed to deal with re-ordering, but its default Fast Retransmit/Fast Recovery dup-ACK count assumes certain bounds on how severe the out-of-sequence condition normally might be. In this case, those normal expectations are likely being exceeded.
The real risk of changing the default settings is that they tend to be global. I.e., increasing the value will likely make TCP too lax for other "normal" flows. That noted, if one is looking for a possible short-term solution, and understands the impact, the value might be increased an increment or two. Longer term, the network can be changed.
Just did a bit more digging. Are the duplicate ACKs you have in mind the ones the receiver uses to advertise a change in its receive window size?
Yes, the definition of IP in itself doesn't guarantee that packet order is preserved, but the first distinction we must make is that a 'good and proper' IP network _does_ guarantee that, along with service parameters like packet loss, latency, etc.
On that we differ: to me, an IP network that delivers out of order is not just broken, it's seriously broken.
This is the goal we network engineers strive for, and it is accomplished using good devices, good circuits, and best design practices. Considering the large amount of money that a proper network costs, it's reasonable to expect that we deliver packets in order to our customers.
In other words, I'm not interested at all in exploring nice properties of TCP or other "applications". I know how the 'good and proper' IP network has to behave, and I deliver that, or nothing at all (I leave the business to somebody else).
To answer your note about TCP internals: I concede that out-of-order packets can cause duplicate ACKs; even more, I concede that they cause mess and havoc.
The thing is, in all honesty I think that once you begin messing around with that, you're only doing more damage, even if there is a possibility that some obscure parameter can bring an improvement on this or that operating system.
That is nice to know for your studies, but totally inadequate for enterprise, day-to-day operation.
Apparently we do differ somewhat in philosophy. I guess I am the "somebody else" who delivers real results, for "impossible" situations, in the real world. (Likely also because I'm not a network engineer.)
As an example, about 7 or 8 years ago I was working with a customer doing huge database replications across the "pond". They upgraded from multiple T1/E1s to dual T3/E3s. However, the replication transfers were not taking advantage of the additional bandwidth. The cry went out: there's something wrong with the network, fix it!
One requirement made the transfer rate critical: the transfer had to be completed within a set time window. (In this case, between end of business Friday night and start of business Monday morning.)
The network support group validated that there wasn't anything wrong. No packet loss, expected latency, not even duplicate ACKs.
From my "studies", I suspected this was the classic LFN (long fat network) TCP issue. At the network level, all you would have to do to fix this problem is provide a trans-Atlantic WAN with typical LAN latencies; the physics of electrical and optical propagation speed makes that impossible to accomplish.
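For the record, the back-of-envelope arithmetic behind that suspicion (illustrative numbers only): a single TCP session can never move data faster than its window size divided by the round-trip time.

```python
# LFN math: per-session TCP throughput is capped at window / RTT,
# no matter how fat the pipe is. Numbers below are illustrative.
def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    return (window_bytes * 8) / (rtt_ms / 1000.0) / 1e6

print(max_tcp_throughput_mbps(17_520, 90))  # XP-era default window: ~1.6 Mbps
print(max_tcp_throughput_mbps(65_535, 90))  # 64 KB window: ~5.8 Mbps

# Window needed to fill a 45 Mbps T3 at 90 ms RTT = bandwidth * RTT (the BDP):
print(45e6 / 8 * 0.090)  # ~506 KB, far beyond the unscaled 64 KB maximum
```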
I suggested multiple ways to increase the transfer rate. One was to increase the receive window size on the receiving Windows NT server. They too, like you, weren't keen on changing some "obscure parameter" within the OS, and they also questioned whether it would provide any improvement.
Since they hadn't found any other solution, were required to find one, and this was the least complex option, they tried it. They were surprised and happy when the transfer rate literally increased 5x.
Where we might agree: this solution was an EXCEPTION to the norm. For instance, it was only done on one server.
Today, there are additional ways to solve the above problem, such as using a Vista host or some type of WAAS device, but none of them really involve the network itself providing perfect packet sequencing, no loss, ideal latency, etc.
For the poster's problem, I too still prefer avoiding the out-of-order issue, but I believe all effective solutions should be on the table, with a correct analysis of their pros and cons. Even for the suggested MLPPP, the possible performance impact on the end-link routers should be considered.