Re: Slow FTP throughput on 6509

mlsommer · ‎10-24-2002

I am seeing throughput of 1.5Mbps and lower when trying to copy a large file (a RedHat .iso) via FTP across my 6509 (running in hybrid mode), sometimes as low as 200kbps. There are Gb uplinks between the subnets connected to the 6509, and utilization on these Gb links is usually around 2-3 Mbps.

If I FTP the same file within a local subnet it transfers at an acceptable rate of around 17Mbps+. I don't know if I possibly have something misconfigured or what.

I would REALLY appreciate any insight into this that anyone could offer. I wanted to check here before opening a TAC case.

Thanks,

Matt

steve.barlow · ‎10-24-2002

Is it the same across all vlans or just one (could it be a host issue)?

I would try:

-see if there are errors on the Gb links (speed/duplex issue?)

-put a sniffer on to see what is going on

-there is a free app called QCheck (from NetIQ) that you can use that measures throughput between PCs

-check load on the devices (cpu/memory/buffers/queues)

Hope it helps

Steve

mlsommer · ‎10-24-2002

Thanks for your response.

I'm going to try to answer your questions in order:

It's the same across all VLANs.

I've checked for errors and I'm not seeing any. Speed/duplex actually was an issue with the first server I noticed it on. I had hard-coded 100/full on the switch, but the server admin had left everything set to auto. However, after fixing this, the transfer speed only improved slightly.

When I do a sniffer capture, I'm seeing TCP stuck window and TCP low window diagnoses (roughly a 4:1 ratio of stuck window to low window).

I'll try Qcheck in the morning by putting an endpoint on the server.

The cpu/memory etc. seems real low on my PC. I'll check into the server tomorrow, though I was assured it wasn't under any load at all.

Thanks for the good advice. Can anyone tell me what "TCP stuck window" and "TCP low window" errors are typically symptomatic of?

Thanks again,

Matt

steve.barlow · ‎10-24-2002

Description:

As a recipient of TCP data receives packets, it transmits acknowledgement packets back to the sender. Acknowledgements may also be "piggybacked" inside of returned data packets. These packets contain the recipient's window size, which is the buffer set aside for the data portion of TCP packets. For example, on a typical Ethernet, a node may indicate that its window size is 8760 bytes.

As the recipient is processing packets, it may get slightly behind and begin sending acknowledgements with a window size of less than 8760 bytes in the above example. This is usually not a problem, unless the window size is so low that the sender momentarily delays sending additional packets, lowering overall throughput. If the window size is less than the max, the flow of data is restricted. The sender won't exceed the receiver's window size.
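To put a rough number on that: a sender limited by the receive window can have at most one window of data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The sketch below is only an illustration of that bound. The function name and the RTT values are assumptions for the example, not measurements from this network.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput (bits/s) when the receive window is the limit.

    At most one full window can be unacknowledged per round trip.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


# 8760 bytes is the example window from the description above.
# The RTT values are hypothetical, chosen only to show the trend.
for rtt_ms in (1, 5, 20, 50):
    bps = max_throughput_bps(8760, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>2} ms: at most {bps / 1e6:.2f} Mbps")
```

The takeaway is that with a small window, even a modest increase in round-trip time (for example, from an extra routed hop that adds delay or drops) cuts the achievable rate sharply.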

Sniffers constantly monitor these window sizes in both directions for all conversations. The Expert on a sniffer will note if the window falls below a user-defined percentage (50 percent by default) of the maximum window. In the above example, that means a window size of 4380 bytes or less. If the window size goes down to zero, then another diagnosis, "Zero Window", is given. If the window gets low but not zero, it gives a "Low Window" message.
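The threshold logic described above can be sketched in a few lines. This is not the Sniffer Expert's actual code, just a minimal illustration of the rule as stated; the function name and the "OK" label are made up for the example.

```python
def classify_window(advertised: int, max_window: int, low_pct: float = 50.0) -> str:
    """Classify an advertised TCP window against the maximum seen for the conversation.

    low_pct is the user-defined threshold (50 percent by default, as described above).
    """
    if max_window <= 0:
        raise ValueError("max_window must be positive")
    if advertised == 0:
        return "Zero Window"
    if advertised <= max_window * low_pct / 100:
        return "Low Window"
    return "OK"  # hypothetical label for "nothing to report"


# With a maximum window of 8760 bytes, 4380 or less is flagged as low.
for size in (8760, 4381, 4380, 1460, 0):
    print(size, classify_window(size, 8760))
```

Lowering low_pct makes the check less sensitive, which is the adjustment to make if the diagnosis is firing while throughput is still fine.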

The bottom line is that a "stuck window" diagnosis (packet has the window size stuck for longer than the threshold) is somewhat subjective. If the user feels that they are still getting good throughput and the Expert is flagging these "stuck" window conditions unnecessarily, then the percentage of the maximum window can be lowered.

On the other hand, if throughput is a problem, a stuck window while reading a file could indicate a slow client, and a stuck window while writing a file could indicate a slow server.

Causes:

-Receiver is overloaded

-Receiver has run out of buffer space

-Problem with the receiver

-Too many connections to the receiver resulting in less buffer space

Actions:

-Upgrade receiver's CPU and/or mem

-reduce connections to receiver

-increase network bandwidth

Steve

mlsommer · ‎10-25-2002

Thanks for your excellent explanation. If it truly is a host or server issue, I wonder why FTP transfers are pretty fast when transferring to the server on the same subnet. It only seems to be extremely slow when routing between two subnets.

Also, when you say the receiver is out of buffer space, which buffer are you referring to?

Thanks for the help,

Matt

steve.barlow · ‎10-25-2002

The buffer of the NIC. From "Web Performance Tuning":

"NICs have on-board buffers, and a bigger buffer always gives you more flexibility. The buffer holds incoming/outgoing data until it can forward the data to the network or up to the PC. A larger buffer makes a buffer overflow and consequent data loss less likely. Lost TCP/IP data is simply retransmitted, adding to overhead. Typically, 8-bit Ethernet cards have 8K buffers, while 16-bit cards have 16K buffers.

When a NIC has a complete unit of data from the network and is ready to forward it on to the computer's bus, it generates a hardware interrupt, which forces the CPU to save its current state and run the network card interrupt handler, which retrieves the data from the NIC's buffer and fills a data structure in memory. Therefore, a critical performance factor is how many interrupts per second the CPU, memory, and bus can handle from the NIC.

Another important measure of a server is how quickly it can get data from RAM or disk out to the network interface. This involves copying data from one place in memory to another, which is typical of server activity. Data is copied from the server's memory to the network interface card memory. Given a 1500-byte outgoing Ethernet packet, the OS must copy it - probably 4 bytes at a time - from RAM or cache out to the NIC buffer, so this copy would require 375 bus cycles to complete. The bcopy or memcpy library calls are often used here, so the efficiency of your server's implementation of these library calls is significant. This is also where the implementation of TCP/IP in your kernel becomes significant. If you have a poor implementation, it probably means the wait between the NIC's interrupt and the retrieval of a packet from the NIC's buffer is large, so additional packets arriving on the NIC may not find sufficient buffer space and may be dropped or overrun data in the buffer. This results in a costly retransmission of the lost packet."
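To make the arithmetic in that last paragraph concrete, here is a small sketch of the copy cost. It assumes, as the quoted passage does, one bus transfer per cycle; the function name and the 8-byte comparison are only for illustration.

```python
def copy_cycles(packet_bytes: int, bytes_per_cycle: int) -> int:
    """Bus cycles needed to copy a packet, assuming one transfer per cycle."""
    if packet_bytes < 0 or bytes_per_cycle <= 0:
        raise ValueError("packet_bytes must be >= 0 and bytes_per_cycle > 0")
    # Integer ceiling division: a partial final transfer still costs a cycle.
    return (packet_bytes + bytes_per_cycle - 1) // bytes_per_cycle


print(copy_cycles(1500, 4))  # the case from the quote: 375 cycles
print(copy_cycles(1500, 8))  # hypothetical wider transfer, for comparison
```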

Is ftp slow with different servers or is it always the same one when across subnets?

Steve

vkasacavage · ‎10-24-2002

I agree with the previous post: you should put a sniffer on the network to diagnose the problem. More likely than not, you are having a host problem, probably MTU or TCP window related, but the only way to tell is with a trace from an acceptable site, then one from an unacceptable site.

Compare the two traces, and you should see the problem.

scottmac · ‎10-25-2002

Hi. This is a long shot, but I've seen it pop up more than once.

Are the Gig links fiber or copper? If they're copper, run a sweep on the segment with a qualification-grade cable scanner. Try changing the jumpers.

Even if it's fiber, there's a chance that the connectors are dirty, chipped, cracked, etc. and could be causing low-level errors. If there's some way you can substitute a new piece of media, try it and see if it changes anything performance-wise.

Scott