I know that I will be asked for more information on my setup, but I have summarized it below. I have a DR site that we replicate data to. Until this week, we were using an OC3 over ATM connection between two Cisco 3825 routers for our WAN link. Over the OC3 it would take roughly 6 hours to replicate XXX GB of data, with the link running at about 92-95% utilization.
This week we upgraded the OC3 connection to an AT&T GigaMAN Gigabit connection. The handoff at the router is fiber with SC connectors, terminated into an SFP module. The configuration for the interface is below:
ip address XX.XX.XX.XXX 255.255.255.252
ip flow ingress
Now that we are using the Gigabit link, our transfer rates are not as good as expected: we are still replicating the same amount of data in about the same time. The difference is that the new circuit is only being utilized at about 8-10% at most. That utilization figure is not surprising in itself, but I find it alarming that the transfer speed is about the same as on the OC3. The DR site is located about 150 miles from our main facility.
Here are the interface stats from the Gig0/0/0 interface on one of the WAN routers:
GigabitEthernet0/0/0 is up, line protocol is up
Hardware is PM-3387, address is 001d.7037.f813 (bia 001d.7037.f813)
Internet address is XX.XX.XX.XXX/30
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 22/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is autonegotiation, media type is LX
output flow-control is XON, input flow-control is XON
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 05:53:34
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 89288000 bits/sec, 7633 packets/sec
5 minute output rate 2034000 bits/sec, 3804 packets/sec
174946063 packets input, 876651764 bytes, 0 no buffer
Received 354 broadcasts, 0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
88550515 packets output, 1604153343 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
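The 5-minute rates above can be turned into link utilization and average packet size. This is a minimal sketch, assuming the BW 1000000 Kbit/sec shown on the interface is the real circuit rate (that depends on what the SP actually provisions); the function names are hypothetical.

```python
LINK_BPS = 1_000_000_000  # BW 1000000 Kbit/sec from the interface output

def utilization_pct(rate_bps: float, link_bps: float = LINK_BPS) -> float:
    """Percentage of the link consumed by a measured bit rate."""
    return 100.0 * rate_bps / link_bps

def avg_packet_bytes(rate_bps: float, pps: float) -> float:
    """Average packet size implied by a bit rate and a packet rate."""
    return rate_bps / 8 / pps

def load_pct(load_value: int) -> float:
    """Convert an IOS txload/rxload value (x/255) into a percentage."""
    return 100.0 * load_value / 255

if __name__ == "__main__":
    # Values copied from the 'show interface' output above.
    print(f"input utilization: {utilization_pct(89_288_000):.1f}%")  # 8.9%
    print(f"rxload 22/255:     {load_pct(22):.1f}%")                 # 8.6%
    print(f"avg input packet:  {avg_packet_bytes(89_288_000, 7633):.0f} bytes")  # 1462
    print(f"avg output packet: {avg_packet_bytes(2_034_000, 3804):.0f} bytes")   # 67
```

Input packets average close to the 1500-byte MTU, while output packets average well under 100 bytes, which is consistent with the return traffic being mostly pure ACKs.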
Can anybody shed some light on this? This behavior doesn't seem typical.
What does your 3825 CPU utilization look like during this data transfer?
All right, you have a gig handoff, but is the SP providing full gig service?
If not, what does the SP do with packets that exceed the contracted rate?
If it is full gig, and you are using TCP, is the receiving host's RWIN sized for the bandwidth-delay product?
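To make the bandwidth-delay product concrete, here is a rough Python sketch. The 150-mile distance comes from the original post; the figure of about 5 microseconds per km of fiber and the 5 ms RTT are assumptions (replace the RTT with a ping measured between the two sites), and the function names are hypothetical.

```python
KM_PER_MILE = 1.609344
# Assumption: light in fiber covers roughly 1 km every 5 microseconds.
FIBER_US_PER_KM = 5.0

def propagation_rtt_ms(route_miles: float) -> float:
    """Round-trip propagation delay only; ignores queuing, serialization and detours."""
    one_way_us = route_miles * KM_PER_MILE * FIBER_US_PER_KM
    return 2 * one_way_us / 1000.0

def bdp_bytes(link_bps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return link_bps * (rtt_ms / 1000.0) / 8

def window_limited_bps(window_bytes: float, rtt_ms: float) -> float:
    """Throughput ceiling when only one window can be outstanding per RTT."""
    return window_bytes * 8 / (rtt_ms / 1000.0)

if __name__ == "__main__":
    rtt = propagation_rtt_ms(150)  # straight-line floor; real fiber routes are longer
    print(f"propagation RTT floor: {rtt:.2f} ms")
    assumed_rtt = 5.0  # hypothetical measured RTT in ms
    print(f"BDP at 1 Gbit/s: {bdp_bytes(1e9, assumed_rtt):,.0f} bytes")
    print(f"64 KB window ceiling: {window_limited_bps(65535, assumed_rtt) / 1e6:.0f} Mbit/s")
```

Under those assumptions, a 64 KB receive window caps a single TCP flow at roughly 105 Mbit/s no matter how fast the circuit is, whereas filling a full gigabit needs about 625 KB in flight.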
Joseph is pointing to the real issue: by default, Windows acknowledges every second packet. If you compare packets in and out, you will see the ratio is almost 2:1 (7633 vs. 3804 packets/sec). What you are witnessing is the default behaviour of the Windows TCP stack.
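The 2:1 figure can be checked directly against the 5-minute packet rates in the interface output. A minimal sketch, assuming almost all input packets are replication data and almost all output packets are pure ACKs; the function name is hypothetical.

```python
def segments_per_ack(data_pps: float, ack_pps: float) -> float:
    """Estimate how many data segments each returned ACK covers."""
    if ack_pps <= 0:
        raise ValueError("ack_pps must be positive")
    return data_pps / ack_pps

if __name__ == "__main__":
    # 5 minute input/output packet rates from the show interface output
    ratio = segments_per_ack(7633, 3804)
    print(f"about {ratio:.2f} data segments per ACK")  # about 2.01
```

A ratio near 2 is what a receiver that acknowledges every second segment produces; a ratio near 1 would point to an ACK for every segment.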
Unless you have experience working with the Windows TCP stack, tread carefully with the following information.
Two items are at issue here. The first is the TCP1323Opts setting, which controls how Windows handles the RFC 1323 TCP extensions, window scaling and timestamps. Read up on this before making any changes; it affects much of the behaviour you are describing.
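TCP1323Opts is a bitmask: bit 0 enables window scaling and bit 1 enables timestamps, so 0 disables both, 1 enables scaling only, 2 enables timestamps only and 3 enables both. The small Python helper below is hypothetical (not part of any Windows tool) and simply decodes a value you have read from HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.

```python
from typing import NamedTuple

class Rfc1323Options(NamedTuple):
    window_scaling: bool
    timestamps: bool

def decode_tcp1323opts(value: int) -> Rfc1323Options:
    """Decode a Tcp1323Opts registry value (valid values are 0-3)."""
    if not 0 <= value <= 3:
        raise ValueError(f"unexpected Tcp1323Opts value: {value}")
    return Rfc1323Options(window_scaling=bool(value & 0x1),
                          timestamps=bool(value & 0x2))

if __name__ == "__main__":
    for v in range(4):
        print(v, decode_tcp1323opts(v))
```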
The next item is TcpAckFrequency, which determines how many segments the receiver accepts before it must send an ACK. It works together with the delayed-ACK timer, which sets how long the receiver waits before sending an ACK when that count has not been reached.
In a nutshell, the stream doesn't ramp up to a higher speed because the hosts spend a lot of time waiting for ACKs before sending more data.
The issue doesn't appear to be the router; it is the Windows TCP defaults, which don't show up until you have more bandwidth available to fill. Welcome to the gigabit club, and to the need to work closely with the server administrators to get maximum bandwidth performance.
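Here is a simplified model of delayed acknowledgement. The receiver sends an ACK once TcpAckFrequency segments have arrived, and any leftover segments at the end of a burst are acknowledged only when the delayed-ACK timer expires. The sketch ignores window updates, out-of-order segments and other events that trigger an immediate ACK. The function name and the 200 ms timer value are assumptions for illustration.

```python
def ack_schedule(segments: int, ack_frequency: int = 2,
                 delack_timer_ms: float = 200.0) -> tuple[int, float]:
    """Return (ACKs sent, milliseconds spent waiting on the delayed-ACK timer)
    for one burst of `segments` back-to-back data segments."""
    if segments < 0 or ack_frequency < 1:
        raise ValueError("segments must be >= 0 and ack_frequency >= 1")
    full_acks, leftover = divmod(segments, ack_frequency)
    if leftover:
        # Trailing segments stay unacknowledged until the timer fires.
        return full_acks + 1, delack_timer_ms
    return full_acks, 0.0

if __name__ == "__main__":
    print(ack_schedule(44))                   # (22, 0.0)
    print(ack_schedule(45))                   # (23, 200.0)
    print(ack_schedule(45, ack_frequency=1))  # (45, 0.0)
```

In this model, the stall shows up when a window ends on an odd segment. The sender has used up its window and sits idle until the receiver's timer fires.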
Thanks for the great information. So, we are actually measuring replication time between two Dell EqualLogic SAN arrays over the Gig WAN link. We are using SAN HQ software on a Windows server to view all the data for managing the SANs. Does this mean our bottleneck is the TCP stack as configured on the Dell EqualLogic arrays? We are using iSCSI as the protocol.
The Dell EqualLogic support group is the next stop. I believe the EqualLogic's core operating system is NetBSD, so the Dell EqualLogic support group is your best choice.
The first item they will look at is the replication window size, which is roughly the same as the TCP window size: the amount of data in flight before an acknowledgement is expected. The larger the replication window, the more packets are in flight at a time. As Joseph was pointing out, the bandwidth-delay product is the issue. More delay means more time waiting for acknowledgements, so increasing the amount of data in flight between acknowledgements means more data is transmitted and less time is spent waiting.
Thanks for the information. I have opened a ticket with Dell on the supported commands to change the window size. My question now is: will I need to enable "TCP Extensions" on my PIX or ASA firewall to allow for this larger window scale? We are going to adjust the RWIN size on the partner array at our DR site, which is behind a PIX firewall.
The TCP window scaling option is supported by default on most newer versions of PIX firmware: http://www.cisco.com/en/US/products/hw/vpndevc/ps2030/products_tech_note09186a0080742d6e.shtml
The only caveat is to check your firewall to ensure the command "tcp-options window-scale clear" isn't in effect. Alternatively, you could explicitly set "tcp-options window-scale allow" for traffic between the two hosts to ensure the TCP window scaling option passes through unchanged.
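One way to check for that caveat is to scan a saved copy of the firewall configuration (for example, show running-config output saved to a text file) for the window-scale option. This is a rough Python sketch; the function names are hypothetical, and it only does a plain-text match, so it will not tell you which class or policy map a tcp-map is actually applied to.

```python
import re

# Matches "tcp-options window-scale allow|clear" with any leading indentation.
WINDOW_SCALE_RE = re.compile(r"^\s*tcp-options\s+window-scale\s+(allow|clear)\b",
                             re.IGNORECASE | re.MULTILINE)

def window_scale_actions(config_text: str) -> list[str]:
    """Return each window-scale action ('allow' or 'clear') found in the config."""
    return [m.group(1).lower() for m in WINDOW_SCALE_RE.finditer(config_text)]

def clears_window_scale(config_text: str) -> bool:
    """True if any line strips the window-scale option."""
    return "clear" in window_scale_actions(config_text)

if __name__ == "__main__":
    sample = "tcp-map hypothetical-map\n  tcp-options window-scale clear\n"
    print(clears_window_scale(sample))  # True
```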
So, we changed the TCP window size on the Dells, and that made a significant improvement in replication times. We are pleased with the performance. However, I have one last question. We set the window size on the Dell EqualLogic arrays to 2Mb. The default TCP window size on the Cisco router is 4128 bytes. Wouldn't I want to change the window size on both sides of the link to at least 2Mb? I found this document from Cisco, and I am thinking it would make the most sense to change it on the router as well:
It's good to hear there is progress. The TCP window size is negotiated between the end stations during the initial TCP handshake and adjusted throughout the TCP conversation. The router isn't involved in negotiating the TCP window size between the two devices; it only forwards the packets.
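The window field in the TCP header is only 16 bits (at most 65,535 bytes). A window as large as the one set on the arrays can therefore only be advertised if both end stations agree on a window-scale shift in their SYN segments, which is also why the firewall must pass that option unchanged. Here is a sketch of the arithmetic, assuming "2Mb" means 2 MiB (2,097,152 bytes); the function name is hypothetical.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest value of the 16-bit TCP window field
MAX_SHIFT = 14                # largest window-scale shift RFC 1323 allows

def min_window_scale(target_window_bytes: int) -> int:
    """Smallest window-scale shift that lets the advertised window reach the target."""
    for shift in range(MAX_SHIFT + 1):
        if MAX_UNSCALED_WINDOW << shift >= target_window_bytes:
            return shift
    raise ValueError("target window exceeds what TCP window scaling can express")

if __name__ == "__main__":
    target = 2 * 1024 * 1024  # assumption: "2Mb" read as 2 MiB
    shift = min_window_scale(target)
    print(shift, MAX_UNSCALED_WINDOW << shift)  # 6 4194240
```

A shift of 5 falls just 32 bytes short of 2 MiB (65,535 << 5 = 2,097,120), so the end stations need a shift of at least 6.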
The feature you pointed out only applies to TCP sessions between the router itself and another host, for example an FTP transfer between the router and another host, or a BGP session carrying an extremely large routing table.
I would keep an eye on CPU utilization on the router and firewall, because on some IOS versions packets with TCP options set are not CEF switched but process switched. Process switching hands each forwarding decision to a scheduled process on the CPU, which is far more expensive than CEF switching. Also, if the firewall is doing more TCP inspection than expected, that too can consume CPU cycles on the firewall. Neither is usually a problem, but both are worth watching carefully: establish a baseline during a data replication cycle for future comparison and troubleshooting.