Improving download/upload speed on a Catalyst 6509

Mar 19th, 2008

LinuxES-lab1: 192.168.15.110/24

LinuxES-lab2: 192.168.15.100/24

Win2k3-1: 192.168.15.111/24



All three devices are connected to a Cisco Catalyst 6509 with a Sup32 and copper Gigabit Ethernet interfaces. All three are Dell servers: lab1 is a Dell 2550 with dual 3.0 GHz processors and 2 GB of RAM; lab2 and Win2k3-1 are Dell quad-processor 3.1 GHz boxes with 4 GB of RAM. Every port on the switch and every interface on the servers is hard-coded to 1000/full.


I have an FTP server and iperf running on LinuxES-lab2. When I test with iperf from lab1, I get about 856 Mbps throughput:


[root@LinuxES-lab1 tmp]# iperf -c 192.168.15.100 -t 10

------------------------------------------------------------

Client connecting to 192.168.15.100, TCP port 5001

TCP window size: 16.0 KByte (default)

------------------------------------------------------------

[ 3] local 192.168.15.110 port 32877 connected with 192.168.15.100 port 5001

[ ID] Interval Transfer Bandwidth

[ 3] 0.0-10.0 sec 1020 MBytes 856 Mbits/sec

[root@LinuxES-lab1 tmp]#


When I test from Win2k3-1, I get about 600 Mbps throughput.


However, when I download a 2 GB file from lab1 to lab2, I get only about 325 Mbps. If I use Secure Copy (scp), I get only about 72 Mbps. If I use Secure FTP (sFTP), I get only about 24 Mbps.


Is there a way to improve the download speed for FTP, scp and sFTP?


Thanks.

pciaccio Wed, 03/19/2008 - 04:25

One thing you could try is a jumbo frame size. Set the MTU on these devices to a larger value than the default of 1500; this will increase data throughput. Since you can squeeze more data into each packet with the increased MTU, the IP overhead is reduced because fewer packets need to be sent over the line. Refer to the attached for further details... Good luck.



cisco24x7 Wed, 03/19/2008 - 04:35

Unfortunately, the blade I have in the 6509 does NOT support jumbo frames. It is a 10/100/1000 PoE blade. I am aware of jumbo frames but cannot implement them on my 6509.

Any other ideas? Thanks.


CCIE Security

pciaccio Wed, 03/19/2008 - 04:48

Your board should support jumbo frames. Go to the Gig interface and type "mtu 9198" for the interfaces you are concerned about. Also make sure that your NIC cards support jumbo frames as well...
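On the server side, something along these lines would raise the interface MTU on the Linux hosts (just a sketch - it assumes the NIC shows up as eth0 and that the NIC/driver actually supports jumbo frames):

ip link set dev eth0 mtu 9000    # or, with the older tools: ifconfig eth0 mtu 9000

The switch port MTU needs to be at least as large as what the hosts send, or the larger frames will be dropped as giants.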

cisco24x7 Wed, 03/19/2008 - 05:58

"f you haven't already, you might review the articles under "Here are some additional links you may find useful" within http://dast.nlanr.net/Projects/Iperf/ Also, insure you have the latest NIC drivers on your hosts."


I have the latest NIC drivers because I recompile my Linux kernel myself. The driver is good; otherwise I would not get 856 Mbps throughput with iperf.
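For what it's worth, the driver version and the per-interface error/drop counters can be confirmed with ethtool (assuming the NIC shows up as eth0):

ethtool -i eth0    # driver name and version
ethtool -S eth0    # NIC statistics - look for errors and drops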


Take a look below - my Catalyst 6506 gig blade does NOT support jumbo frames:


S65-1#conf t

Enter configuration commands, one per line. End with CNTL/Z.

S65-1(config)#int g1/31

S65-1(config-if)#mtu ?

% Unrecognized command

S65-1(config-if)#mtu

S65-1#sh ver

Cisco Internetwork Operating System Software

IOS (tm) s3223_rp Software (s3223_rp-ENTSERVICESK9_WAN-M), Version 12.2(18)SXF4, RELEASE SOFTWARE (fc1)

Technical Support: http://www.cisco.com/techsupport

Copyright (c) 1986-2006 by cisco Systems, Inc.

Compiled Thu 23-Mar-06 18:30 by tinhuang

Image text-base: 0x40101040, data-base: 0x42D20000


ROM: System Bootstrap, Version 12.2(17r)SX3, RELEASE SOFTWARE (fc1)

BOOTLDR: s3223_rp Software (s3223_rp-ENTSERVICESK9_WAN-M), Version 12.2(18)SXF4, RELEASE SOFTWARE (fc1)


S65-1 uptime is 19 weeks, 19 hours, 27 minutes

Time since S65-1 switched to active is 19 weeks, 19 hours, 26 minutes

System returned to ROM by power-on (SP by power-on)

System image file is "bootdisk:s3223-entservicesk9_wan-mz.122-18.SXF4.bin"



This product contains cryptographic features and is subject to United

States and local country laws governing import, export, transfer and

use. Delivery of Cisco cryptographic products does not imply

third-party authority to import, export, distribute or use encryption.

Importers, exporters, distributors and users are responsible for

compliance with U.S. and local country laws. By using this product you

agree to comply with applicable laws and regulations. If you are unable

to comply with U.S. and local laws, return this product immediately.


A summary of U.S. laws governing Cisco cryptographic products may be found at:

http://www.cisco.com/wwl/export/crypto/tool/stqrg.html


If you require further assistance please contact us by sending email to

export@cisco.com.


cisco WS-C6506-E (R7000) processor (revision 1.0) with 983040K/65536K bytes of memory.

Processor board ID SAL1003AFTB

R7000 CPU at 300Mhz, Implementation 0x27, Rev 3.3, 256KB L2, 1024KB L3 Cache

Last reset from power-on

SuperLAT software (copyright 1990 by Meridian Technology Corp).

X.25 software, Version 3.0.0.

Bridging software.

TN3270 Emulation software.

16 Virtual Ethernet/IEEE 802.3 interfaces

96 FastEthernet/IEEE 802.3 interfaces

57 Gigabit Ethernet/IEEE 802.3 interfaces

1915K bytes of non-volatile configuration memory.


65536K bytes of Flash internal SIMM (Sector size 512K).

Configuration register is 0x2102


S65-1#

S65-1# sh mod

Mod Ports Card Type Model Serial No.

--- ----- -------------------------------------- ------------------ -----------

1 48 48 port 10/100/1000mb EtherModule WS-X6148-GE-TX SAL1003ASWV

5 9 Supervisor Engine 32 8GE (Active) WS-SUP32-GE-3B SAL1016KKWC


Mod MAC addresses Hw Fw Sw Status

--- ---------------------------------- ------ ------------ ------------ -------

1 0016.c7ed.7d98 to 0016.c7ed.7dc7 1.1 7.2(1) 8.5(0.46)RFW Ok

5 0016.c7ae.2f82 to 0016.c7ae.2f8d 4.2 12.2(18r)SX2 12.2(18)SXF4 Ok


Mod Sub-Module Model Serial Hw Status

---- --------------------------- ------------------ ----------- ------- -------

5 Policy Feature Card 3 WS-F6K-PFC3B SAL1016KP28 2.1 Ok

5 Cat6k MSFC 2A daughterboard WS-F6K-MSFC2A SAL1016KG48 3.0 Ok


Mod Online Diag Status

---- -------------------

1 Pass

5 Pass

S65-1#


Any more ideas, folks? Thanks.


CCIE Security

legerity1_2 Wed, 03/19/2008 - 06:41

You might want to consider changing the TCP window size to force TCP to put more data in flight, since jumbo frames are not an option. I try to steer clear of this in production, especially in a multi-OS environment, because flows that have to go over the Internet usually take a performance hit compared with internal flows. But! If this is an internal server and you have the time to tweak it, take a look:


http://dsd.lbl.gov/TCP-tuning/
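As a rough sketch (the values are only illustrative - see the page above for how to size them), the Linux buffer limits can be raised with sysctl:

sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"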


HTH,

Geoff


legerity1_2 Wed, 03/19/2008 - 06:43

window size - and buffer sizes - sorry that wasn't clear, lol.


Geoff

cisco24x7 Wed, 03/19/2008 - 07:28

Geoff,


"window size - and buffer sizes - sorry that wasn't clear, lol."


I already made modifications to these settings in the /proc directory according to the Linux documentation. I still cannot scale sFTP past 24 Mbps or scp past 80 Mbps throughput.

I would imagine that if I had issues with the window size and buffer sizes, then iperf would NOT have shown 856 Mbps throughput.

Any more ideas? Thanks.


CCIE Security

Joseph W. Doherty Wed, 03/19/2008 - 08:19

A possible consideration: you're using iperf with the implication that all other software using TCP should obtain similar performance. It could be as simple as the applications themselves being the root of the poorer performance, especially the two that are performing on-the-fly(?) encryption.
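If you want to separate the encryption cost from the network, one quick check (assuming OpenSSH on both hosts; the file path is just a placeholder) is to compare scp throughput with different ciphers:

scp -c aes128-cbc /tmp/2GBfile 192.168.15.100:/tmp/    # an AES cipher
scp -c arcfour    /tmp/2GBfile 192.168.15.100:/tmp/    # a much lighter cipher

If the lighter cipher is noticeably faster, the bottleneck is in the SSH/encryption path rather than in the switch.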


PS:

Re: my earlier mention of NIC drivers, I also had in mind the lower Iperf(?) performance you saw with Win2k3-1.

legerity1_2 Wed, 03/19/2008 - 08:25

I totally agree. Ultimately, if it's not the TCP stack and not the networking devices, we have to look at the remaining culprit - the software.


You might want to look at a time/sequence graph and see if the flows are steady - if not, you may have an implementation/encryption problem.
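For example (the interface and peer address are placeholders), capture one scp transfer and then plot the TCP time/sequence graph in Wireshark or tcptrace:

tcpdump -i eth0 -s 0 -w /tmp/scp-test.pcap host 192.168.15.100 and port 22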



cisco24x7 Wed, 03/19/2008 - 09:51

"especially two that are performing on-the-fly(?) encryption. "


That could very well be, but it's highly unlikely, because I ran "vmstat 1" on both Linux boxes while the transfer was taking place and the CPU was 90% idle. In other words, the box has plenty of CPU horsepower left. One other thing: I am using ssh with aes-128 encryption, so the encryption algorithm is very efficient.

Any more ideas? Thanks.

Joseph W. Doherty Wed, 03/19/2008 - 10:21

Low CPU utilization might rule that out as a bottleneck. That's assuming all CPU consumption is correctly accounted for.
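One thing worth checking (a minimal sketch, assuming the sysstat package is installed): vmstat averages across all CPUs, so a single saturated core can hide inside a mostly idle average on a multi-CPU box. Per-CPU figures would show that:

mpstat -P ALL 1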


There are other system interactions that can lead to poor application performance: how the application reads and writes to disk, how the application bounces around within its working set, how the application reads and writes to the network. There's also how the system supports all of that. (As an example of how extreme this can get, some computer architectures deliver different performance depending on how reads/writes to RAM are aligned on byte and/or word boundaries.)

cisco24x7 Wed, 03/19/2008 - 13:14

"That's assuming all CPU consumption is correctly accounted".


I used "vmstat 1" to measure cpu utilization.

This low CPU utilization is consistent with

what I see nagios and solarwinds.


The Linux Server is running on a 10k RPM 100GB

RAID 5 drive so I dont' think reading/writing to

the disk is an issue.

Jon Marshall Wed, 03/19/2008 - 13:32

David


I appreciate what you said about the iperf results, but where are your servers patched into the WS-X6148-GE-TX?


The WS-X6148-GE-TX is a heavily oversubscribed blade, i.e. it has an oversubscription rate of 8:1, so for every 8 ports there is a maximum throughput of 1 Gbps.


The port groupings are


1 - 8

9 - 16

etc..


So if you haven't already, you may want to ensure that each of your servers has a port group to itself.


As I say, I appreciate your iperf results, but this may be worth a try.


Jon

cisco24x7 Wed, 03/19/2008 - 14:39

Linux_1 is connected to port 1

Linux_2 is connected to port 11

Win2k-1 is connected to port 21

Same result: iperf shows 856 Mbps throughput while scp and sFTP show very poor performance.

Any more ideas? Thanks.

Joseph W. Doherty Wed, 03/19/2008 - 16:10

Well, if you interconnect your two Linux servers, see what results you get then.


I suspect they will be what you've seen so far. That would then point at the hosts and/or their applications.


PS:

BTW, RAID 5 slows writes. It's great for the "I" portion of the acronym, but not for write performance. Have you benched the drive standalone? It might account for a major portion of the 325 Mbps you've documented.
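A rough standalone write test on the Linux box might look like this (the path is just a placeholder - point it at the RAID 5 volume; fdatasync forces the data to disk so the page cache doesn't inflate the number):

dd if=/dev/zero of=/data/ddtest bs=1M count=2048 conv=fdatasync
rm /data/ddtest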

cisco24x7 Wed, 03/19/2008 - 19:34

Jon,


"I appreciate what you said about iperf results but where are your servers patched into the WS-X6148-GE-TX.


The WS-X6148-GE-TX is a heavily oversubscribed blade ie. it has an oversubscription rate of 8:1 so for every 8 ports there is maximum throughput of 1Gbps."


Is this documented anywhere? Can you provide the link for this? Thanks.



Jon Marshall Thu, 03/20/2008 - 01:34

David


You had to ask :-). I can never find the doc that explains all this, but Edison Ortiz seems to know where they are whenever we get into these sorts of discussions, so I've requested he post the link if he has it.


Jon

cisco24x7 Thu, 03/20/2008 - 07:03

OK, I replaced the Catalyst 6506 with an Extreme switch. I am now able to push about 600 Mbps FTP, 350 Mbps scp and 300 Mbps sFTP. At 350 Mbps scp throughput, CPU on the Linux boxes is at 90% utilization, which is expected.

It seems to me that the Catalyst 6506 cannot scale past 90-100 Mbps with scp traffic on the Gig port.

Any ideas, anyone? Thanks.

Jon Marshall Thu, 03/20/2008 - 07:53

Edison


I'm getting confused now


=============================================


When you use either the WS-X6548-GE-TX or WS-X6148-GE-TX modules, there is a possibility that individual port utilization can lead to connectivity problems or packet loss on the surrounding interfaces. Especially when you use EtherChannel and Remote Switched Port Analyzer (RSPAN) in these line cards, you can potentially see the slow response due to packet loss. These line cards are oversubscription cards that are designed to extend gigabit to the desktop and might not be ideal for server farm connectivity. On these modules there is a single 1-Gigabit Ethernet uplink from the port ASIC that supports eight ports.


---> These cards share a 1 Mb buffer between a group of ports (1-8, 9-16, 17-24, 25-32, 33-40, and 41-48) since each block of eight ports is 8:1 oversubscribed. The aggregate throughput of each block of eight ports cannot exceed 1 Gbps.. <---


Table 4 in the Cisco Catalyst 6500 Series 10/100- & 10/100/1000-Mbps Ethernet Interface Modules shows the different types of Ethernet interface modules and the supported buffer size per port.



=============================================


I'm sure you had diagrams that you posted showing the port groupings of these modules?


Jon

Edison Ortiz Thu, 03/20/2008 - 08:24

Don't be confused. Those release notes are wrong.


I did some digging now and found some internal documents which I can't publish.


The WS-X6148-GE-TX has 2 Pinnacles that connect to the ASIC but these 2 Pinnacles are broken down into 3 Port Groups each. Each Port Group has 8 Ports.


Pinnacle 1

Port Group 1 = Ports 1-8

Port Group 2 = Ports 9-16

Port Group 3 = Ports 17-24


Pinnacle 2

Port Group 1 = Ports 25-32

Port Group 2 = Ports 33-40

Port Group 3 = Ports 41-48


HTH,


__


Edison.


cisco24x7 Thu, 03/20/2008 - 08:53

"These line cards are oversubscription cards that are designed to extend gigabit to the desktop and might not be ideal for server farm connectivity. On these modules there is a single 1-Gigabit Ethernet uplink from the port ASIC that supports eight ports."


So let me see if I understand this correctly, since I am a firewall/security person and not a routing/switching person: Cisco is selling me a Gigabit line card, but the line card can NOT do gig throughput with my servers. Is that a correct statement?

Maybe it is time for me to look at Extreme switches.

Jon Marshall Thu, 03/20/2008 - 09:00

David


Your understanding is correct. However, if you look at where Cisco positions this module, it is in the wiring closet and not as a server farm blade. So chances are it is unlikely that you will be oversubscribing it too much at any one time.


Obviously it is also cheaper than a module that supports full gigabit throughput on each port, although even the 6748 module has a little oversubscription, i.e. 48 Gbps of ports with a 40 Gbps connection to the switch fabric.


Many people get all wound up about gigabit throughput being just that, but this module was primarily designed for clients, not servers, hence the oversubscription.


Edit - to be more precise, the line card can do gigabit throughput on a single port, but if more than one port in the group of 8 is in use at the same time, no port will get the full gigabit throughput.


Jon

Konstantin Dunaev Thu, 03/20/2008 - 09:27

Hi,

As a suggestion - try to sniff the traffic being sent between the Linux boxes during the iperf test, FTP and scp.

I'm pretty sure that in the scp test you will see a lot of retransmissions and the TCP window size will not grow up to the limit.

But if you start additional scp sessions, you should see the sum of the scp connections increase proportionally to the number of scp sessions.
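Something like this (the file path and destination are placeholders) would start four scp copies in parallel, so the aggregate rate can be compared against a single session:

for i in 1 2 3 4; do
  scp /tmp/2GBfile 192.168.15.100:/tmp/copy$i &
done
wait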


adam.sellhorn Thu, 03/20/2008 - 09:31

The Extreme chassis modules, at least the last time I looked, are also oversubscribed. The G48Te, for example, has an oversubscription of 4:1, or 2:1 with 2 MSMs. I am by no means an expert on any of this, but I do have Extreme gear in our core. If you are a CLI guy, as I assume you are, I don't think you will enjoy the interface Extreme has to offer.


Just my two cents.


Joseph W. Doherty Thu, 03/20/2008 - 09:33

(Indentation was getting a bit much.)


There's still something odd about this. Regardless of the oversubscription capacity of the card, why such differences in traffic rates between Iperf, FTP, and scp/sFTP on the 6500?


Yes, the last set of stats, on the Extreme switch, shows scp/sFTP at half the rate of FTP with the server being CPU constrained, but the proportions were not the same across the 6500. I.e. it makes sense that iperf would be the fastest, likely NIC limited; followed by straight FTP, perhaps disk limited; followed by scp/sFTP, CPU limited. What doesn't make sense is why, if the 6500 could handle 856 Mbps of iperf from Linux and 600 Mbps from Windows, it couldn't also handle similar bandwidths for the other traffic as the Extreme switch did.
