100Mb performance on 1Gb Network

Unanswered Question
Apr 24th, 2008

I see disappointing performance on our NetBackup master: clients transfer data at only 5-6MB/sec, even though all the equipment supports gigabit speeds.

We have a secondary Netbackup master that sees speeds of 25MB/sec or so from the same clients.

Running TTCP tests yields 7-8MB/sec with the slow master as the receiver and between 50-100MB/sec on the secondary.

TTCP tests also yield 7-8MB/sec to other servers on the same switch as the slow master, and all this slow behavior is intermittent. About half the time, things run much faster.
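For comparison with the link speeds, the rates quoted above can be converted from megabytes to megabits per second. This is a minimal sketch: the figures are the ones reported in this post, the function name is made up for illustration, and protocol overhead is ignored.

```python
def mbytes_to_mbits(mbytes_per_sec: float) -> float:
    """Convert MB/sec to Mbit/sec (8 bits per byte, overhead ignored)."""
    return mbytes_per_sec * 8


# (low, high) rates in MB/sec as reported in the post above.
observed = {
    "backups to slow master": (5, 6),
    "backups to secondary master": (25, 25),
    "ttcp to slow master": (7, 8),
    "ttcp to secondary master": (50, 100),
}

for label, (low, high) in observed.items():
    print(f"{label}: {mbytes_to_mbits(low):.0f}-{mbytes_to_mbits(high):.0f} Mbit/sec")
```

By this arithmetic the slow cases sit below even a 100 Mbit/sec line rate, while the fast ttcp case uses a large share of a 1000 Mbit/sec link.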

Wireshark captures on the slow receiver show dupacks being sent, and captures on the sender show psh/acks being sent. The sender is not sending duplicate packets, and the switches report no dropped packets. In addition, the receiver periodically sends Zero Window packets.

I see a communication delay between the receiver and sender, but the receiver sends acks right after it receives packets, and the sender sends packets right after getting acks. And yet both sides seem to be timing out. That is my amateur interpretation, but our network admins say that ping and traceroute tests show no delays.

I am very skeptical, but I don't know anything about Cisco devices to ask them what to look for.

They say their switches are pretty plain vanilla, using mostly default settings. They won't send me a copy of the switch config, so I don't know what this means.

In the searches I have done, I suspect some kind of flow control or congestion avoidance is going on.

Cisco documents provide a lot of information regarding what to look for in terms of logs, counters and configs, but I am not being given this information.

So, does this behavior ring any bells?

And where can I find what the default configuration of a Catalyst 3750 is?

Thanks for any help.

andrew.butterworth Thu, 04/24/2008 - 14:55

Ask the net admin whether they have QoS enabled on the 3750s. There is a known bug that affects high-rate flows when QoS is enabled, due to aggressive packet dropping: by default, congestion avoidance kicks in. Ask them for the output of the following command:

show platform port-asic stats drop FastEthernet X/X

Where X/X is the port connected to the server and uplinks or other choke points.

The bug is CSCsc96037 and there are workarounds.

HTH

Andy

saucer_love Thu, 04/24/2008 - 21:37

I will ask for this output and for paid access to Cisco web resources. Apparently my free login does not allow me to read bug reports.

What is "aggressive packet dropping"? Is that the formal term? I am curious to know whether this is something that happens within and between switches, or whether it is something I can see in the packet captures. I don't think I see any evidence of dropped packets, so I don't know what this refers to.

One additional detail I failed to mention is that this intermittent slowness does not affect every leg of the network when sending ttcp data. There are a couple of legs that never exhibit slowness. Each leg corresponds to a separate switch stack, and I am told all the switch stacks are configured the same.

Thanks for the reply.

andrew.butterworth Fri, 04/25/2008 - 01:32

You need to know whether QoS is enabled or not. If it is, you then need to see the output from the command I posted to see if any packets are being dropped from any of the thresholds of the four queues (per interface).

You are not specifically going to see this in captures unless you capture the ingress & egress ports simultaneously and then compare them - you should see packets enter on the ingress port but not leave on the egress port. This will cause TCP to resend data, plus back off the window size etc.
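The ingress/egress comparison could be sketched roughly as follows. This assumes both captures have already been exported into simple per-segment records; the `Segment` fields and the function name are hypothetical, not a Wireshark or Cisco format.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical record for one captured TCP segment."""
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    length: int


def missing_on_egress(ingress, egress):
    """Return segments seen on the ingress port with no match on egress."""
    remaining = Counter(egress)  # how many copies of each segment left the switch
    missing = []
    for seg in ingress:
        if remaining[seg] > 0:
            remaining[seg] -= 1  # pair this ingress segment with one egress copy
        else:
            missing.append(seg)  # entered the switch but was never seen leaving
    return missing
```

Any segment returned here entered the switch but never left it, which would point at drops inside the switch rather than on the hosts. Counting copies rather than using a set keeps genuine retransmissions from hiding a drop.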

In the BugID it states 'Configuring QoS on a CAT3560 or CAT3750 running any IOS can cause certain TCP applications such as NFS to run slower. It is because the way QoS carves egress thresholds. This issue is noticed inspite of running an IOS which has fix for a related issue as in CSCeg29704'

Andy

saucer_love Fri, 04/25/2008 - 06:57

They say QoS is not turned on, and they see zero dropped packets.

They also say the version of IOS mentioned in the bug report is not the same version running on the switch where the slow server resides.

cisco24x7 Fri, 04/25/2008 - 09:57

I have a Linux_1 connected to G1/0/1 and Linux_2 connected to G1/0/2 of a Cisco Catalyst 3750 24-ports. Very simple configuration, NO QoS, and I get 935Mbps throughput with Iperf. I left both the Linux servers and the switchport to auto negotiation.

Last login: Wed Apr 23 23:35:15 2008 from 192.168.15.8
[[email protected]-lab3 root]# iperf -c 192.168.3.10 -t 30
------------------------------------------------------------
Client connecting to 192.168.3.10, TCP port 5001
TCP window size: 27.6 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.10 port 32773 connected with 192.168.3.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 3]  0.0-30.0 sec   3.26 GBytes  935 Mbits/sec
[[email protected]-lab3 root]#

You may want to use Iperf instead of ttcp. Iperf is a much better utility than ttcp.

My IOS version on the Catalyst 3750 is c3750-advipservicesk9-mz.122-25.SEE4.bin
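As a rough aid for reading the window size in that output: a single TCP stream cannot move more than about one window of data per round trip. The sketch below applies that rule of thumb. The round-trip times are assumed values for illustration, not measurements from this lab, and the function name is made up.

```python
def max_throughput_mbit(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on one TCP stream: one window per round trip, in Mbit/sec."""
    return window_bytes * 8 / rtt_seconds / 1e6


# Window size reported by iperf above (iperf's KByte is 1024 bytes).
window = 27.6 * 1024

# Assumed round-trip times, for illustration only.
for rtt_ms in (0.1, 0.2, 1.0, 5.0):
    limit = max_throughput_mbit(window, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms -> at most about {limit:.0f} Mbit/sec")
```

On this estimate a small default window only sustains near-gigabit rates when the round trip is a fraction of a millisecond, so any added delay or a shrinking receive window cuts throughput quickly.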

saucer_love Mon, 04/28/2008 - 08:15

Thanks for the heads-up on Iperf. I will research that tool.

Some unrelated maintenance work was done on the switches this weekend, and backup speeds are fast for the first time in months.

I don't know why, but I'll take it!

Thanks for your help.
