
Cisco Catalyst Switch 3110X for IBM BladeCenter 10GB Performance

he-wun.kim
Level 1

Hi,

We are using a Cisco Catalyst 3110X in an IBM BladeCenter.

It uplinks to a Nexus 5010.

The uplinks are 10Gb, using Cisco X2-10GB-SR modules.

When we do large transfers (such as server builds) from the NetApp filers to the blade servers, we get dropped frames on the output queues of the 1Gb interfaces connecting to the blade servers. We are using jumbo frames.

It appears that the 1Gb interfaces are receiving more traffic than they can handle. Unfortunately the 3110X doesn't have any shaping ability (WRR is not supported for jumbo frames); we can police traffic, but we can't rate-limit and shape. The Nexus is the same: no shaping.

Has anyone else come across this issue, or does anyone have suggestions on how to reduce the number of dropped frames? As a result of the drops, performance is severely degraded.
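For reference, we are seeing the drops with something like this on each blade-facing port (the interface number below is just an example, not our exact port):

show interfaces GigabitEthernet1/0/1 | include output drops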

When we copy data to the filers we get:

47 Mbps: 1Gb -> 10Gb with jumbo frames

68 Mbps: 1Gb -> 10Gb with MTU set to 1500

4.2 Mbps: 10Gb -> 1Gb with jumbo frames

50 Mbps: 10Gb -> 1Gb with MTU set to 1500

Regards,

HK

5 Replies

jspringfield
Level 1

We are seeing the same issue...

10Gb/s SAN connection <--> 10Gb/s Nexus 5020 <--> 10Gb/s Cisco 3110X <--> 1Gb/s blade server

We get massive packet drops from our SAN to the blade server due to the flooding.  One thing that has saved us is that we are etherchanneling between our two Nexus 5020 switches with 1Gb/s connections (8Gb/s total).  Because the etherchannel load balances on a per-flow basis, the Nexus switch steps down the speed of each flow (SAN to individual blade server) to fit onto one of the 1Gb/s etherchannel links.  This happens seamlessly and no packets are dropped.  To help alleviate our problem we have to force the traffic over the etherchannel, because if we don't we take a big performance hit.

To test the difference I boot a basic VMware virtualized Server 2003 (NFS protocol):

  • Straight to the blade server from the SAN, it takes 5 minutes to get to the login screen and the blade drops 2,500 packets
  • Going through the etherchannel, it takes 45 seconds to get to the login screen and no packets are dropped
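For anyone wanting to copy the etherchannel piece, it is nothing exotic. A rough sketch of the Nexus side would look like this (port numbers, channel number, and description are placeholders, not our exact config):

feature lacp

interface ethernet 1/1-8
  description 1Gb/s members to the other Nexus 5020
  switchport mode trunk
  speed 1000
  channel-group 10 mode active

interface port-channel 10
  switchport mode trunk

Because the hash keeps any single flow on one member link, each SAN-to-blade flow is effectively capped at 1Gb/s, which is what spares the 3110X egress queues.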

Jeff

I worked with Jeff on his issue and came up with a configuration on the CBS3110X that would help buffer the 10G -> 1G bursts.  This was necessary whenever the traffic came into the Nexus 5000 at 10G and out at 10G.  When the Nexus 5000 path was from 10G to 1G, the buffers on the Nexus 5000 were utilized instead of the CBS3110X's.

The CBS3110X had "no mls qos" configured, which is the default.  In order to adjust the queueing structure, we need to enable QoS and then adjust the queues.
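For readers who have not tuned these before, the parameter order, as I read it from the configuration guide linked below, is:

! mls qos queue-set output 1 buffers   <Q1%> <Q2%> <Q3%> <Q4%>
! mls qos queue-set output 1 threshold <queue> <drop-threshold1> <drop-threshold2> <reserved> <maximum>

where the buffers values split the port's egress buffer among the four queues, and the threshold values are percentages of that queue's allocation.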

mls qos queue-set output 1 buffers 25 25 25 25
mls qos queue-set output 1 threshold 1 3200 3200 100 3200
mls qos queue-set output 1 threshold 2 3200 3200 100 3200
mls qos queue-set output 1 threshold 3 3200 3200 100 3200
mls qos queue-set output 1 threshold 4 3200 3200 100 3200

mls qos

We started with this configuration and then further borrowed buffers from Q1 and Q3, since Jeff's switch wasn't using them for his traffic classes.  This configuration will allow any queue on any interface to utilize a significant portion of the common pool of buffers.  Of course, if every interface needs these buffers, you will run out and be back where you started.  If only a few interfaces at a time are using common-pool buffers, it will result in a nice improvement in the number of drops seen.
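As a rough illustration only, borrowing from Q1 and Q3 could look something like the line below. The exact split, and which queues your bulk traffic actually lands in, depend on your CoS/DSCP-to-output-queue maps, so check the per-queue counters first and treat these numbers as placeholders:

mls qos queue-set output 1 buffers 10 40 10 40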

If you already have QoS configured, you would need to tune this setup.  If you do not currently use QoS, then this configuration will very likely reduce drops due to running out of buffers on the CBS3110X.  This configuration will affect other platforms in this manner as well, such as the 2960/3560/3750 and other blade switches.

Note:

To inspect frames enqueued/dropped on each interface, this command is helpful

show mls qos int gigabit x/y/z statistics

Regards,
John Gill

Reference: CBS3110X configuration guide:

http://www.cisco.com/en/US/partner/docs/switches/blades/3110/software/release/12.2_52_se/configuration/guide/swqos.html#wp1179728

We have packet drops on our 3110X as well, especially noticeable when running a restore from a server with a 10G NIC to a blade server.

I have tried the approach above in a lab BladeCenter and the drops disappeared, so I would like to implement it.

But I can only do that if I can guarantee that it doesn't create new problems with packet drops on other interfaces.

One question that I have been asked is this:

- If one interface borrows buffers from the pool (or from other interfaces, if you only guarantee 80% on each interface), for how long is the buffer locked to that interface before it is free for use by another interface?

Can you see a scenario where this approach could lead to a situation worse than it already is?

Is it a good idea to free up more buffers by only guaranteeing 80% of the buffer to each interface?
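To be concrete about the 80% idea, I would only change the reserved value in John's threshold lines, for example for queue 2 (and likewise for the other queues). This is just my reading of the command syntax, not something I have run in production:

mls qos queue-set output 1 threshold 2 3200 3200 80 3200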

So this is a good news/bad news situation.  In our experience this helped us out greatly; however, we still had issues where we were getting some packet drops and had some network congestion between our blade servers and SAN.  What made the biggest difference for us was purchasing some Nexus 4Ks for the BladeCenter and getting blade servers with 10Gb interfaces on them.  Moving our high-I/O VMs to the 10Gb blades made them run much better and also helped clear up the congestion for our other VM servers.  Unfortunately, I don't think the 3110X BladeCenter switches are practical when paired with a 10Gb SAN network.  The Nexus 4Ks make much more sense and run much better.

Jeff

I fully understand that we have to solve it permanently with other equipment. Thank you for your suggestion.
My thought with this buffer configuration is to minimize the impact until we have another solution in place.

But I am not allowed to try it in real production if I can't guarantee that it doesn't impact traffic that doesn't have any problem today.

You say that you still had issues and were still getting packet drops. Was that with the same blade servers? Or did you suddenly see the problem occur on switch interfaces to servers that didn't have the problem before the change?
