EMC VNX 5300 multihop FCoE to Cisco UCS FI / B series blades

nickeoannidis
Level 1

Hi All,

Having a performance issue with FCoE multihop from UCS.

The layout is as follows:

UCS 5108 chassis --> 6248UP FIs --> pair of N5Ks (5548) --> EMC VNX 5300 with two 2-port 10 Gb FCoE cards

Everything is functional - I can fcping from the N5K fabrics to the EMC pWWNs.
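For reference, the connectivity check I'm doing from each 5K fabric is just along these lines (the pWWN and VSAN are placeholders, not my real values):

fcping pwwn 50:06:01:60:46:e0:01:02 vsan 100
! and the name server view of what is logged in:
show fcns database vsan 100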

I can see the pathing in VMware on the vHBAs - everything is up and active. The VMware path selection policy is set to Round Robin and the EMC side to ALUA (failover mode 4).

I'm at a loss on how to troubleshoot this further. The SAN is connected to the fabric with OM3 LC fibre and 10 Gb SR SFPs. UCS has a separate pair of FCoE port channels from the 5K fabric down to the FIs (using twinax leads). Data traffic is separated onto its own vPC. I essentially followed the Cisco guide on how to do this:

http://www.cisco.com/en/US/products/ps9670/products_configuration_example09186a0080c13e92.shtml
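For what it's worth, the FCoE port channel / vFC piece on each 5K looks roughly like this (VLAN, VSAN and interface numbers are just illustrative, not my exact config):

feature fcoe
vlan 1100
  fcoe vsan 100                  ! dedicated FCoE VLAN mapped to the VSAN
vsan database
  vsan 100
interface port-channel101        ! twinax links down to the FI, FCoE only
  switchport mode trunk
  switchport trunk allowed vlan 1,1100
interface vfc101
  bind interface port-channel101
  switchport trunk allowed vsan 100
  no shutdown
vsan database
  vsan 100 interface vfc101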

The hosts are not booting from SAN for now; they boot off USB flash storage on the blades.

show flogi database on the 5Ks shows everything logged in, and as I said before I can see the LUNs on the blades. However, when I try to upload an ISO file of, say, 2.5 GB to a datastore, it takes roughly two and a half hours.
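By "show flogi" I mean checks along these lines on the 5Ks (VSAN number is illustrative):

show flogi database vsan 100    ! blade vHBAs and VNX FCoE ports should all appear
show fcoe database              ! FCoE enode logins
show zoneset active vsan 100    ! confirm initiators and targets are zoned together
! and from the FI CLI (connect nxos): show npv flogi-table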

I have the newest UCS 2.1(2a) firmware installed. The N5Ks run 6.x NX-OS. The blades have ESXi 5.1 with the newest enic and fnic drivers from Cisco installed.

The default QoS on the N5Ks is configured.
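By "default QoS" I mean the stock FCoE service policies that get applied under system qos when the FCoE feature is enabled - I haven't customised anything beyond this:

system qos
  service-policy type qos input fcoe-default-in-policy
  service-policy type queuing input fcoe-default-in-policy
  service-policy type queuing output fcoe-default-out-policy
  service-policy type network-qos fcoe-default-nq-policy
! verified with: show policy-map system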

Any ideas would be greatly appreciated!

-Nick

22 Replies

mikko.suomi
Level 1

Hi Nick,

I have had the same kind of problems since I started testing our new FCoE fabric during the summer. I have support tickets open with Cisco and VMware, but neither has yet managed to find a solution.

Our environment is almost identical to yours:

FCoE connection from UCS fabric interconnect -> Nexus with 2x 10 Gbit twinax per fabric (port channel)

Dedicated to FCoE (LAN Ethernet has its own connections to the Nexus switches with vPC)

FCoE connection from the Nexus to the VNX5500 over 10 Gb fibre into the VNX FCoE card.

Zoning done on the Nexus only (rough sketch after the environment details below), UCS in end-host mode.

vCenter 5.1

ESXi 5.1 build 1117900

Cisco UCS B200 M3 blades, SAN boot over native FCoE (the whole path to storage), VNX5500 SAN (FCoE connectors)

Cisco UCS 6248 Fabric interconnects version 2.1(1d)

Cisco FC drivers fnic version 1.5.0.22-1oem.500.0.0.472560

SAN/LAN switches:

Cisco Nexus 5548 switches (FC/FCoE/Ethernet)

system: version 6.0(2)N1(1)

STORAGE HW:

EMC VNX5500 OE 05.32.000.5.201 (block) with FCOE cards
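The zoning I mentioned above is plain pWWN zoning on the Nexus, roughly like this per fabric (zone names, VSAN and WWPNs below are placeholders, not our real ones):

zone name ESX01_vHBA_A_VNX_SPA vsan 100
  member pwwn 20:00:00:25:b5:aa:00:01    ! blade vHBA (from the UCS service profile)
  member pwwn 50:06:01:60:46:e0:01:02    ! VNX FCoE port on SP A
zoneset name FABRIC_A vsan 100
  member ESX01_vHBA_A_VNX_SPA
zoneset activate name FABRIC_A vsan 100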

What we have noticed in testing is that if you disable a fabric or a path, performance goes back to normal until you enable the paths again or something else happens. But we can't isolate the problem to one path, as it might work now and not work tomorrow...

Also, changing the path policy from RR to MRU seems to help a little.

If you want to test the SAN performance, the VMware I/O Analyzer vApp is handy for that. You can get nice graphs of I/Os and latencies with it.

ESXi dmesg shows SCSI errors/aborts, max retries, etc. from fnic.

Hi Mikko,

Yep, same setup here - I have VMware Round Robin and ALUA failover mode 4 set on the VNX. Disabling paths on either side of the fabric makes no difference. I have a TAC case I'm working on with Cisco.

We are looking at QoS - apparently it has to match perfectly end to end or you see this type of behaviour...

What's your QoS set to, Mikko?

Do your vHBAs have a QoS policy set on them under the service profile?

I suppose what we need is a sample end-to-end QoS config - from the service profile and system QoS classes on UCS down to the N5K.
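On the N5K side I'm assuming "matching" means FCoE stays the only no-drop class, CoS 3 with a 2158-byte MTU - something like the below (just my reading of it, policy name illustrative, not a verified config), which would have to line up with the UCS FCoE system class and whatever QoS policy is on the vHBAs in the service profile:

policy-map type network-qos fcoe-nq
  class type network-qos class-fcoe     ! built-in FCoE class (CoS 3)
    pause no-drop
    mtu 2158
  class type network-qos class-default
    mtu 1500
system qos
  service-policy type network-qos fcoe-nq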

can anyone help with this?

Hi All,

I think I have finally resolved my issue. It was, as Brian said, something to do with the comms from the guest to the management vmkernel interface on the blade.

Enter ICMP Redirects.

I did a sniff on the VM I'm using for in-band management - while watching a 3 GB file take ages and ages, hundreds of ICMP redirect packets were coming to the VM. I proceeded to disable ICMP redirects on the upstream core switch, which is a Nexus 7009:

conf t
interface vlan 76    ! in-band mgmt / default routing SVI
no ip redirects

As soon as I did this, the upload took less than a minute. Thanks Brian and Mikko for getting me thinking!
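If anyone else hits this, the quick way to confirm the change on the 7K is:

show running-config interface vlan 76   ! should now show "no ip redirects"
show ip interface vlan 76               ! ICMP redirects should show as disabled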

Hi Nick,

Interesting observation. We have an HSRP IP on a Catalyst 4500 as the router for the management network, with IP redirects on for some reason. I disabled the IP redirects as well, and currently I'm doing 30K IOPS with the RR path policy, both fabrics enabled, and no fnic errors in the ESXi logs.

I'll have to do some additional testing, by enabling IP redirects again, to see whether this is a coincidence or a fix. But it does look promising.

As my I/O load is generated with the I/O Analyzer (http://labs.vmware.com/flings/io-analyzer), the management network should have no effect on the non-routed FCoE network... It never even occurred to me to check the Ethernet network side.

Thanks Nick for the good advice.

regards

Mikko

Hi Mikko,

This issue appears to be rearing its head again!

I'm the same as you - I see errors in dmesg from the host. I believe IP redirects were part of this issue on the Nexus side; however, with ALUA 4 and VMware Round Robin, storage is constantly seeing I/O errors.

Hi,

For me everything still works fine. I even re-enabled the IP redirects on the routers, and everything still works fine when I'm testing with a VMware I/O Analyzer workload. My environment is used only for VMware vCloud Director testing, so the normal I/O load is next to nothing with the few test vApps that we deploy.

It could be QoS in UCS - I ran into this when I first implemented multihop FCoE in the lab earlier this year. Make sure Packet Drop is checked on the Platinum class.

Here is the blog post I did on this back in April - http://jeremywaldrop.wordpress.com/2013/04/19/cisco-ucs-multihop-fcoe-qos-gotcha/

This relates to this bug - http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCuh72875
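A quick way to spot this kind of mismatch from the N5K side is to look at PFC and per-class queuing counters on the links facing the FIs (interface number below is just an example):

show interface priority-flow-control     ! per-port PFC pause counters
show queuing interface ethernet 1/1      ! per-class queuing / drops on an FI-facing link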
