FCoE Multihop Performance Issue

Answered Question
Dec 20th, 2012
User Badges:
  • Silver, 250 points or more

Has anyone out there implemented FCoE multihop on the new 2.1 firmware? I moved my lab UCS over to it and storage performance tanked. Here is what I have in my lab.


2 Nexus 5500s - used for both LAN and FC switching

EMC CX4-120 - connected to the 2 Nexus 5500s

UCS 6120/FEX 2104 - pre-FCoE setup was a 2-port SAN port-channel in each fabric to the Nexus 5ks.

6 B200 M1 servers

ESXi 5


I cabled up an additional twinax on Fabric Interconnect A to the Nexus 5548 in FC Fabric A

I cabled up an additional twinax on Fabric Interconnect B to the Nexus 5548 in FC Fabric B


Configured the interfaces as FCoE uplink in UCS

Placed the FCoE interfaces in the appropriate VSAN


One fabric at a time, I disabled the FC SAN port-channel to force the vHBAs to log in over the FCoE uplink. I didn't have any issues with this, and the vHBAs showed up in the FLOGI table of the Nexus 5ks on the correct vfc interface.
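
For reference, this is roughly how I checked the logins on the Nexus side (switch prompt and numbers below are just examples):

nexus-5k# show flogi database vsan 20
nexus-5k# show interface vfc8

The first lists the vHBA pWWNs that have logged in and the vfc they came in on; the second confirms the vfc is up and trunking the expected VSAN.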


Rebooted my ESXi hosts (boot from SAN) and the hosts rebooted fine using the FCoE uplinks to the Nexus 5ks.


When I powered on a VM it took over 20 minutes for it to boot and it never got to the point where it was usable.


No drops or errors on the FCoE or vfc interfaces of the Fabric Interconnects and Nexus 5ks.


As soon as I re-enabled the FC SAN port-channels using 4Gb SFPs and disabled the FCoE uplinks, performance was fine again.


I am hoping this has something to do with the older gen 1 hardware (6100 FIs and 2100-series IOMs) and not a bug.


I have 2 other HP rack-mount servers with QLogic CNAs doing FCoE to the same VMFS LUNs with no issues.

vek64gware Thu, 12/20/2012 - 18:51
User Badges:

I see some performance degradation also when I switch to the FCoE path to NetApp.

But it's not as drastic as yours.


I have a 3-year-old chassis with gen 1 B200s and old adapters in it.

Then 2104 IOMs, 6120 FIs and a 5010 upstream.

The NetApp with FCoE cards is connected to the 5010.

Cables are twinax.


I have a B200 M3 with a VIC1280 to play with.

Will try it and see if it does better.

Jeremy Waldrop Fri, 12/21/2012 - 03:49
User Badges:
  • Silver, 250 points or more

So it sounds like there could be an issue with GEN 1 hardware and FCoE multihop. Anyone out there have GEN 2 hardware they can test this with?

Hello Jeremy,


We are ready to test multihop FCoE with gen 2 cards (M81KR, VIC1240, IOM 2104, FI 6248UP, Nexus 5548UP, Win2008 on bare metal, NetApp storage, UCS 2.1(1a)).


But first, please, can you explain to me exactly where/how I can configure FCoE northbound?


I have vPC from both FIs to a pair of N5Ks and of course separate native FC link between each pair of FI and N5K.


With the multihop FCoE feature introduced in version 2.1, is there any way to get rid of this native FC link and have FCoE northbound (FI to N5K) using one link of the existing vPC?


From the Nexus side, it allows me to have one more VLAN on one link southbound of a vPC for VSAN traffic. But it seems that on the FI I cannot have unified ports in a port-channel (vPC) northbound to the N5K (or, to be exact: one unified port straight from the "left" FI to the "left" N5K and an uplink port from the "left" FI to the "right" N5K, with both ports on the FI in a port-channel - a vPC from the N5Ks' perspective).


Obviously, another way is to add separate Ethernet links (between the FI and the N5K) and make them unified northbound... but I still believe I am making some mistake and that FCoE northbound should work in the existing vPC to the N5K.


Thank you for any comments.
Regards,
Pavel

Jeremy Waldrop Thu, 01/24/2013 - 13:50
User Badges:
  • Silver, 250 points or more

Yes, you can configure the existing uplink port as a Unified Port by also making it an FCoE uplink port. You could do this to one link on each FI and then map the appropriate VSAN. To migrate, admin-down the regular FC uplinks one fabric at a time and the server vHBAs will perform a FLOGI across the FCoE uplink.


I have tested this, and have also tested using a dedicated FCoE uplink; both tests resulted in very poor SAN performance in my lab. Not sure if it is the gen 1 hardware or something else.
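
If it helps, a rough way to confirm the vHBAs have moved over from the UCS side (assuming the FI NX-OS shell; exact output varies by UCSM release):

UCS-A# connect nxos a
UCS-A(nxos)# show npv flogi-table

Since the FI runs in end-host (NPV) mode, this lists each vHBA login and the uplink it is pinned to, so you can see whether it is using the FCoE uplink or the native FC uplink.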

Adi Arslanagic Thu, 02/21/2013 - 01:29
User Badges:

Hello,


I've run into the same problem with an almost identical configuration:

2x Nexus 5548UP

2x UCS 6248UP

Cisco VIC1240 adapters

2x 8Gb native FC uplinks (storage also connected via 2x 8Gb native FC to both nexus switches)


After switching from native FC uplinks to FCoE uplinks (2x 10G twinax, port-channeled), storage performance is severely degraded, unusable.

Also, I see no errors or drops on the FCoE Ethernet or vfc interfaces, but there are packets discarded on input:


nexus-sw1# show queuing interface ethernet 1/5

Ethernet1/5 queuing information:

  TX Queuing

    qos-group  sched-type  oper-bandwidth

        0       WRR             50

        1       WRR             50



  RX Queuing

    qos-group 0

    q-size: 360960, HW MTU: 9216 (9216 configured)

    drop-type: drop, xon: 0, xoff: 360960

    Statistics:

        Pkts received over the port             : 0

        Ucast pkts sent to the cross-bar        : 0

        Mcast pkts sent to the cross-bar        : 0

        Ucast pkts received from the cross-bar  : 0

        Pkts sent to the port                   : 0

        Pkts discarded on ingress               : 0

        Per-priority-pause status               : Rx (Inactive), Tx (Inactive)



    qos-group 1

    q-size: 79360, HW MTU: 2158 (2158 configured)

    drop-type: no-drop, xon: 20480, xoff: 40320

    Statistics:

        Pkts received over the port             : 809739

        Ucast pkts sent to the cross-bar        : 743529

        Mcast pkts sent to the cross-bar        : 0

        Ucast pkts received from the cross-bar  : 67599

        Pkts sent to the port                   : 67599

        Pkts discarded on ingress               : 66210

        Per-priority-pause status               : Rx (Inactive), Tx (Inactive)



  Total Multicast crossbar statistics:

    Mcast pkts received from the cross-bar      : 0


So I would rule out the gen 1 hardware problem... Have you managed to make it work?


Best Regards,

Adi


Jeremy Waldrop Thu, 02/21/2013 - 07:26
User Badges:
  • Silver, 250 points or more

No, I haven't been able to make it work so that it is usable. I can get my ESXi host to boot from SAN over FCoE, but when I power on a VM it takes 20 minutes to boot.

dfilenko Fri, 02/22/2013 - 11:56
User Badges:
  • Cisco Employee,

When configuring FCoE from UCS to N5K, vPC cannot be used for FCoE traffic. Also, the CoS markings for FCoE traffic should match between the N5K and UCS.

Adi Arslanagic Fri, 02/22/2013 - 12:44
User Badges:

Hi Dmitri,


In my case vPC is used, but only for LAN traffic. There is a separate port-channel between the Nexus and the FI used only for FCoE SAN traffic, and the only allowed VLAN on that link (on the Nexus) is 981, which maps to VSAN 10 (SAN A).


Regards,

Adi

dfilenko Fri, 02/22/2013 - 12:47
User Badges:
  • Cisco Employee,

In this case, are you using the uplink manager to create a disjoint L2 configuration on the UCS side? By default, all UCS uplinks allow all VLANs, so it's possible for the vfc to get mapped to the vPC port-channel.

Adi Arslanagic Fri, 02/22/2013 - 13:21
User Badges:

I have configured Ethernet 1/5-6 on both FIs as FCoE uplink interfaces, then on the SAN tab added them to an FCoE port-channel and assigned them to VSAN 10 and 20 respectively.


Here is the show run int po201 output from FI-A NX-OS:


!Command: show running-config interface port-channel201

!Time: Fri Feb 22 21:57:34 2013



version 5.0(3)N2(2.11a)



interface port-channel201

  description C: FcoeUplink

  shutdown

  switchport mode trunk

  pinning border

  switchport trunk native vlan 4049

  switchport trunk allowed vlan 981-982,4048-4049

  speed 10000

dfilenko Fri, 02/22/2013 - 13:55
User Badges:
  • Cisco Employee,

Adi,


Can you please provide output of:


From UCS:

(nxos)# show vifs interface port-channel201

(nxos)# show running-config interface vfc

(nxos)# show int vfc


From UCS and Nexus:

(nxos)# show class-map type qos

(nxos)# show policy-map type qos


(nxos)# show class-map type queuing


(nxos)# show class-map type network-qos

(nxos)# show policy-map type network-qos

Adi Arslanagic Fri, 02/22/2013 - 14:18
User Badges:

Dmitri,


here is the output


UCS show commands output:



DC-eVlada-A(nxos)# show vifs interface port-channel 201



Interface      MAX-VIFS VIFS

-------------- -------- --------------------------------------------------------

Po201          0        vfc821,

DC-eVlada-A(nxos)# show running-config interface vfc821



!Command: show running-config interface vfc821

!Time: Fri Feb 22 22:51:21 2013



version 5.0(3)N2(2.11a)



interface vfc821

  bind interface port-channel201

  switchport mode NP



DC-eVlada-A(nxos)# sh int vfc821

vfc821 is down (Administratively down)

    Bound interface is port-channel201

    Hardware is Virtual Fibre Channel

    Port WWN is 23:34:54:7f:ee:ab:d1:3f

    Admin port mode is NP, trunk mode is on

    snmp link state traps are enabled

    Port vsan is 10

    1 minute input rate 0 bits/sec, 0 bytes/sec, 0 frames/sec

    1 minute output rate 0 bits/sec, 0 bytes/sec, 0 frames/sec

      20995367 frames input, 34899127788 bytes

        0 discards, 0 errors

      9188891 frames output, 14893075816 bytes

        0 discards, 0 errors

    last clearing of "show interface" counters never

    Interface last changed at Wed Feb 20 19:42:34 2013





DC-eVlada-A(nxos)# show class-map type qos





  Type qos class-maps

  ===================



    class-map type qos match-any class-fcoe

      match cos 3



    class-map type qos match-any class-default

      match any



    class-map type qos match-any class-all-flood

      match all flood



    class-map type qos match-any class-ip-multicast

      match ip multicast



DC-eVlada-A(nxos)# show policy-map type qos





  Type qos policy-maps

  ====================



  policy-map type qos default-in-policy

    class type qos class-default

      set qos-group 0

  policy-map type qos system_qos_policy

    class type qos class-fcoe

      set qos-group 1

    class type qos class-default

      set qos-group 0

  policy-map type qos fcoe-default-in-policy

    class type qos class-fcoe

      set qos-group 1

    class type qos class-default

      set qos-group 0

DC-eVlada-A(nxos)#

DC-eVlada-A(nxos)# show class-map type queuing





  Type queuing class-maps

  =======================



    class-map type queuing class-fcoe

      match qos-group 1



    class-map type queuing class-default

      match qos-group 0



    class-map type queuing class-all-flood

      match qos-group 2



    class-map type queuing class-ip-multicast

      match qos-group 2



DC-eVlada-A(nxos)#

DC-eVlada-A(nxos)# show class-map type network-qos





  Type network-qos class-maps

  ==============================



    class-map type network-qos class-fcoe

      match qos-group 1



    class-map type network-qos class-default

      match qos-group 0



    class-map type network-qos class-all-flood

      match qos-group 2



    class-map type network-qos class-ip-multicast

      match qos-group 2



DC-eVlada-A(nxos)# show policy-map type network-qos





  Type network-qos policy-maps

  ===============================



  policy-map type network-qos system_nq_policy

    class type network-qos class-fcoe



      pause no-drop

      mtu 2158

    class type network-qos class-default



      pause drop

      mtu 9000

  policy-map type network-qos default-nq-policy

    class type network-qos class-default



      pause drop

      mtu 1500

      multicast-optimize

  policy-map type network-qos fcoe-default-nq-policy

    class type network-qos class-fcoe



      pause no-drop

      mtu 2158

    class type network-qos class-default



      pause drop

      mtu 1500

      multicast-optimize

DC-eVlada-A(nxos)#



Nexus show commands output:



DC-nexus-sw1# show class-map type qos





  Type qos class-maps

  ===================



    class-map type qos match-any class-fcoe

      match cos 3



    class-map type qos match-any class-default

      match any



    class-map type qos match-any class-all-flood

      match all flood



    class-map type qos match-any class-ip-multicast

      match ip multicast



DC-nexus-sw1# show policy-map type qos





  Type qos policy-maps

  ====================



  policy-map type qos default-in-policy

    class type qos class-default

      set qos-group 0

  policy-map type qos fcoe-default-in-policy

    class type qos class-fcoe

      set qos-group 1

    class type qos class-default

      set qos-group 0

DC-nexus-sw1#

DC-nexus-sw1# show class-map type queuing





  Type queuing class-maps

  =======================



    class-map type queuing class-fcoe

      match qos-group 1



    class-map type queuing class-default

      match qos-group 0



    class-map type queuing class-all-flood

      match qos-group 2



    class-map type queuing class-ip-multicast

      match qos-group 2



DC-nexus-sw1#

DC-nexus-sw1# show class-map type network-qos





  Type network-qos class-maps

  ==============================



    class-map type network-qos class-fcoe

      match qos-group 1



    class-map type network-qos class-default

      match qos-group 0



    class-map type network-qos class-all-flood

      match qos-group 2



    class-map type network-qos class-ip-multicast

      match qos-group 2



DC-nexus-sw1# show policy-map type network-qos





  Type network-qos policy-maps

  ===============================



  policy-map type network-qos jumbo-frames

    class type network-qos class-fcoe



      pause no-drop

      mtu 2158

    class type network-qos class-default



      mtu 9216

      multicast-optimize

  policy-map type network-qos default-nq-policy

    class type network-qos class-default



      mtu 1500

      multicast-optimize

  policy-map type network-qos fcoe-default-nq-policy

    class type network-qos class-fcoe



      pause no-drop

      mtu 2158

    class type network-qos class-default



      mtu 1500

      multicast-optimize

DC-nexus-sw1#

dfilenko Fri, 02/22/2013 - 14:32
User Badges:
  • Cisco Employee,

This looks correct. Please open a case with TAC to find out the reason for the packet drops on the N5K interfaces.

dfilenko Fri, 02/22/2013 - 14:36
User Badges:
  • Cisco Employee,

Do you have the case number handy? Does it have the N5K and UCSM tech-support files attached?

Correct Answer
Adi Arslanagic Fri, 02/22/2013 - 14:38
User Badges:

Case ID: 624922297


Both Nexus switches and UCSM tech support are attached.

dfilenko Fri, 02/22/2013 - 15:01
User Badges:
  • Cisco Employee,

Thanks Adi. I will take a look if I have time.


Can you please also provide output of the following from the N5K?


show platform fwm info asic-errors 0


This output shows the reasons for drops, and it would be good to see which counter is incrementing when the FCoE uplink is enabled.

Adi Arslanagic Sat, 02/23/2013 - 01:00
User Badges:

Thanks for the help.


Here is the output:


DC-nexus-sw1# show platform fwm info asic-errors 0

Printing non zero Carmel error registers:

DROP_SRC_VLAN_MBR: res0 = 2845053 res1 = 0 [12]

DROP_INVALID_FCF_BYPASS: res0 = 22 res1 = 0 [24]

DROP_FCF_SW_TBL_MISS: res0 = 470 res1 = 0 [25]

DROP_FCF_FC_SW_SHOULD_NOT_BE_ME: res0 = 80 res1 = 0 [27]

DROP_SHOULD_HAVE_INT_MULTICAST: res0 = 226 res1 = 0 [36]

DROP_VLAN_MASK_TO_NULL: res0 = 488 res1 = 0 [39]

DROP_SRC_MASK_TO_NULL: res0 = 93135 res1 = 0 [44]

DROP_INGRESS_ACL: res0 = 910 res1 = 0 [46]


DC-nexus-sw2# show platform fwm info asic-errors 0

Printing non zero Carmel error registers:

DROP_SRC_VLAN_MBR: res0 = 2915122 res1 = 0 [12]

DROP_INVALID_FCF_BYPASS: res0 = 21 res1 = 0 [24]

DROP_FCF_SW_TBL_MISS: res0 = 376 res1 = 0 [25]

DROP_FCF_FC_SW_SHOULD_NOT_BE_ME: res0 = 81 res1 = 0 [27]

DROP_SHOULD_HAVE_INT_MULTICAST: res0 = 15 res1 = 0 [36]

DROP_VLAN_MASK_TO_NULL: res0 = 487 res1 = 0 [39]

DROP_SRC_MASK_TO_NULL: res0 = 99962 res1 = 0 [44]

DROP_INGRESS_ACL: res0 = 16 res1 = 0 [46]



I cannot activate the FCoE uplinks since this is a production environment.

Jeremy Waldrop Wed, 02/27/2013 - 11:14
User Badges:
  • Silver, 250 points or more

Adi, any updates on the TAC case you have open?

Adi Arslanagic Thu, 02/28/2013 - 00:55
User Badges:

Hi Jeremy,


TAC said that we need to turn on the FCoE uplinks to do troubleshooting, but we were unable to since this is now a production environment :\

They also suggested that the Cisco HBA driver on the Win2012 server is a bit out of date and that we should try to update it and test again with disk performance monitoring, but we haven't been able to schedule it yet.


Have you raised a TAC case about this issue?


Regards,

Adi

Jeremy Waldrop Fri, 02/22/2013 - 13:46
User Badges:
  • Silver, 250 points or more

I have 2 dedicated 10G uplinks for FCoE in a SAN port-channel. Functionally it works fine and my ESXi hosts boot from SAN just fine over FCoE. When I power on a single VM it takes 20 minutes for it to boot and it is unusable, with no errors on any interfaces. If I disable the FCoE port-channel and enable the traditional FC SAN port-channel, performance is great.


I have a vPC setup but that is just for Ethernet LAN uplinks.


Here is my Nexus 5k side config for Fabric A; Fabric B is similar except for the VSAN/VLAN/VFC numbers.


In UCS I have VSAN 20 mapped to FCoE VLAN 3020 on Fabric A and VSAN 40 mapped to FCoE VLAN 3020 on Fabric B.


With this config all ESXi vHBA initiators FLOGI over the vfc8 interface on Fabric A and the vfc9 interface on Fabric B. I can access my VMFS LUNs and it is all functional, just with very poor performance.



vlan 3020

  fcoe vsan 20


interface port-channel8

  description UCS Fabric A fcoe e1/31-32

  switchport mode trunk

  switchport trunk allowed vlan 3020

  spanning-tree port type edge trunk

  speed 10000


interface Ethernet1/17

  description UCS Fabric A fcoe e1/31

  switchport mode trunk

  switchport trunk allowed vlan 3020

  spanning-tree port type edge trunk

  channel-group 8 mode active



interface Ethernet1/18

  description UCS Fabric A fcoe e1/32

  switchport mode trunk

  switchport trunk allowed vlan 3020

  spanning-tree port type edge trunk

  channel-group 8 mode active


interface vfc8

   switchport description UCS Fabric FCOE

   bind interface port-channel8
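
Not shown in the excerpt above, but the vfc also needs to be assigned to the VSAN; a minimal sketch of that part, using the same example interface/VSAN numbers:

vsan database
  vsan 20 interface vfc8

interface vfc8
  switchport trunk allowed vsan 20
  no shutdown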

dfilenko Fri, 02/22/2013 - 14:35
User Badges:
  • Cisco Employee,

Jeremy,


I would recommend opening a support case with TAC to analyze the config. If you have FCoE drops in the path, it would explain the slowness of connectivity.

Jeremy Waldrop Sun, 03/31/2013 - 13:58
User Badges:
  • Silver, 250 points or more

I figured out my performance issue with using multi-hop FCoE from UCS to Nexus 5k.


We have changed our UCS QoS System Classes from the default of Best Effort 50 / FC 50. We are using the other classes to place some barriers around traffic like vMotion and to give VM network traffic and NFS traffic guaranteed percentages.



On the Nexus 5k side we had the default QoS of 50/50 configured.



To test, I reset UCS back to the defaults and the performance issue went away. The mismatch in QoS between UCS and the Nexus 5k was causing the issue.


I then went back and enabled one QoS system class at a time to see which one was causing the issue. When I enabled the Platinum system class I immediately started noticing a performance issue accessing storage. The Platinum class was being used for IP storage and had no-drop configured.


I can't find anywhere that this is documented, but it looks like having no-drop configured on two different QoS groups at the same time causes issues.


Here is a screenshot of the QoS system class configuration that works with FCoE:


vMotion is mapped to Best Effort

ESXi Mgmt is mapped to Silver

VM traffic is mapped to Gold


I don't really need a QoS policy or vNICs for IP storage, but if I did I would enable packet drop on Platinum and map the NFS/iSCSI vNICs to Platinum.
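
If someone did need to carry an IP-storage class like Platinum across the FCoE uplink, the Nexus 5k side would also need an equivalent class so the QoS stays consistent end-to-end. A minimal sketch of what that might look like on the N5k, assuming Platinum uses CoS 5 and is left as a drop class (queuing/bandwidth policies omitted):

class-map type qos match-any class-platinum
  match cos 5
class-map type network-qos class-platinum
  match qos-group 2

policy-map type qos uplink-qos-in
  class class-platinum
    set qos-group 2
  class class-fcoe
    set qos-group 1

policy-map type network-qos uplink-nq
  class type network-qos class-platinum
    mtu 9216
  class type network-qos class-fcoe
    pause no-drop
    mtu 2158
  class type network-qos class-default
    mtu 9216

system qos
  service-policy type qos input uplink-qos-in
  service-policy type network-qos uplink-nq

Note that class-platinum deliberately has no "pause no-drop" here, matching the packet-drop change in UCS, so FCoE remains the only no-drop class on the link.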


Robert Burns Mon, 04/01/2013 - 04:59
User Badges:
  • Cisco Employee,

Thanks for the great follow-up, Jeremy. Hopefully this helps someone down the road who encounters a similar issue.


Keep up the great work!


Robert

Adi Arslanagic Tue, 04/02/2013 - 04:24
User Badges:

Great find, thanks for updating the thread Jeremy!


Now that I think about it, we also had a mismatch between the UCS and Nexus QoS weights; I guess that could have been the problem. We went with the native FC uplinks, but this is good to know!


Adi

Ivan Kovacevic Fri, 06/28/2013 - 13:14
User Badges:
  • Cisco Employee,

This explanation is actually a bit misleading.

The issue is actually caused by a mismatch between the UCS FI and N5k QoS configs. Enabling the "Platinum" class on UCS, which is by default configured as a no-drop class, while leaving the default N5k QoS config will cause DCBX negotiation to fail on the 10G link between the FI and the N5k. This means that PFC will become disabled for the "Fibre Channel" class as well, which means FCoE frames will be dropped and pause frames will not be honored.


The correct recommendation is to keep the QoS config consistent end-to-end on all datacenter switches along the FCoE path.
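
One quick way to confirm this on the N5k is to check whether PFC is still being negotiated on the FI-facing port (the interface below is the one from the earlier queuing output; command availability may vary by NX-OS release):

DC-nexus-sw1# show interface ethernet 1/5 priority-flow-control
DC-nexus-sw1# show system internal dcbx info interface ethernet 1/5

If PFC shows as off (or the DCBX PFC TLV is no longer agreed) after enabling the extra no-drop class in UCS, that matches the failure mode described above.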
