
vfc interface down until "shut/no shut" on Ethernet interface(s)

9ball
Level 1

Hello, I'm hoping someone out there has had this problem and knows the solution, or can provide some insight as to whether the problem is with the CNA, Host OS, or Nexus.  I've not yet posted anything to the Brocade discussion forums about this issue.

I have vfc interfaces that are not coming up naturally; the log states that it fails "waiting for flogi" (see 'google').   The only solution that I've found posted for that problem is to apply the QoS settings for FCoE.  Those settings were already applied during the initial configuration of the switch, and I've since double and triple checked them.  I've also gone through the Cisco Nexus 5000 Troubleshooting Guide, Troubleshooting FCoE Issues, and everything checks out except a couple of things:

1)  The interfaces don't get past the FIP Solicitation stage: They show Triggered Event: [FCOE_MGR_VFC_EV_FIP_SOLICITATION] over and over again.

2)  While the troubleshooting guide states that the FCoE vLAN should be STP forwarding on the underlying Ethernet interfaces, it is not.  However, I have a working configuration elsewhere (with different make/model CNA hardware and Nexus software revisions) and the FCoE vLAN is not STP forwarding on those underlying Ethernet interfaces either.
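For readers retracing these checks, here is a minimal sketch of the FCoE QoS settings and the verification commands typically involved on a Nexus 5500-series switch. It assumes the default fcoe-default-* policy names that ship with NX-OS 5.1, and it borrows vfc1021, Ethernet1/26, and VLAN 801 from the configuration posted later in this thread. The poster's actual QoS configuration was not shared, so this is an illustration rather than a copy of it.

```
! FCoE QoS using the default policy names (assumed, not taken from the poster's config)
system qos
  service-policy type qos input fcoe-default-in-policy
  service-policy type queuing input fcoe-default-in-policy
  service-policy type queuing output fcoe-default-out-policy
  service-policy type network-qos fcoe-default-nq-policy

! Verification from exec mode
show policy-map system
show interface ethernet 1/26 priority-flow-control
show vlan fcoe
show spanning-tree vlan 801
show interface vfc1021
show flogi database
```

If the QoS policies and the VLAN-to-vSAN mapping look correct here, as they reportedly did in this case, the fault is more likely in the FIP exchange itself than in the switch configuration.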

The interesting part is that if I do a "shut" then "no shut" on the underlying Ethernet interfaces, after the hosts have completely booted, the vfc interfaces come up and successfully register in the flogi database.  At that point everything is functional.  If the hosts then reboot, the vfc interfaces fail to come up until I perform the "shut/no shut" on the underlying Ethernet interfaces.
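The workaround amounts to bouncing the physical member port once the host operating system is fully up. A sketch, assuming Ethernet1/26 is the host-facing port (the interface name comes from the configuration posted later in the thread):

```
configure terminal
interface Ethernet1/26
  shutdown
  no shutdown
end
! Confirm that the host has logged in to the fabric
show flogi database
```

This only restarts the link so that the adapter solicits again; it does not address why the first solicitation after a reboot goes unanswered.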

One other noteworthy symptom is that during POST, when the CNA initializes, it displays the error message "Link initialization failed. Disabling BIOS."

Here is a configuration synopsis:

Cisco Nexus 5548UP [Ver. 5.1(3)N2(1)]

Brocade 1020 CNA [Rev. 3.2.1.0]

ESXi vSphere [Ver 5.1], [CNA Driver Rev. 3.2.1.0]

[Attached network diagram: NEXUS-CNA-NETWORK_13JUN2013.png]

There are two vSANs; A-side and B-side.  Each is bound to a unique and dedicated vLAN.

Po1 is the vPC peer-link.  The FCoE vLANs are *not* allowed on it.

Po1011 is DB1's vPC port-channel.  It is an 802.1q Trunk; the native vLAN (1) and the FCoE vLANs are allowed, among other Ethernet data vLANs.

Po1021 is DB2's vPC port-channel.  It is an 802.1q Trunk; the native vLAN (1) and the FCoE vLANs are allowed, among other Ethernet data vLANs.

vfc1011 is DB1's Virtual FC interface.  It is a Trunk (mode TF), allowing the A-side vSAN only.  It is in the vSAN membership database.

vfc1021 is DB2's Virtual FC interface.  It is a Trunk (mode TF), allowing the B-side vSAN only.  It is in the vSAN membership database.
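The binding of each vSAN to its own dedicated vLAN is not visible in the interface configuration posted later in the thread. A minimal sketch of what such a binding usually looks like on NX-OS follows. The numbers 801/801 are taken from that later configuration, and the one-to-one vLAN/vSAN numbering is an assumption.

```
feature fcoe
!
vsan database
  vsan 801
!
! Map the dedicated FCoE vLAN to its vSAN
vlan 801
  fcoe vsan 801
!
! Confirm the mapping is operational
show vlan fcoe
```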

EDIT:  The Port-Channels are LACP.

Thanks for your time.

Message was edited by: Michael Hertrick

14 Replies

AJ Cruz
Level 3

Can you post the interface configs?


! Nexus Switch 1

!

interface port-channel1021

  switchport mode trunk

  switchport trunk allowed vlan 1,100,801-802

  spanning-tree port type edge trunk

  speed 10000

  vpc 1021

!

interface vfc1021

  bind interface port-channel1021

  switchport trunk allowed vsan 801

  no shutdown

!

interface Ethernet1/26

  switchport mode trunk

  switchport trunk allowed vlan 1,100,801-802

  spanning-tree port type edge trunk

  channel-group 1021 mode active

!

vsan database

  vsan 801 name "VSAN-801"

  vsan 802 name "VSAN-802"

  vsan 801 interface vfc1021

! Nexus Switch 2

interface port-channel1021

  switchport mode trunk

  switchport trunk allowed vlan 1,100,801-802

  spanning-tree port type edge trunk

  speed 10000

  vpc 1021

!

interface vfc1021

  bind interface port-channel1021

  switchport trunk allowed vsan 802

  no shutdown

!

interface Ethernet1/26

  switchport mode trunk

  switchport trunk allowed vlan 1,100,801-802

  spanning-tree port type edge trunk

  channel-group 1021 mode active

!

vsan database

  vsan 801 name "VSAN-801"

  vsan 802 name "VSAN-802"

  vsan 802 interface vfc1021

I've only done a couple of FCoE deployments, both of them FlexPOD deployments, so I won't say that I'm an expert on the subject. But Cisco has a pretty good FlexPOD deployment guide that I followed:
http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/Virtualization/flexpod_deploy.html#

I don't know if it's contributing to your problems, but per the deployment guide you shouldn't have the VSANs on both fabrics. It seems weird, but on the VPC links, the FCoE VLAN is added only on the side it belongs to. It feels wrong to have a VPC link that doesn't match on each side, but that's how the fabrics stay separated. The other option is to create dedicated FCoE links separate from the LAN VPC.
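Applied to the configuration posted above, this suggestion would look roughly like the sketch below. It assumes VLAN 801 belongs on Switch 1 and VLAN 802 on Switch 2, matching the vfc1021 vSAN membership on each switch. It was not tested against the problem in this thread.

```
! Nexus Switch 1: carry only FCoE VLAN 801
interface port-channel1021
  switchport trunk allowed vlan 1,100,801

! Nexus Switch 2: carry only FCoE VLAN 802
interface port-channel1021
  switchport trunk allowed vlan 1,100,802
```

Member interfaces such as Ethernet1/26 inherit the allowed-VLAN list from the port-channel, so the change is made once per switch.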


Thanks for the input.  I don't think that's contributing to the problem I'm having, as I have a working configuration done as such.  As far as "how the fabrics stay separated," not allowing the vSAN vLANs on any trunk between the switches keeps them separated.  And that's how it is in this configuration, the VPC peer-link does not have the vSAN vLANs on it.  I don't see any risk in having the vLAN configuration as it is, but I will test the removal of the unnecessary vLAN from each side of the vPC anyway in my own lab.

I also wanted to try separating the Ethernet links from the VPC, just to see if that made any difference in the FIP negotiation, but my client has since opened a Cisco TAC case; we don't want to interfere with their process.  Besides, even if that solved the FIP negotiation issue, it would be a workaround to a bug and not really a good solution.  It would essentially defeat the purpose of using FCoE as opposed to native FC.

Thanks again, I'll post updates as I have them.  Let me know if you come up with anything else.

9ball
Level 1

Apparently Cisco TAC has checked everything out and determined that the Nexus is operating properly and that the issue is with the Brocade driver/firmware.  The Brocade ticket remains open.  My next reply (hopefully) will be the final resolution to this problem (so there isn't another open-ended issue in the forum that serves no useful purpose).

I'm actually really surprised and curious that Cisco didn't tell you to remove the fcoe vlans from one side of the VPC. What happens when SAN A traffic lands on the B side Nexus?

I agree that the more distance you can put between the fabrics ever converging into one, the better off you are.  You don't want faults on one SAN to interrupt and cause faults on your redundant SAN.

Nevertheless, the aforementioned configuration does not converge the two vSANs. 

The host does not bridge between the two adapters, nor do the switches bridge between the two.  Remember, even though you bind the vfc to the port-channel in the configuration, the vfc is really bound only to the Ethernet interface on the local switch and is not participating in the port-channel at all.  There is no way, with that configuration as-is, for SAN A traffic to land on SAN B.  A series of a few mis-configurations would be required for the SAN segmentation to be eroded. 

http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/configuration_guide_c07-543563.html
See sections "Configuration Parameters That Must Be Identical" and "Configuration Parameters That Should Be Identical"

Under "must" there is:

  • Enable or disable state per VLAN

Under "should" there is:

  • Spanning Tree Protocol interface settings: 
    • VLANs (Rapid Per-VLAN Spanning Tree Plus [PVST+])

Would you mind pointing out in the document you linked where Cisco recommends that the allowed vLANs on each side of the vPC should be different?  I didn't exactly find that.  What I did see was that they have vlan 101 and 102 allowed on both port-channel11 and port-channel12 in their example.  vsan 101 is bound to vlan 101, vsan 102 is bound to vlan 102.

Kind Regards.

Actually, the link I sent you isn't the document I used when I did my deployments. I didn't bother to look down at the content assuming it was the same.

What is odd is the URL I originally sent you does in fact have both VSANs on the vPC. The PDF I used at the beginning of this year for my deployments does not. Here it is: https://dl.dropboxusercontent.com/u/26091129/flexpod_vsphere_50_M3.pdf

So if you take a look at Po11 of the nexus config in the appendix of that guide, which is the port channel to the NetApp array, you can see that on 5K1 it has only fcoe vlan 101, on 5K2 it has only fcoe vlan 102.

Again, I don't know whether it's causing your trouble, and it may not be; that's just how I did it based on the design guide I used, which seems to differ from the current one. Good luck, I'm interested to hear what you find.

9ball
Level 1

As mentioned earlier, Cisco said the problem appears to be with the Brocade 1020 CNA.  More specifically, Cisco analyzed traces from the interfaces and said the Brocade adapter was (and I'm paraphrasing) sending negotiation frames out of order, "confusing" the Nexus.  Brocade has provided no resolution to the problem. 

Therefore, this problem was resolved by replacing the Brocade 1020 CNA with the QLogic QLE8262 CNA, an adapter we successfully use in other production environments.

...man, I was hoping you weren't going to say that.

Having a nearly identical problem, but can't replace my CNAs.  Did Brocade never provide anything?

My client opened the ticket with Brocade and I haven't heard any news regarding that ticket.  As far as I know, they have not resolved it.  I suggest you open a ticket with Brocade as well, if you haven't already.  It may help Brocade if you can run some Ethernet traces on your switch ports (e.g. port mirror and Wireshark) and submit those along with the ticket.
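A port mirror of this kind can be sketched as a local SPAN session on the Nexus. Ethernet1/26 is the host-facing port from the configuration earlier in the thread. Ethernet1/30 is a hypothetical spare port with the capture machine attached, and the session number is arbitrary.

```
! Ethernet1/30 is a hypothetical spare port for the capture machine
interface Ethernet1/30
  switchport monitor
!
monitor session 1
  source interface Ethernet1/26 both
  destination interface Ethernet1/30
  no shut
!
show monitor session 1
```

In Wireshark, the display filter "fip || fcoe" narrows the capture to the FIP negotiation (EtherType 0x8914) and FCoE frames (EtherType 0x8906), which is the exchange Cisco reportedly found to be out of order.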

I checked their web site, too, and they have not published any new drivers for the CNA; version 3.2.1.0 is still the latest publicly available:  http://www.brocade.com/services-support/drivers-downloads/adapters/index.page

That said, how is your problem nearly identical?  What's different in your case?

Good luck.

Doh!  Too bad.

My problem is nearly identical in that I only see this behavior from the Boot BIOS.  When booted into a full operating system (Linux, VMware, bcucli CD), everything works fine and I can see all targets.

Unfortunately, this is a lab environment, so I don't have a support contract, and so no ticket.

Brocade has a 36 month warranty on hardware/software for CNAs, which includes 24/7 phone & email support and lifetime access to software updates.

http://www.brocade.com/services-support/warranties/index.page

VMware posted version 3.2.3.0 for ESXi 5.1 and 5.5; Brocade still has 3.2.1.0 on their web site.

We are still seeing issues with flogi when testing by physically unplugging. Only reboots fix it.

Hosts on QLogic 8262 CNAs are handling re-logins much better.

Why do Brocade cards suck so much?

