Welcome to this Cisco Support Community Ask the Expert conversation. This is an opportunity to learn and ask questions about Cisco UCS Troubleshooting Boot from SAN with FC and iSCSI with Vishal Mehta and Manuel Velasco.
The current industry trend is to use SAN (FC/FCoE/iSCSI) for booting operating systems instead of using local storage.
Boot from SAN offers many benefits, including:
Cisco UCS takes away much of the complexity with its service profiles and associated boot policies to make boot from SAN deployment an easy task.
Vishal Mehta is a customer support engineer for Cisco’s Data Center Server Virtualization TAC team based in San Jose, California. He has been working in the TAC for the past three years with a primary focus on data center technologies such as Cisco Nexus 5000, Cisco UCS, Cisco Nexus 1000v, and virtualization. He has presented at Cisco Live in Orlando 2013 and will present at Cisco Live Milan 2014 (BRKCOM-3003, BRKDCT-3444, and LABDCT-2333). He holds a master’s degree from Rutgers University in electrical and computer engineering and has CCIE certification (number 37139) in routing and switching and service provider.
Manuel Velasco is a customer support engineer for Cisco’s Data Center Server Virtualization TAC team based in San Jose, California. He has been working in the TAC for the past three years with a primary focus on data center technologies such as Cisco UCS, Cisco Nexus 1000v, and virtualization. Manuel holds a master’s degree in electrical engineering from California Polytechnic State University (Cal Poly) and VMware VCP and CCNA certifications.
Remember to use the rating system to let Vishal and Manuel know if you have received an adequate response.
Because of the volume expected during this event, our experts might not be able to answer every question. Remember that you can continue the conversation in the Data Center community, under subcommunity Unified Computing, shortly after the event. This event lasts through April 25, 2014. Visit this forum often to view responses to your questions and the questions of other Cisco Support Community members.
Below are the summarized tasks to configure Boot-From-SAN (using FC/FCoE/iSCSI):
1. UCS Manager Tasks
A. Create a Service Profile Template with x number of vHBAs or iSCSI vNICs.
B. Create a Boot Policy that includes SAN Boot as the first device and link it to the Template
C. Create x number of Service Profiles from the Template
D. Use Server Pools, or associate servers to the profiles
E. Let all servers attempt to boot and sit at the “Non-System Disk” style message that UCS servers return
2. Switch Tasks
A. Zone the server WWPN to a zone that includes the storage array controller’s WWPN.
B. Zone the second fabric switch as well. Note: For some operating systems (Windows for sure), you need to zone just a single path during OS installation so consider this step optional.
3. Array Tasks
A. On the array, create a LUN and allow the server WWPNs for FC or the initiator IQN for iSCSI to have access to the LUN.
B. Present the LUN to the host using a desired LUN number (typically zero, but this step is optional and not available on all array models)
Thank you for asking this question. Most TAC cases that we have seen on Boot-from-SAN failures are due to misconfiguration.
So our methodology is to verify the configuration and troubleshoot from the server to the storage switches to the storage array.
Before diving into troubleshooting, make sure there is a clear understanding of the topology. This is vital in any troubleshooting scenario. Know what devices you have and how they are connected, how many paths are connected, Switch/NPV mode and so on.
Always try to troubleshoot one path at a time and verify that the setup is in compliance with the SW/HW interop matrix tested by Cisco.
Step 1: Check at server
a. Make sure to have a uniform firmware version across all components of UCS
b. Verify that the VSAN is created and the FC uplinks are configured correctly. VSANs/FCoE-VLANs should be unique per fabric
c. Verify the configuration of the vHBAs at the service profile level - the vHBA on each fabric should have a unique VSAN number
Note down the WWPN of your vHBA. This will be needed in Step 2 for zoning on the SAN switch and Step 3 for LUN masking on the storage array.
d. Verify that the Boot Policy of the service profile is configured to Boot From SAN - the Boot Order and its parameters such as LUN ID and WWN are extremely important
e. Finally, at the UCS CLI, verify the FLOGI of the vHBAs (for NPV mode, the command from nxos is show npv flogi-table)
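Since the WWPN noted in Step 1 has to be matched by eye against CLI output and array screens, a small helper can reduce transcription mistakes. The following is only a minimal sketch: the function names are hypothetical, and it assumes you have pasted the text of a command such as show npv flogi-table into a string. It does not parse any particular output layout; it only looks for colon-separated WWPNs anywhere in the text.

```python
import re


def normalize_wwpn(raw: str) -> str:
    """Return a WWPN as lowercase, colon-separated hex pairs."""
    # Drop separators such as ':' or '-' and keep only the hex digits.
    digits = re.sub(r"[^0-9a-fA-F]", "", raw).lower()
    if len(digits) != 16:
        raise ValueError(f"expected 16 hex digits, got {len(digits)}: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def wwpn_in_output(wwpn: str, cli_output: str) -> bool:
    """Check whether a WWPN appears anywhere in pasted CLI text."""
    wanted = normalize_wwpn(wwpn)
    found = re.findall(r"(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}", cli_output)
    return wanted in {item.lower() for item in found}
```

For example, wwpn_in_output("20-00-00-25-B5-00-00-0A", pasted_text) returns True when the pasted text contains 20:00:00:25:b5:00:00:0a in any letter case. The WWPN here is a made-up value for illustration.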
Step 2: Check at Storage Switch
a. Verify the mode (by default UCS is in FC end-host mode, so storage switch has to be in NPIV mode; unless UCS is in FC Switch mode)
b. Verify the switch port connecting to UCS is UP as an F-Port and is configured for correct VSAN
c. Check if both the initiator (Server) and the target (Storage) are logged into the fabric switch (command for MDS/N5k - show flogi database vsan X)
d. Once confirmed that initiator and target devices are logged into the fabric, query the name server to see if they have registered themselves correctly. (command - show fcns database vsan X)
e. Most important configuration to check on Storage Switch is the zoning
Zoning is basically access control between initiators and targets. The most common design is to configure one zone per initiator and target pair.
Zoning requires you to configure a zone, put that zone into your current zoneset, then ACTIVATE it. (command to verify - show zoneset active vsan X)
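To illustrate why the ACTIVATE step matters, here is a minimal Python model of zoning as access control. It is a sketch under stated assumptions, not how the switch implements zoning: zones and zonesets are plain dictionaries, all names are hypothetical, and default-zone behavior is ignored.

```python
from typing import Dict, Optional, Set


def can_communicate(
    initiator: str,
    target: str,
    zones: Dict[str, Set[str]],
    zonesets: Dict[str, Set[str]],
    active_zoneset: Optional[str],
) -> bool:
    """Return True if both WWPNs share a zone in the active zoneset."""
    if active_zoneset is None:
        # A zone that exists but is not part of an activated zoneset
        # grants no access in this model.
        return False
    for zone_name in zonesets.get(active_zoneset, set()):
        members = zones.get(zone_name, set())
        if initiator in members and target in members:
            return True
    return False
```

In this model, defining a zone and adding it to a zoneset is not enough: can_communicate stays False until active_zoneset names the zoneset that contains the zone, which mirrors the configure, add, activate sequence described above.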
Step 3: Check at Storage Array
When the storage array logs into the SAN fabric, it queries the name server to see which devices it can communicate with.
LUN masking is a crucial step on the storage array; it gives a particular host (server) access to a specific LUN.
Assuming that both the storage and the initiator have FLOGI’d into the fabric and the zoning is correct (as per Steps 1 and 2), the following needs to be verified at the storage array level:
a. Are the WWPNs of the initiators (vHBAs of the hosts) visible on the storage array?
b. If yes, is LUN masking applied?
c. What LUN number is presented to the host - this is the number that we see in LUN ID on the 'Boot Order' of Step 1
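The server-to-switch-to-array order of Steps 1 to 3 can be captured as an ordered checklist that reports the first failing layer. This is just a sketch for one path; the PathFacts fields are hypothetical names for facts you collect by hand from UCS Manager, the switch CLI and the array, and nothing here queries a device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathFacts:
    initiator_flogi: bool           # vHBA logged in to the fabric
    target_flogi: bool              # array port logged in to the fabric
    zone_active: bool               # initiator and target share an active zone
    initiator_seen_by_array: bool   # vHBA WWPN visible on the array
    lun_masked: bool                # LUN masking applied for this host
    array_lun_id: Optional[int]     # LUN number presented to the host
    boot_policy_lun_id: int         # LUN ID in the Boot Order of the boot policy


def first_failure(facts: PathFacts) -> Optional[str]:
    """Return a description of the first failing check, or None if all pass."""
    checks = [
        (facts.initiator_flogi, "Step 1/2: initiator has not logged in (FLOGI)"),
        (facts.target_flogi, "Step 2: target has not logged in (FLOGI)"),
        (facts.zone_active, "Step 2: no active zone contains initiator and target"),
        (facts.initiator_seen_by_array, "Step 3: array does not see the initiator WWPN"),
        (facts.lun_masked, "Step 3: LUN masking is not applied for this host"),
        (facts.array_lun_id == facts.boot_policy_lun_id,
         "Step 3: LUN ID on the array does not match the boot policy"),
    ]
    for ok, message in checks:
        if not ok:
            return message
    return None
```

Checking in this fixed order reflects the one-path-at-a-time approach: there is little point in looking at LUN masking while the initiator has not even logged in to the fabric.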
Below document has details and troubleshooting outputs:
Hope this answers your question.
Thanks for that very detailed answer, Vishal. Adding on to my question, could you also provide some common SAN Boot Failure Scenarios?
The common cases we have seen with SAN boot failures are mostly related to misconfiguration.
If the correct order of configuration is followed, then failures can be avoided.
However, below are the common failure cases we have seen in TAC cases:
FC/FCoE:
1. Incorrect Target WWNs specified in the Service Profile
2. Zoning mis-configured on SAN switches
3. LUN Masking incorrect on Storage Arrays
4. Boot order in boot policy in Service Profile set incorrectly
5. VSAN/FCoE-VLAN misconfiguration
6. FC uplinks not associated with the correct VSAN
7. FC ports not in the correct mode (F, NP, N, E modes)
8. Not using the correct OS drivers.
iSCSI:
1. Incorrect target IQN and/or IP
2. On the storage side, LUN assignment to the incorrect initiator IQN and/or IP
3. LUN masking issues
4. Not setting the iSCSI vNIC VLAN as the native VLAN
5. Configuring the wrong VLAN
6. Not allowing the correct VLAN on the upstream switches
7. Not using the correct OS drivers.
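For the iSCSI cases involving an incorrect IQN or IP, a quick syntax check can catch typos before they reach the boot policy. The sketch below is an assumption-laden helper, not part of any Cisco tool: the function names are hypothetical, the pattern follows the iqn.yyyy-mm.reversed-domain[:identifier] naming form from the iSCSI standard, and it cannot tell you whether a well-formed value is the right one for your array.

```python
import ipaddress
import re

# iqn.<year>-<month>.<reversed domain>[:<identifier>]
_IQN_PATTERN = re.compile(
    r"iqn\.\d{4}-(0[1-9]|1[0-2])\.[a-z0-9]([a-z0-9.-]*[a-z0-9])?(:.+)?"
)


def looks_like_iqn(name: str) -> bool:
    """Return True if the string has the general shape of an IQN."""
    if len(name.encode("utf-8")) > 223:  # iSCSI names are limited to 223 bytes
        return False
    return _IQN_PATTERN.fullmatch(name.lower()) is not None


def looks_like_ip(address: str) -> bool:
    """Return True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(address)
    except ValueError:
        return False
    return True
```

With a made-up name such as iqn.2014-04.com.example:boot-target, looks_like_iqn returns True, while a month of 13 or an address like 10.0.0.300 is rejected.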
Let me know if we need to add anything further.
I have Cisco UCS Manager 2.1(3a) and an EMC VNX 5300 that is direct-attached to Fabric Interconnects A and B through FCoE. I configured local zoning and I have two servers.
Server 1 has a local disk, boots from it, and works fine.
Server 2 is configured to boot from SAN. The problem is that every initiator sees the target two times in the FC zones.
I use VMware 5.0.
UNIV-FI-A-A(nxos)# show zones
zoneset name ucs-UNIV-FI-A-vsan-200-zoneset vsan 200
zone name ucs_UNIV-FI-A_A_4_E1-B2-SP_E1-B2_vHBA-A vsan 200
zone name ucs_UNIV-FI-A_A_5_E1-B2-SP_E1-B2_vHBA-A vsan 200
zone name ucs_UNIV-FI-A_A_8_E1-B2-SP_E1-B2_vHBA-A vsan 200
zone name ucs_UNIV-FI-A_A_7_E1-B2-SP_E1-B2_vHBA-A vsan 200
UNIV-FI-A-B(nxos)# show zone
zone name ucs_UNIV-FI-A_B_4_E1-B2-SP_E1-B2_vHBA-B vsan 201
zone name ucs_UNIV-FI-A_B_3_E1-B2-SP_E1-B2_vHBA-B vsan 201
no zone name ucs_UNIV-FI-A_B_8_E1-B2-SP_E1-B2_vHBA-B vsan 201
zone name ucs_UNIV-FI-A_B_7_E1-B2-SP_E1-B2_vHBA-B vsan 201
The reason you see two sets of zones for the same targets and initiator is that when you enable zoning on UCS and assign a boot-from-SAN boot order policy, the system automatically creates a set of zones for your vHBAs with the targets associated with that policy. In other words, if you know you want your servers to boot from SAN, the only thing you need to do is assign a boot-from-SAN boot order policy to the service profile and the required zones will be created.
If you want to test this on your servers, add an additional test target wwpn to your boot from SAN policy and you will see that a zone for this new target is created.
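As a rough way to reason about how many zones to expect, the behavior described above can be modeled as one zone per vHBA and boot target pair. This is only a sketch based on that description, not UCS Manager's actual algorithm; the function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def expected_zones(boot_targets_by_vhba: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List one (vHBA, target WWPN) pair per zone expected from the boot policy."""
    return [
        (vhba, target)
        for vhba, targets in boot_targets_by_vhba.items()
        for target in targets
    ]
```

Under this model, adding a test target WWPN to the list for a vHBA makes the result grow by exactly one pair, which matches the test suggested above.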
Let me know if this makes sense or if you have any questions.