NPIV/VMware - registers and disappears


Hi there


MDS 9120, SW: version 3.3(1c)


I have a question regarding NPIV with VMware. My ESX host has created the virtual WWNs (4 in total), and during boot-up I do see them registering in the FCNS:


----cut---

1983 Jan 26 01:26:18.451113 fcns: vsan 1: Received registration request through MTS; port-id 0x3c0012, objects registered 8a0f

1983 Jan 26 01:26:18.451245 fcns: vsan 1: reg timer started for port-id 0x3c0012

1983 Jan 26 01:26:18.451300 fcns: vsan 1: Created entry for port-id 0x3c0012

1983 Jan 26 01:26:18.451351 fcns: vsan 1: Got Entry for port-id 0x3c0012

1983 Jan 26 01:26:18.451417 fcns: vsan 1: Registered port-name 28:2d:00:0c:29:00:00:03 for port-id 0x3c0012

1983 Jan 26 01:26:18.451483 fcns: vsan 1: Registered node-name 28:2d:00:0c:29:00:00:01 for port-id 0x3c0012

1983 Jan 26 01:26:18.451535 fcns: vsan 1: Registered cos c for port-id 0x3c0012

1983 Jan 26 01:26:18.451587 fcns: vsan 1: Registered port-type 1 for port-id 0x3c0012

1983 Jan 26 01:26:18.451673 fcns: vsan 1: Sending notification to other modules; port-id 0x3c0012, event 0, modified objects 8a0e

1983 Jan 26 01:26:18.451862 fcns: vsan 1: Reading configuration for entry with port-name 28:2d:00:0c:29:00:00:03, node-name 28:2d:00:0c:29:00:00:01

1983 Jan 26 01:26:18.451926 fcns: vsan 1: No configuration present for this portname

1983 Jan 26 01:26:18.451974 fcns: vsan 1: No configuration present for this nodename

1983 Jan 26 01:26:18.452031 fcns: vsan 1: Saving new entry into pss

1983 Jan 26 01:26:18.452192 fcns: vsan 1: Sending sync message to the standby

1983 Jan 26 01:26:18.452241 fcns: vsan 1: Sending sync message to the standby

1983 Jan 26 01:26:18.452318 fcns: vsan 1: Saving new entry into ext db pss

1983 Jan 26 01:26:18.452418 fcns: vsan 1: Sending ext sync message to the standby

1983 Jan 26 01:26:18.452467 fcns: vsan 1: Sending ext sync message to the standby

1983 Jan 26 01:26:18.462634 fcns: vsan 1: fc_ct_parse_frame() succeeded - request_id = 52778, flags = 0

1983 Jan 26 01:26:18.462858 fcns: vsan 1: received registration command NS_CMD_RFF_ID

1983 Jan 26 01:26:18.462935 fcns: vsan 1: Registered fc4_feature 0 for fc4_type scsi-fcp for port-id 0x3c0012

1983 Jan 26 01:26:18.463042 fcns: vsan 1: Sending RSCN_CHANGED_NS_OBJ with RSCN_PORT_ADDR format for port-id 0x3c0012

1983 Jan 26 01:26:18.463108 fcns: vsan 1: stopping reg timer. coalesce timer started. postponing sending the rscn

1983 Jan 26 01:26:18.463166 fcns: vsan 1: Sending notification to other modules; port-id 0x3c0012, event 2, modified objects 4000

1983 Jan 26 01:26:18.463360 fcns: vsan 1: Saving modified entry into pss

1983 Jan 26 01:26:18.463470 fcns: vsan 1: Sending sync message to the standby

1983 Jan 26 01:26:18.463519 fcns: vsan 1: Sending sync message to the standby

1983 Jan 26 01:26:18.463597 fcns: vsan 1: sending accept response to port-id 0x3c0012

---cut---



but a bit later, I see:

---cut---

1983 Jan 26 01:26:24.454628 fcns: vsan 1: Received deregistration request through MTS; num ports to be deregistered 1

1983 Jan 26 01:26:24.454779 fcns: vsan 1: Deleted entry for port-id 0x3c0012

1983 Jan 26 01:26:24.454845 fcns: vsan 1: Sending notification to other modules; port-id 0x3c0012, event 1, modified objects 0

1983 Jan 26 01:26:24.455014 fcns: vsan 1: Deleting entry from pss

---cut---


What does "MTS" mean? Why is it de-registering?

I can't see this FCID or WWN in the NS database at all.


Any help is highly appreciated.


Andre



Michael Brown Fri, 08/29/2008 - 06:03

Andre,


Can you run this command, reconnect the ESX server, and post the file for review? The command is run from config mode.


fcanalyzer local limit-captured-frames 0 write volatile:fcns.cap


Once the problem is observed, press Ctrl+C to stop the data collection.


Now from the command line, while not in config mode:


sw3# cd volatile:

sw3# dir

1084 Aug 27 18:21:31 2008 fcns_00001_20080827182119.cap


Copy the file found in volatile off via FTP or TFTP and attach it here.


This will collect a trace, readable with Wireshark, where we can see the ESX server interaction with the name server.


Thanks,

Mike

Michael Brown Thu, 02/26/2009 - 18:16

Hi Andre,


I thought I had answered this, but apparently not. I looked at the Wireshark trace and filtered for the FCID 3c.00.12, and it appears to be using the same PWWN that the VM was using in the event messages you posted. I'm no VM expert, but it looks to be the same VM, since the PWWN is the same.
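
For anyone matching the log output against the trace: the port-id 0x3c0012 in the fcns messages and the FCID 3c.00.12 used in the Wireshark filter are the same 24-bit address, split into domain, area, and port bytes. A minimal Python sketch of that conversion follows; the function names are made up for illustration and are not part of any Cisco or Wireshark tool.

```python
def split_fcid(fcid):
    """Split a 24-bit Fibre Channel ID into (domain, area, port) bytes.

    Accepts either an int or a hex string such as "0x3c0012".
    """
    value = int(fcid, 16) if isinstance(fcid, str) else fcid
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"not a 24-bit FCID: {fcid!r}")
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def dotted_fcid(fcid):
    """Render an FCID in the dotted form used in the trace filter."""
    return ".".join(f"{byte:02x}" for byte in split_fcid(fcid))


if __name__ == "__main__":
    print(dotted_fcid("0x3c0012"))  # 3c.00.12
```

The first byte (0x3c, decimal 60) is the domain ID of the switch that assigned the address.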


What we see is the VM FLOGI in, register with the FCNS (name server), and register itself as supporting FCP (SCSI over FC).


It then queries the NS for other devices that can also do FCP. The answer is about 17 other devices. These devices must be in the same zone as this VM.


About 6 seconds later, the VM sends in a LOGO to log out of the fabric.


This process repeats with the FLOGI, the register, the query and answer, then the logout about 6 seconds after the login.


If you view the trace, the frames of interest are:


376, answer to the FLOGI from this VM

394, the register from this VM

396, the query for other FCP devices

398, the response with members in the zone

493, the logout from this VM to the fabric controller.


Sequence repeats and ends again with frame 671 where the VM logs out again.


I don't see anything in the trace to tell us why the VM is logging out. There are some RJTs to some queries, but they are normal. Basically the VM is asking the name server for something about another device in the zone, but that device has not registered the information that the VM is looking for. The expected response in that case is an RJT.


The messages you see in the log are the result of the VM logging out of the fabric. The de-register is the internal clean up of the data structures that were created when the VM logged in.
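
To confirm the roughly 6-second gap directly from the fcns debug output, the "Created entry" and "Deleted entry" lines can be paired per port-id. The following is a rough Python sketch that assumes the log line layout shown in Andre's post; the regular expression and the function name are illustrative only and have not been checked against other SAN-OS releases.

```python
import re
from datetime import datetime

# Matches lines such as:
# 1983 Jan 26 01:26:18.451300 fcns: vsan 1: Created entry for port-id 0x3c0012
LINE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r"fcns: vsan (?P<vsan>\d+): "
    r"(?P<event>Created|Deleted) entry for port-id (?P<fcid>0x[0-9a-fA-F]+)"
)


def entry_lifetimes(log_text):
    """Return (port-id, seconds) for each created-then-deleted name server entry."""
    created = {}
    lifetimes = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore every other fcns message
        stamp = datetime.strptime(match["ts"], "%Y %b %d %H:%M:%S.%f")
        key = (match["vsan"], match["fcid"])
        if match["event"] == "Created":
            created[key] = stamp
        elif key in created:
            delta = stamp - created.pop(key)
            lifetimes.append((match["fcid"], delta.total_seconds()))
    return lifetimes


if __name__ == "__main__":
    sample = """
    1983 Jan 26 01:26:18.451300 fcns: vsan 1: Created entry for port-id 0x3c0012
    1983 Jan 26 01:26:24.454779 fcns: vsan 1: Deleted entry for port-id 0x3c0012
    """
    for fcid, seconds in entry_lifetimes(sample):
        print(f"{fcid} stayed registered for {seconds:.3f} s")
```

Run against the two lines from the original post, this reports a lifetime of about 6.003 seconds for port-id 0x3c0012, which lines up with the login-to-LOGO interval seen in the trace.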


Sorry for not responding sooner. Somehow we need to figure out why the VM is logging out of the fabric.


Hope this helps,

Mike





sidbartle Thu, 02/26/2009 - 18:24

Hi all,

I would just like to add that I had the same problem with ESX and iSCSI using an MDS 9216. The ESX server logged in each time and then logged itself out; I never resolved the problem. I put it down to Cisco's way of doing iSCSI, as every other iSCSI device I have used works fine out of the box. I am now using a Dell EqualLogic PE5000.


Sid

Michael Brown Fri, 02/27/2009 - 06:18

Good info Sid, it might help isolate the root cause. In the trace I looked at for Andre, the PWWN of the VM logging in is the virtual PWWN of the VM system. I know this because if it were a Cisco-generated virtual PWWN it would contain a Cisco OUI. For iSCSI, the MDS generates the Fibre Channel portion of the session and it uses a Cisco-created PWWN.


The PWWN here is 28:2d:00:0c:29:00:00:03 and the OUI is 000c29, which is registered to VMware.
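
As an illustration of how the OUI is read out of a WWN, here is a small Python sketch. It assumes the standard NAA layouts: for NAA types 1 and 2 the OUI is in bytes 3 to 5, and for NAA type 5 it is the 24 bits immediately after the first hex digit. The function name is hypothetical and other NAA types are deliberately left unhandled.

```python
def wwn_oui(wwn):
    """Return the 24-bit OUI of an 8-byte WWN as six lowercase hex digits."""
    digits = wwn.replace(":", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"expected an 8-byte WWN, got {wwn!r}")
    naa = digits[0]
    if naa in ("1", "2"):
        # NAA 1/2: 4-bit NAA, 12 further bits, then a 48-bit address whose
        # first three bytes are the OUI.
        return digits[4:10]
    if naa == "5":
        # NAA 5: 4-bit NAA followed directly by the 24-bit OUI.
        return digits[1:7]
    raise ValueError(f"NAA type {naa} is not handled in this sketch")


if __name__ == "__main__":
    print(wwn_oui("28:2d:00:0c:29:00:00:03"))  # 000c29
```

The result still has to be looked up in the IEEE OUI registry to get the vendor name; that lookup is not part of the sketch.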


It would be interesting to see if the VM did any PLOGIs to the other devices in the zone with it. Unfortunately, the trace tools we have on the MDS limit frame capture to frames to and from the MDS itself. Transient frames (VM to other FC devices) are not routed through the supervisor, therefore we cannot trace them.


One possibility is that the VM learns all the devices in the same zone, and when it tries to PLOGI into them, it gets rejected. The result is that the VM then decides to log out. We know it is sending the query to the FCNS and that the MDS is responding with the members of the zone. We see the ACK to the response, so we know that the VM saw the response.


I just can't explain why it decides to log out.


Thanks,

Mike

inch Wed, 03/04/2009 - 13:40

Hi Sid,


Interesting that you had trouble with the IPS in the 9216. Can you remember what version of SAN-OS you used at the time? There were some issues in the early days (1.x days), but I have been using iSCSI on the MDS for quite a while.


One quite painful thing that needed to be done, if you had LUN masking/security set up, was to set up static WWPN bindings and then perform your LUN masking, etc.


We were actually lucky enough to have an iSCSI LUN which we ran our desktop off via a bootable CD (before the days of USB flash!).





suntzzu Wed, 03/04/2009 - 11:11

Here is an excellent article that should help you out:


http://www.vmware.com/files/pdf/VMware_Emulex_Best_Practices_for_Virtual_HBA_V4.pdf




Make sure all zoning and unmasking have been completed.


You will need a zone for each physical HBA of the ESX host to each port of the array.


The LUNs must be unmasked to the ESX host.


You will also need a zone for each virtual pWWN of the VM to each port of the array.


The best way to determine which pWWN is showing up on which switch is to shut down and start up the VM. During the boot-up process of the VM, run the following:


Switch# sh fcns database


You will then see the virtual pWWN of the VM for a limited time, until ESX causes the virtual HBA to de-register because it cannot see any unmasked LUNs.


Once you have the correct virtual pWWN, create a zone by adding the port for the array and then manually adding the virtual pWWN to the zone (i.e. right-click the zone in FM, choose Insert, leave WWN selected, and enter the WWN, for example 11:11:22:22:33:33:44:44).


Add the zone to the zoneset, and activate it.
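
If you prefer the CLI to FM, the same zoning steps can be scripted. The Python sketch below builds one zone per (initiator, array port) pair and prints the matching MDS configuration lines. The zone names, zoneset name, and array port WWN are placeholders invented for this example, and the command lines should be checked against the configuration guide for your SAN-OS release before being pasted into a switch.

```python
from itertools import product


def zone_commands(vsan, initiators, targets, zoneset):
    """Build config-mode lines: one zone per initiator/target pair.

    initiators and targets map a short label to a pWWN string.
    """
    lines = []
    zone_names = []
    for (i_name, i_wwn), (t_name, t_wwn) in product(
        initiators.items(), targets.items()
    ):
        zone = f"{i_name}_{t_name}"
        zone_names.append(zone)
        lines.append(f"zone name {zone} vsan {vsan}")
        lines.append(f"  member pwwn {i_wwn}")
        lines.append(f"  member pwwn {t_wwn}")
    lines.append(f"zoneset name {zoneset} vsan {vsan}")
    lines.extend(f"  member {zone}" for zone in zone_names)
    lines.append(f"zoneset activate name {zoneset} vsan {vsan}")
    return lines


if __name__ == "__main__":
    # Placeholder values: the virtual pWWN is the example from this post,
    # the array port WWN and all names are hypothetical.
    virtual_ports = {"vm1_vport": "11:11:22:22:33:33:44:44"}
    array_ports = {"array_spa0": "50:06:01:60:41:e0:12:34"}
    for line in zone_commands(1, virtual_ports, array_ports, "zs_example"):
        print(line)
```

The physical HBAs of the ESX host can be passed in the same initiators mapping, so that both the physical and the virtual pWWNs end up zoned to every array port.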



Next step is to get the VM registered with the array.


I'm not sure what type of array you are trying to access, but with an EMC CX3 you need to manually create a new initiator. The initiator will consist of the node WWN and port WWNs found in the earlier steps.


example:


Node WWN (generated from ESX NPiV)


AA:BB:CC:DD:EE:FF:11:22


virtual pWWN (generated from ESX NPiV)


11:11:22:22:33:33:44:44


manually register a new initiator:


AA:BB:CC:DD:EE:FF:11:22:11:11:22:22:33:33:44:44
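
The initiator string in the example above is simply the node WWN followed by the virtual pWWN. A tiny Python sketch of that concatenation, using the placeholder values from this post (the function name is made up, and this thread only shows the layout for the CX3, so other arrays may expect something different):

```python
def initiator_id(node_wwn, port_wwn):
    """Join a node WWN and a port WWN into a 16-byte initiator string."""
    for wwn in (node_wwn, port_wwn):
        if len(wwn.split(":")) != 8:
            raise ValueError(f"expected 8 colon-separated bytes, got {wwn!r}")
    return f"{node_wwn}:{port_wwn}"


if __name__ == "__main__":
    print(initiator_id("AA:BB:CC:DD:EE:FF:11:22", "11:11:22:22:33:33:44:44"))
    # AA:BB:CC:DD:EE:FF:11:22:11:11:22:22:33:33:44:44
```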


I would shut down and start up the VM again once all manual initiators have been created. You should then be able to see whether or not the VM is showing up as logged in and registered to the array.


Also run the following command from the ESX host after the VM has finished rebooting, to see if there are any virtual ports under 'Vports list on this physical port:'.


cat /proc/scsi/lpfc/1 --> or 2


After it is all said and done, unmask any LUNs that the VM should have access to.


Shut down and start up the VM, and it should then have access to the LUN through the vports.


