fcpio_data_cnt_mismatch

We are seeing fcpio_data_cnt_mismatch in our vmkernel.log on one of our 10 ESXi hosts running ESXi 6.0 Update 2. We are using fnic driver 1.6.0.28. We started out with firmware 3.1(1k) and have upgraded the firmware to 3.1(2e). It seems very similar to bug CSCva47085, but we are running C240 M4s managed by UCS Manager.

The vmkernel.log file shows:

2017-01-18T17:06:20.086Z cpu3:33174)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60000970000197000380533030303238" state in doubt; requested fast path state update...
2017-01-18T17:06:20.086Z cpu3:33174)ScsiDeviceIO: 2651: Cmd(0x439e01625f00) 0x28, CmdSN 0xb6f from world 33043 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-01-18T17:06:30.515Z cpu32:33560)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH
2017-01-18T17:06:30.515Z cpu7:36674)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60000970000197000380533030303238" state in doubt; requested fast path state update...
2017-01-18T17:06:30.515Z cpu7:36674)ScsiDeviceIO: 2651: Cmd(0x439e17558f80) 0x28, CmdSN 0xb9f from world 36434 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-01-18T17:06:30.641Z cpu32:33810)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH
2017-01-18T17:06:30.641Z cpu7:36674)ScsiDeviceIO: 2651: Cmd(0x439e15726c40) 0x28, CmdSN 0xbe9 from world 36437 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-01-18T17:06:32.446Z cpu2:37413)VMotionRecv: 659: 1484759080517443 D: Estimated network bandwidth 318.194 MB/s during pre-copy
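If you need to pull the failing commands out of a larger vmkernel.log, a small filter can help. The sketch below assumes the ScsiDeviceIO line layout shown in this excerpt (the regex and the function name `failed_cmds` are hypothetical). In the H:/D:/P: triple, H: is the host (adapter/driver) status, which is where these failures show up, while D: and P: stay 0x0. Only opcode 0x28 (SCSI READ(10)) is decoded, because it is the only one seen here.

```python
import re
from collections import Counter

# Pattern for the ScsiDeviceIO failure lines above; the layout is assumed from this excerpt only.
FAILED_CMD = re.compile(
    r'Cmd\((?P<cmd>0x[0-9a-f]+)\) (?P<opcode>0x[0-9a-f]+), CmdSN (?P<sn>0x[0-9a-f]+) '
    r'from world (?P<world>\d+) to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+)'
)

# Only the opcode seen in this thread is named; 0x28 is SCSI READ(10).
OPCODES = {0x28: "READ(10)"}


def failed_cmds(lines):
    """Yield (device, opcode, host status, device status) for each failed command line."""
    for line in lines:
        m = FAILED_CMD.search(line)
        if m:
            op = int(m["opcode"], 16)
            yield m["dev"], OPCODES.get(op, hex(op)), m["h"], m["d"]


SAMPLE = [
    '2017-01-18T17:06:30.515Z cpu7:36674)ScsiDeviceIO: 2651: Cmd(0x439e17558f80) 0x28, '
    'CmdSN 0xb9f from world 36434 to dev "naa.60000970000197000380533030303238" '
    'failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    '2017-01-18T17:06:30.515Z cpu32:33560)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH',
]

# Count failed commands per device; non-matching lines (like the fnic line) are skipped.
per_device = Counter(dev for dev, _op, _h, _d in failed_cmds(SAMPLE))
print(per_device)
```

Feeding the whole log through `failed_cmds` makes it easy to check whether the failures stay on one naa device or spread across several.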

Looking at the adapter in UCS Manager shows this:

CSCUCSView-A# connect adapter 1/1
adapter 0/1/1 # connect
adapter 0/1/1 (top):2# show-log 100
170119-11:21:20.472158 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71
170119-11:21:20.472207 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71
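The adapter message is a flat list of `key value` pairs, so it can be split mechanically. The sketch below does that and compares `xmit_recvd` with `tot_bytes_exp`. Reading those two fields as "bytes received so far" and "bytes expected for the exchange" is an assumption based on their names, not documented VIC firmware behavior, and `parse_ecom` is a hypothetical helper.

```python
LINE = ("170119-11:21:20.472158 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch "
        "for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 "
        "burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 "
        "tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 "
        "d_id 0xec0000 host_tag 0x71")


def parse_ecom(line):
    """Return the key/value fields that follow 'fcpio_data_cnt_mismatch for' as a dict of strings."""
    _, _, tail = line.partition("fcpio_data_cnt_mismatch for ")
    tokens = tail.split()
    # Tokens alternate key, value: ['exch', '5c85', 'status', '1', ...]
    return dict(zip(tokens[0::2], tokens[1::2]))


fields = parse_ecom(LINE)
received = int(fields["xmit_recvd"], 16)     # assumed: bytes received so far
expected = int(fields["tot_bytes_exp"], 16)  # assumed: bytes expected for the exchange
print(f"exch {fields['exch']}: {received} of {expected} bytes, short by {expected - received}")
```

Under that reading, exchange 5c85 stopped at 0x55000 (348,160) of 0x80000 (524,288) bytes, a shortfall of 0x2B000 (176,128) bytes.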

We think it is strange that this is only happening on one server out of ten; they are all the same model, use the same service profile, and share the same uplinks. We do have a TAC case open (681634353), but after I gave them the logs I haven't heard anything back. I am wondering if anyone else has seen something similar.

Wes Austin
Cisco Employee

Hello,

This error typically indicates that the host is receiving frames out of order from the storage array.

Causes:

1. Incorrect FNIC driver

2. Physical Layer issues on the path to storage
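To show why out-of-order delivery can surface as a data count mismatch, here is a toy model. Each FC data frame carries a relative offset, and a receiver that only accepts data in offset order stops counting at the first frame that arrives early. This is a simplified sketch, not the VIC firmware's actual logic; the 2 KB frame size, the exchange size, and the swap point are illustrative assumptions.

```python
from typing import List, Optional, Tuple

FRAME = 0x800    # assumed 2 KB data payload per frame (illustrative)
TOTAL = 0x80000  # bytes expected for the exchange (illustrative)


def accept_in_order(frames: List[Tuple[int, int]]) -> Tuple[int, Optional[int]]:
    """Toy receiver that only accepts data frames arriving in relative-offset order.

    frames: (relative_offset, payload_len) pairs in arrival order.
    Returns (bytes accepted, offset of the first frame that broke ordering, or None).
    """
    received = 0
    for offset, length in frames:
        if offset != received:
            # This frame does not start where the previous one ended.
            return received, offset
        received += length
    return received, None


frames = [(off, FRAME) for off in range(0, TOTAL, FRAME)]
i = 0x55000 // FRAME                                   # frame that starts at offset 0x55000
frames[i], frames[i + 1] = frames[i + 1], frames[i]    # deliver two frames swapped

received, bad = accept_in_order(frames)
print(hex(received), hex(bad), received == TOTAL)
```

With two frames swapped, the toy receiver accepts 0x55000 bytes, flags the frame at 0x55800, and never reaches the expected total.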

I took a look at your case, and both FCIDs that are reporting the issue belong to your EMC array:

VSAN 4:
--------------------------------------------------------------------------
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
--------------------------------------------------------------------------
0xec0000 N XXXXXXXXXXXXXXXX (EMC) scsi-fcp:both 253
<output omitted>
0xec0180 N XXXXXXXXXXXXXXXX (EMC) scsi-fcp:both 253

Please engage EMC and see if they can explain why we are getting out-of-order frames from the array on multiple FCIDs.

HTH,

Wes

Wes,

Thanks for looking into the case. 

I have cases open with EMC and with VMware. Hopefully somebody can find something. Cisco TAC advised me to upgrade from the 1.6.0.25 fnic driver delivered in the Cisco ESXi 6.0 U2 ISO to the 1.6.0.28 driver. Servers 2-10 go through the same interconnect to the storage, which makes me think the uplinks from the 6248s are good; I don't see errors on servers 2-10. I think the problem is somewhere between the server and the port it connects to on the interconnect, but I'm not sure whether it is the card, the cable, or something else. I am not seeing any errors on any of the interconnect ports. Our interconnects are connected directly to the storage array. Are you only seeing errors on VSAN 4, or was that just an example?

Thanks.

Hey Kevin,

If you suspect a problem with the interfaces on the FI, you could try to re-integrate on different interfaces and see if the problem persists. You could also try swapping the cabling/SFP between the FI and the MLOM. It is possible that the VIC is faulty; however, if that were the case, I would expect other failure messages in the adapter rather than just out-of-order frames.

I just checked the FCIDs that are reporting out-of-order frames, and the source and destination FCIDs are EMC in every case.

HTH,

Wes

I can try different ports to see if that makes a difference. Where do you see the FCID info? I have the tar files I collected and am just curious.

Hey Kevin,

Thanks for the update. The FCID info is in the messages in the adapter logs:

170119-11:21:20.472158 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71
170119-11:21:20.472207 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71

s_id = source ID

d_id = destination ID
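An FCID is a 24-bit Fibre Channel address made of a domain byte, an area byte, and a port byte. Splitting the two EMC FCIDs from the name-server output shows that they share domain 0xec, which is consistent with both array ports sitting behind the same FC domain on VSAN 4. The helper `split_fcid` is a hypothetical illustration, not a UCS or NX-OS command.

```python
def split_fcid(fcid: int) -> dict:
    """Split a 24-bit Fibre Channel ID into its domain, area, and port bytes."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("FCID must fit in 24 bits")
    return {"domain": (fcid >> 16) & 0xFF, "area": (fcid >> 8) & 0xFF, "port": fcid & 0xFF}


# The two EMC FCIDs from the VSAN 4 name-server output.
for fcid in (0xEC0000, 0xEC0180):
    parts = split_fcid(fcid)
    print(f"{fcid:#08x}", {k: f"{v:#04x}" for k, v in parts.items()})
```

The name-server table is still what maps each FCID to a PWWN and vendor. This only breaks the address into its parts.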

HTH,

Wes

I tried different ports on the interconnect and replaced the cable, but that didn't help either.

I saw this https://quickview.cloudapps.cisco.com/quickview/bug/CSCva47085 and tried those commands. There are still problems, but only on one side.

adapter 0/1/1 # connect
adapter 0/1/1 (top):1# attach-mcp
adapter 0/1/1 (mcp):1# dcem-macstats 0
      TOTAL DESCRIPTION
        227 Tx frames len == 64
    1961961 Tx frames 64 < len <= 127
       6518 Tx frames 128 <= len <= 255
       1554 Tx frames 256 <= len <= 511
      50223 Tx frames 512 <= len <= 1023
       7379 Tx frames 1024 <= len <= 1518
       5419 Tx frames 1519 <= len <= 2047
     326982 Tx frames 2048 <= len <= 4095
    2360263 Tx total packets
  907325360 Tx bytes
    2360263 Tx good packets
    2358177 Tx unicast frames
       1725 Tx multicast frames
        361 Tx broadcast frames
        166 Tx per-priority pause frames
         18 Rx Frames len == 64
     544173 Rx Frames 64 < len <= 127
      69711 Rx Frames 128 <= len <= 255
       8907 Rx Frames 256 <= len <= 511
      62079 Rx Frames 512 <= len <= 1023
       7674 Rx Frames 1024 <= len <= 1518
   13592918 Rx Frames 1519 <= len <= 2047
    1320493 Rx Frames 2048 <= len <= 4095
          1 Rx Frames 4096 <= len <= 8191
   15605974 Rx total received packets
23684998386 Rx bytes
   15604935 Rx good packets
   15469793 Rx unicast frames
      84034 Rx multicast frames
      51108 Rx broadcast frames
          1 Rx frames with VLAN tag
       1039 Rx CRC error frames not stomped
         18 Rx per-priority pause frames
  907325360 Tx bytes for good packets
23683254176 Rx bytes for good packets
   0.000bps Tx Rate
   0.000bps Rx Rate

adapter 0/1/1 (mcp):2# dcem-macstats 1
      TOTAL DESCRIPTION
        357 Tx frames len == 64
     319628 Tx frames 64 < len <= 127
       6744 Tx frames 128 <= len <= 255
       2445 Tx frames 256 <= len <= 511
      61779 Tx frames 512 <= len <= 1023
       8714 Tx frames 1024 <= len <= 1518
      41303 Tx frames 1519 <= len <= 2047
     325616 Tx frames 2048 <= len <= 4095
     766586 Tx total packets
  836830402 Tx bytes
     766586 Tx good packets
     764637 Tx unicast frames
       1478 Tx multicast frames
        471 Tx broadcast frames
         88 Tx frames with VLAN tag
         14 Tx per-priority pause frames
        162 Rx Frames len == 64
     542052 Rx Frames 64 < len <= 127
      65522 Rx Frames 128 <= len <= 255
      13155 Rx Frames 256 <= len <= 511
      65764 Rx Frames 512 <= len <= 1023
      10361 Rx Frames 1024 <= len <= 1518
      67240 Rx Frames 1519 <= len <= 2047
    1419167 Rx Frames 2048 <= len <= 4095
    2183423 Rx total received packets
 3231684906 Rx bytes
    2183423 Rx good packets
    2046554 Rx unicast frames
      85079 Rx multicast frames
      51790 Rx broadcast frames
        120 Rx per-priority pause frames
  836830402 Tx bytes for good packets
 3231684906 Rx bytes for good packets
   0.000bps Tx Rate
   0.000bps Rx Rate
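The notable difference between the two outputs is on the receive side. On port 0, Rx total minus Rx good is exactly 1039, matching the "Rx CRC error frames not stomped" line. On port 1, the total and good counts are equal. The sketch below just redoes that arithmetic. Port 1 prints no CRC line, and the command appears to omit zero counters, so 0 is assumed for it. The thread does not say which physical port maps to which fabric, so the sketch makes no claim about A or B.

```python
# Receive-side counters copied from the two dcem-macstats outputs above.
# Port 1 prints no CRC line; the command appears to omit zero counters, so 0 is assumed.
PORTS = {
    0: {"rx_total": 15605974, "rx_good": 15604935, "rx_crc": 1039,
        "rx_bytes": 23684998386, "rx_good_bytes": 23683254176},
    1: {"rx_total": 2183423, "rx_good": 2183423, "rx_crc": 0,
        "rx_bytes": 3231684906, "rx_good_bytes": 3231684906},
}


def rx_summary(c):
    """Return (bad frames, CRC errors per received frame, bytes not counted as good)."""
    bad = c["rx_total"] - c["rx_good"]
    return bad, c["rx_crc"] / c["rx_total"], c["rx_bytes"] - c["rx_good_bytes"]


for port, counters in PORTS.items():
    bad, rate, bad_bytes = rx_summary(counters)
    print(f"port {port}: {bad} bad frames, CRC rate {rate:.2e}, {bad_bytes} bytes not counted as good")
```

Taking a second snapshot after some I/O and diffing it against this one would show whether the CRC count is still climbing.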

Hey Kevin,

That's good information. I think the next step would be to rule in/out hardware and swap out the VIC adapter. Please share the results after the swap.

-Wes

Thanks, I will post an update when the new adapter comes in.

I got my replacement VIC yesterday. After I installed it, it took a few re-acknowledgements before UCS Manager saw the server again. Since the replacement, I have not seen the error message in vmkernel.log. I tested the card with various VM operations (lots of vMotions, cloning VMs, recomposing a few pools), and it is still clean. I believe the issue is now resolved.

I had a similar issue with the same errors, but in my case the cause was an incorrect SFP. I hadn't noticed that I was using the SFP-10G-SR-S, which differs from the SFP-10G-SR in that it doesn't support FCoE. The host still saw the LUNs as devices, just not as datastores.

Another thing I noticed is that the errors only report FCIDs on the B-side fabric.

As a test, you can shut down the vHBA on the B side, or even the B-side connection from the FI to the MLOM, and see if the errors persist. If they do not, you know there is something wrong on the B-side path to the storage.

HTH,

Wes

Wes,

I disabled the B side on Server 1 and the messages stopped. There are no problems on side A. I tried replacing the cable on side B, and that didn't help. The only devices connected are the EMC VMAX and the Cisco C240 M4. I would think that if the interface on the FI were bad I would see errors, and I haven't seen any yet.
