
Poor performance migrating storage between sites

frank.parry
Level 1

Infrastructure layout:

Site A:

EMC CX4-480 connected to two MDS 9509s

Both 9509s connect up to a single Nexus 5548 (each 9509 trunks its respective VSANs to the 5548)

Between sites:

The Nexus 5548 in Site A connects via FCoE (10 Gbps) to a 5548 in Site B. The sites are separated by approximately 80-100 meters.

Site B:

The 5548 used for site-to-site connectivity links up to another two 5548s acting as MDS devices (FC only). Both of these 5548s connect up to a new EMC VNX7600.
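
For context, as I understand it the FCoE hop between the two 5548s is a VE-port trunk configured roughly along these lines (the VLAN, VSAN, and interface numbers below are placeholders, not our exact config):

! placeholder numbers, not the actual configuration
feature fcoe
vlan 1010
  fcoe vsan 10
interface vfc110
  bind interface Ethernet1/10
  switchport mode E
  switchport trunk allowed vsan 10
  no shutdown
interface Ethernet1/10
  switchport mode trunk
  switchport trunk allowed vlan 1010
  no shutdown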

The issue:

Experiencing poor performance (and timeouts) when trying to transfer data from Site A to Site B.

I seem to have no issues with zoning, as I'm able to see the presented/masked LUNs from site to site from initiators on either side.  My problem always occurs when the initiator is on the "Site A" side, attempting to transfer data to Site B.  If I use an initiator on the Site B side to access storage from Site A, I have NO issues.
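
For reference, the fabric logins and active zoning can be confirmed from the switch side with commands along these lines (VSAN 10 is just a placeholder):

show flogi database
show fcns database vsan 10
show zoneset active vsan 10
show zone status vsan 10

Initiators and targets on both sides show up logged in and zoned as expected.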

Example:

If I use EMC's "SAN Copy" in "Pull" mode (using the VNX as the initiator and the CX4 as the target), I can transfer data at the expected speeds; a 200 GB LUN transferred in about 30 minutes.

If I use SAN Copy in "Push" mode (using the CX4 as the initiator and the VNX as the target), my transfers peak at 5 Mbps and average between 1 and 2 Mbps. The same 200 GB LUN took nearly 4 hours to transfer.
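
For a rough sense of scale, 200 GB in about 30 minutes works out to roughly 110 MB/s of effective throughput, while the same 200 GB over nearly 4 hours averages only around 14 MB/s.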

Can anyone make sense of why there is such a great difference in performance/bandwidth when the initiators/targets are flipped?  I'm a bit of a beginner on the FC and FCoE side, so I'm unsure of what to look at first.

Thanks very much for any assistance.

2 Replies

frank.parry
Level 1

I should also mention that all connections are FC, aside from the links between buildings, which are FCoE.

The problem is also not specific to SAN Copy (I just used it as an example). We experience the same kind of delays/timeouts when trying to create or mount a VMware datastore if the host sits in Site A and is using the VNX in Site B. If a host is in Site B and tries to connect back to datastores from Site A, there is no issue accessing storage.

This is indeed a strange design; it does not follow the Fibre Channel best practice of two separate fabrics, A and B.

Why not connect the MDS in Site A to the N5K in Site B for fabric A, and do the same for fabric B? I would in any case not do FCoE, but FC end to end.
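
A minimal sketch of what such a per-fabric FC ISL could look like on each end (interface and VSAN numbers are assumptions):

! placeholder interface and VSAN numbers
interface fc1/1
  switchport mode E
  switchport trunk mode on
  switchport trunk allowed vsan 10
  no shutdown

with the matching fc port on the other switch configured the same way, separately for each fabric.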

Your two N5Ks for site interconnectivity are single points of failure, and could in fact also become a bottleneck.

I would check the BB credit counters on the trunk links.
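
Something along these lines should show whether you are running out of credits on the FC trunks, and whether PFC pause frames are active on the FCoE hop (interface numbers are placeholders):

show interface fc1/1 counters
show interface fc1/1 bbcredit
show interface priority-flow-control
show queuing interface ethernet 1/10

In the fc counters output, look for the B2B credit "transitions to zero" counters (exact wording varies by release); a rapidly climbing value there points to credit starvation in that direction, which would fit a problem that only appears when pushing data from Site A to Site B.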
