Extending SAN - Native FC over DWDM Experiences

Answered Question
Sep 18th, 2009

We are in the process of adding a new Data Center about 60 miles from our existing one. The solution I'm proposing for continuous SAN access and replication is a DWDM deployment (Ciena or Cisco) to extend native Fibre Channel to the new Data Center, which would allow an easier, staged migration. My only concern is FC latency over DWDM, since the distance is about 60 miles. Has anyone been through this type of deployment who can share some real-life feedback, advice, or "what to look for"? Will the distance be an issue using DWDM? I appreciate your responses, thank you.

Correct Answer by stephen2615 Mon, 10/12/2009

A bit late but hopefully not too late to be useful.

I have two remote Data Centers thanks to a merger. Both are less than 6 miles away by road, and we have two paths to each of them. The second path to each is about 34 miles in a wild ring thanks to a very important customer down south.

Our network engineer who controls the ONS doesn't have a clue about FC and has a bad attitude to boot. IP doesn't have any problems with latency but he had no idea about the SAN and latency.

I found he had somehow routed one of the four paths down the long link while the other three were using the short link. Our synchronous HDS TrueCopy got very upset about this and I could not work out why. We were getting terrible timeouts of about 300 ms. This brought down a couple of mission-critical apps that relied on response times of less than about 20 ms; they could not complete writes to disk at the remote site in time. We worked the problem out after a bit of yelling at each other. Now he knows to always use both paths if he fails them over, and to actually let me know when he is doing something. To be fair, the Brocade switches also sharing the links identified the link distance mismatch.
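A path mismatch like this is easy to catch with a quick latency comparison across the port-channel members. A minimal sketch of such a sanity check (the ISL names and RTT figures below are hypothetical, not taken from any real switch output):

```python
# Flag port-channel members whose measured RTT differs sharply from the
# group median. The data is made up: three ISLs on the short route and one
# accidentally routed over the long ring.
from statistics import median

def mismatched_paths(rtts_ms, tolerance_ms=0.5):
    """Return ISLs whose RTT deviates from the group median by more than tolerance_ms."""
    m = median(rtts_ms.values())
    return {isl: rtt for isl, rtt in rtts_ms.items() if abs(rtt - m) > tolerance_ms}

paths = {"isl1": 0.2, "isl2": 0.2, "isl3": 0.2, "isl4": 1.7}
print(mismatched_paths(paths))  # flags isl4, the path on the long link
```

Running a check like this whenever the optical team fails paths over would have surfaced the 300 ms timeouts' root cause immediately.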

Another issue was turning on In-Order Delivery for the mainframe. That traffic should not be mixed with other general traffic. HDS are also (quietly) adamant that you should not mix sync and async replication on the same storage ports. I had to separate them, so we have one pair of ports for sync and another pair for async. It looks like a SAN problem, but it's the different write patterns for sync and async.

60 miles is too long for HDS sync TC. I believe there is an unwritten rule that 32 to 34 miles is the longest you should use for sync TC; async is not a problem. I have not used any FC accelerators. I also have EMC arrays and have no replication issues with them either.

When I first set up native FC over the ONS, I did a fair bit of measuring and testing. Our Exchange geo-cluster using sync TC normally ran at about 5 ms response times down both paths. That's a write to the remote Data Centre and then a write to the storage on site. Heavy workloads showed about 10 to 12 ms; MS suggest anything less than 20 ms. 32 miles adds only about 0.5 ms of latency, so I never actually noticed any difference between the paths.
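The 0.5 ms figure falls out of simple propagation math. A rough sketch, assuming light travels at about two-thirds of c in fiber (roughly 5 microseconds per km one way, a rule of thumb rather than a measured value):

```python
# Propagation-only round-trip estimate for a fiber path.
US_PER_KM_ONE_WAY = 5.0  # ~5 microseconds per km one way in glass (rule of thumb)
KM_PER_MILE = 1.609

def round_trip_ms(miles):
    """Propagation-only round-trip time, in ms, over a fiber path of the given length."""
    km = miles * KM_PER_MILE
    return 2 * km * US_PER_KM_ONE_WAY / 1000.0  # microseconds -> milliseconds

print(round(round_trip_ms(32), 2))  # ~0.5 ms, matching the figure above
print(round(round_trip_ms(60), 2))  # ~1 ms for the 60-mile question in this thread
```

Note this is propagation delay only; amplifiers, transponders, and any extra routing in the ring add on top of it.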

I did a lot of async TC to keep replicas of important systems offsite, and once all the fine points were worked out it was a great solution.

60 miles could be 160 using the other path.

I have not tried FCIP but I believe there are no issues with it if the IP latency is acceptable.

Balancing of backups is a pain. The backup system might have 10 tape drives available and will use the ones on one path and nothing on the other, which causes congestion and slow response, and the backup people don't care. Also, use a port channel, as you need two ISLs.

NEVER tell anyone that your ISLs are over distance unless you absolutely have to, because companies will always blame that. I have a mixture of Brocade and Cisco switches using the ONS. I had a problem with my Brocade fabric, and Brocade immediately stopped working on the problem because they blamed the ONS for creating timeouts and so on. It had nothing to do with it, but you see what I am saying.

Stephen

Overall Rating: 5 (1 rating)
David McFarland Mon, 09/21/2009 - 15:39

MDS SAN extension over DWDM can be accelerated using FC-WA (FC Write Acceleration), introduced with SAN-OS 2.1, or IOA (I/O Acceleration), recently introduced with NX-OS 4.2.1. Both provide local acknowledgement and can reduce the latency seen over the wide-area network. Both require specialized line cards (SSM for FC-WA, MSM for IOA).

With or without acceleration, it's possible your replication application will work over 60 miles, but this is something only your replication vendor or real-life testing can confirm. From an MDS SAN fabric point of view, there is no issue with this distance, regardless of whether acceleration features are deployed.
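The local-acknowledgement benefit can be sketched with a simplified latency model: a remote SCSI write normally costs two round trips (one to receive XFER_RDY, one for the data and status), and acknowledging XFER_RDY locally removes one of them. The service time and the exact savings here are assumptions for illustration, not vendor figures:

```python
# Simplified model of remote-write latency with and without write acceleration.
def write_latency_ms(rtt_ms, service_ms=1.0, accelerated=False):
    """Remote write cost: 2 RTTs normally (XFER_RDY + data/status),
    1 RTT when XFER_RDY is acknowledged locally, plus array service time."""
    round_trips = 1 if accelerated else 2
    return round_trips * rtt_ms + service_ms

rtt = 1.0  # ~60 miles of fiber, propagation only (assumed)
print(write_latency_ms(rtt))                    # unaccelerated
print(write_latency_ms(rtt, accelerated=True))  # with local acknowledgement
```

The model makes the point that acceleration roughly halves the distance penalty per write, but the array service time and any queuing are unaffected.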

60 miles may or may not be the actual distance as far as the optical network is concerned. The actual point-to-point path, and the quantity and quality of any connections along it, will affect the latency.

If you can stand up the actual network and provide measured RTT values to the replication vendor, they should be able to tell you whether it will work.

emcprateek_2 Tue, 09/22/2009 - 16:39

Here is an excellent white paper, identified in another post, on distance-extension solutions for SAN:

See "Distance Constraints on Fibre Channel Transmission"

https://www.cisco.com/en/US/prod/collateral/optical/ps5724/ps2006/prod_white_paper0900aecd8044aa6d.html

Here is a white paper on Write Acceleration:

http://www.cisco.com/en/US/prod/collateral/ps4159/ps6409/ps5989/ps6217/prod_white_paper0900aecd8024fd2b.html

Borman Bravo Wed, 09/23/2009 - 05:09

Thanks both for your responses. I was really looking for some real-life experiences with this type of deployment. By the way, we measured the RTT to be 2 ms with a 1500-byte packet. Thanks again.
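That 2 ms measurement supports two quick back-of-the-envelope checks: the fiber path length it implies, and the buffer credits needed to keep a long FC link streaming. The constants below are rules of thumb, not vendor sizing guidance:

```python
# Two sanity checks on a measured 2 ms RTT (rule-of-thumb constants).
US_PER_KM_RTT = 10.0  # ~5 microseconds per km in each direction in fiber

# 1) Implied fiber path: a 2 ms RTT suggests roughly 200 km of glass (plus
#    equipment delay), noticeably more than 60 straight-line miles (~97 km).
implied_km = 2000.0 / US_PER_KM_RTT
print(implied_km)  # 200.0

# 2) BB_credits needed to keep the link full: enough frames must be in
#    flight to cover the round trip before the first R_RDY comes back.
def bb_credits(rtt_ms, gbps=2.0, frame_bytes=2112):
    """Minimum buffer credits for continuous streaming over the given RTT."""
    # 8b/10b encoding: ~10 line bits per data byte
    bytes_in_flight = rtt_ms / 1000.0 * gbps * 1e9 / 10
    return int(bytes_in_flight / frame_bytes) + 1

print(bb_credits(2.0))  # credits for 2 Gbps FC over a 2 ms RTT
```

Long-distance FC over DWDM lives or dies on BB_credit sizing: if the switch port cannot advertise enough credits for the RTT, throughput collapses long before latency becomes the problem.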

