
WAE (WAAS) Failing on SnapMirror Replication

mlenco
Level 1

A WAE installed between two sites for NAS filer replication has failed during numerous attempts to kick off NetApp SnapMirror replication. When we turn off TCP optimization, compression, and application services, the replication continues. A Cisco CCIE looked at it and threw his hands up. Has anyone troubleshooting WAE [WAAS] found a problem or resolution similar to what is described?

Thanks in advance,

Matt

7 Replies

Zach Seils
Level 7

Matt,

I am looking at this exact issue for another customer. Can you send me a simultaneous packet capture from both WAEs?

Thanks,

Zach

Zach

The boss is wrapped a little tight and warned me not to put it on the forum. To complicate things, this gear sits between the main building and a bureau, both of which are distant from me.

The config is similar to the one in the WAE quick configuration guide, which we entered via the console CLI. We enabled the TCP optimization groupings, et cetera, with the Java-based Cisco generic GUI.

Would it be out of the realm of possibility that the upstream routers, with old Sup 720s, could have their input and congestion buffers overrun by throughput hitting line rate for more than 5 seconds (Hello for large pipe) or 30 seconds (Hello for small pipe), dropping EIGRP control data, removing the route from the topology, and then dropping the path completely?

I am asking my guys to look for No EIGRP Adjacency errors at around the time they test putting the WAAS back online.
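As a rough aid for that check, here is a minimal Python sketch that pulls EIGRP neighbor-change messages out of collected router logs and keeps only those near the time of a WAAS test. It assumes the logs contain the IOS `%DUAL-5-NBRCHANGE` message and that each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp added by a syslog collector; that timestamp layout, the function name `eigrp_events_near`, and the sample lines are all hypothetical and not taken from this thread.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout (hypothetical collector format):
# "YYYY-MM-DD HH:MM:SS <host> %DUAL-5-NBRCHANGE: ... Neighbor <ip> (<intf>) is up|down: <reason>"
NBRCHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*%DUAL-5-NBRCHANGE: .*"
    r"Neighbor (?P<neighbor>\S+) \((?P<intf>[^)]+)\) is (?P<state>up|down)"
    r"(?:: (?P<reason>.*))?$"
)

def eigrp_events_near(lines, test_start, window=timedelta(minutes=10)):
    """Return EIGRP neighbor up/down events within `window` of `test_start`."""
    events = []
    for line in lines:
        match = NBRCHANGE.match(line.strip())
        if match is None:
            continue  # not an EIGRP neighbor-change message
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        if abs(ts - test_start) <= window:
            events.append({
                "time": ts,
                "neighbor": match.group("neighbor"),
                "interface": match.group("intf"),
                "state": match.group("state"),
                "reason": match.group("reason") or "",
            })
    return events

# Invented sample lines for illustration only (documentation IP range).
SAMPLE_LOG = [
    "2000-01-01 14:05:17 rtr-a %DUAL-5-NBRCHANGE: IP-EIGRP(0) 100: "
    "Neighbor 192.0.2.2 (GigabitEthernet1/1) is down: holding time expired",
    "2000-01-01 14:05:40 rtr-a %DUAL-5-NBRCHANGE: IP-EIGRP(0) 100: "
    "Neighbor 192.0.2.2 (GigabitEthernet1/1) is up: new adjacency",
    "2000-01-01 14:06:00 rtr-a %LINK-3-UPDOWN: Interface GigabitEthernet1/2, changed state to up",
    "2000-01-01 18:00:00 rtr-a %DUAL-5-NBRCHANGE: IP-EIGRP(0) 100: "
    "Neighbor 192.0.2.2 (GigabitEthernet1/1) is down: holding time expired",
]

if __name__ == "__main__":
    for event in eigrp_events_near(SAMPLE_LOG, datetime(2000, 1, 1, 14, 0, 0)):
        print(event["time"], event["neighbor"], event["state"], event["reason"])
```

A "down" event with a holding-time reason right after optimization is re-enabled would support the buffer-overrun theory; no events in the window would point away from it.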

Feedback, good or bad?

Do you see the same issues with other types of traffic? If not, I wouldn't expect instability in the routing infrastructure to be the cause.

Let me know ...

Zach

Zach

This is a dedicated pipe carrying only the NetApp IP data. We don't see this issue with other types of traffic, but other traffic isn't optimized either.

In a former life, I was experimenting with FCIP on a production IP path between buildings. I had the tunnel set for 500 Mb/s out of an apportioned 550 Mb/s of an OC-12, and it was sustaining 500 Mb/s in one direction. A burst of traffic in the other direction close to 550 Mb/s overwhelmed the 6509 with circa-2003 Sup 720s, which dropped EIGRP control data and then the route/connection, causing flapping at Layer 8 of the OSI model (political). With that said, why are upstream router buffers (input/output queue buffers as well as congestion management) never questioned?

b-beavers
Level 1

Just the opposite. We installed WAE-612s on both ends of a dedicated replication network running NetApp SnapMirror, and we saw a 50% reduction in bandwidth usage during our nightly replication.

I am glad to see someone else feeling my pain. Quick questions:

1. How big is the pipe?

2. When you say you have a 50% reduction, does that mean you could push ~400 Mb/s over an OC-12, and after turning on optimization the port stats show throughput at 200 Mb/s?

3. Do you suspect the throughput reduction is caused by limitations of upstream routers?

4. What do you see when you check sh proc status?

5. What do you see when you check the ingress ports of the upstream routers? [There is a cool free Perl app named MRTG that you can configure to do an SNMP walk of all your routers and show daily, weekly, and monthly graphs of Mbit/s.]
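For questions 2 and 5, the arithmetic behind an Mbit/s figure can be sketched in a few lines of Python: take two readings of an interface octet counter and convert the difference to a rate. This is an illustrative sketch, not MRTG's own code. The function names are hypothetical, and it assumes the readings come from a standard SNMP octet counter (32-bit `ifInOctets` or 64-bit `ifHCInOctets`) and that the counter wrapped at most once between samples. The 622.08 Mb/s figure is the OC-12 line rate.

```python
OC12_LINE_RATE_MBPS = 622.08  # OC-12 line rate in Mb/s

def mbits_per_second(octets_start, octets_end, interval_s, counter_bits=64):
    """Convert two octet-counter readings into an average rate in Mb/s.

    The modulo handles a single counter wrap between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 / interval_s / 1_000_000

def utilization(rate_mbps, line_rate_mbps=OC12_LINE_RATE_MBPS):
    """Fraction of the line rate in use (0.0 to 1.0)."""
    return rate_mbps / line_rate_mbps

if __name__ == "__main__":
    # Made-up readings taken 300 seconds apart (a 5-minute polling interval).
    rate = mbits_per_second(0, 15_000_000_000, 300)
    print(f"{rate:.1f} Mb/s, {utilization(rate):.0%} of an OC-12")
```

Comparing the rate on the router ingress ports with optimization on and off shows whether the "50% reduction" is on-the-wire bandwidth saved or replication throughput lost.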

I look forward to talking with you further.

Matt

You can contact me directly at 7635056047 to discuss details.
