ACE module possibly discarding TCP retransmissions?

marioderosa2008 · 09-30-2011

Hi all,

The ACE module in our 6509 does not seem to forward all TCP retransmissions to the servers in the server farm.

We have an issue with a particular application over our WAN. After the servers send duplicate ACKs for missing segments, I can see the client respond with retransmissions, and those arrive on the outside interface of the ACE module. However, I cannot see those retransmissions being passed on to the server farm, which I think is causing the TCP session to drop.
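The comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming both captures have already been exported to (relative sequence number, payload length) pairs for a single client-to-server flow; the `Segment` type and `missing_on_inside` function are hypothetical names, and relative sequence numbers are used in case the load balancer rewrites the absolute ones.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    rel_seq: int  # sequence number relative to the flow's initial sequence number
    length: int   # TCP payload length in bytes


def missing_on_inside(outside: Iterable[Segment],
                      inside: Iterable[Segment]) -> list[Segment]:
    """Return data segments seen on the outside capture but never on the inside one."""
    forwarded = set(inside)
    missing: list[Segment] = []
    for seg in outside:
        # Ignore pure ACKs (no payload) and report each missing segment once.
        if seg.length > 0 and seg not in forwarded and seg not in missing:
            missing.append(seg)
    return missing


# Hypothetical data: the segment at 1461 reaches the outside interface twice
# (original plus retransmission) but never shows up on the inside VLAN.
outside = [Segment(1, 1460), Segment(1461, 1460),
           Segment(2921, 1460), Segment(1461, 1460)]
inside = [Segment(1, 1460), Segment(2921, 1460)]
print(missing_on_inside(outside, inside))
```

Any segment this reports is a candidate for having been dropped between the two interfaces, although it could equally be a gap in the capture itself.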

Has anyone had any experience with this before?

I found a document at http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/ace/v3.00_A1/configuration/security/guide/tcpipnrm.html#wp1071390 which advises that you can use the command "set tcp queue-limit" to increase the queue size for out-of-order TCP segments from the default of 5. I cannot seem to type this command into our ACE module.

I'm not sure whether that command is still valid.

ACE module details are:

Software

loader: Version 12.2[121]

system: Version A2(3.2) [build 3.0(0)A2(3.2)]

system image file: [LCP] disk0:c6ace-t1k9-mz.A2_3_2.bin

installed license: no feature license is installed

I hope I've explained the issue clearly enough. Please let me know if you need me to elaborate on anything.

thanks

mario

chrhiggi · 10-03-2011

Mario-

It would be very useful to see a packet capture and the configuration from the ACE itself, to understand the flow of traffic against the configuration. How the ACE responds to this type of issue varies quite a bit depending on configuration.

Off the cuff, I would say try disabling normalization temporarily on both VLAN interfaces involved and see if that changes the behavior. (Note: this is just for testing. Even if it works, you don't want to leave normalization off unless you have already analyzed what leaving it disabled in normal operation means for your environment.)

Regards,

Chris Higgins

marioderosa2008 · 10-27-2011

Hi Christopher,

Sorry for the late reply, but I've been away.

We have an outside VLAN interface and an inside VLAN interface. Traffic is routed rather than bridged.

How can I upload a packet capture, Chris? I've not done that on here before.

Is there anywhere I can find out what the effects would be when I issue the "no normalization" command? At the moment, the only thing I have picked up is that a SYN packet alone is enough to put the TCP session into an ESTABLISHED state. Are there any other side effects of this command?

Thanks for your input.

mario

ajayku2 · 10-28-2011

The other side effect is that the ACE will create a connection entry even when someone sends a packet with SYN-ACK.

So if there is a SYN flood attack in your network, it can easily fill the ACE connection table and consume a lot of buffer space.

As Chris mentioned, you should disable it for testing purposes only. It is not a recommended solution.

marioderosa2008 · 11-03-2011

Thanks very much for your help so far guys.

I have disabled normalization and the transfers work fine!

I'm analysing the Wireshark captures to try to figure out why.

I set up two SPAN sessions on my 6500: one capturing traffic received on the outside routed VLAN 93 interface, and one capturing traffic transmitted on the inside VLAN 104 interface.

Are there any common design problems with two load balancers in an FT group that cause issues with TCP normalization?

Thanks

Mario

ajayku2 · 11-03-2011

There is no known issue.

You may want to check the MTU, and try the setting that Chris suggested to see if that helps.

marioderosa2008 · 11-03-2011

Hi Chris,

I have attached packet captures for inspection.

VLAN 93 (outside) looks normal to me; however, the traffic generated in VLAN 104 (inside) does not.

If you have time, can you inspect them and let me know if you spot anything?

thanks

mario

marioderosa2008 · 11-03-2011

Packet captures attached.

marioderosa2008 · 11-03-2011

In the VLAN 93 attachment, you can see that the client IP of 10.225.108.x initiates the HTTP request to the virtual IP of 10.129.1.39.

Then on VLAN 104 you can see that the load balancer transmits two copies of the same SYN, each with a different destination IP address.

The SPAN config that I applied on our 6500 is:

monitor session 10 source vlan 104 tx

monitor session 10 destination interface Gi1/2

monitor session 20 source vlan 93 rx

monitor session 20 destination interface Gi1/39

Hope this all helps..

Mario

chrhiggi · 11-03-2011

Mario-

In the traces, all packets are seen twice because of the way the monitor session captures them. There are not actually two packets being sent.

ACE modules connect to the switch fabric using a virtual port with a trunk (TenGig slot number/1). ACE appliances would use a physical port, possibly trunked depending on the configuration. The monitor session is set up to capture packets being transmitted on VLAN 104: the ACE transmits the packet on VLAN 104, it goes into the switch fabric, and it is then transmitted out of a physical interface on the switch, so the TX capture sees it twice.

For RX, the switch receives the traffic on a physical port, then transmits it into the ACE, where it becomes a received packet on the physical device. So you only see it once, when it is received into the VLAN on the physical port where the client is connected (or where the gateway routes it in).
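For anyone cleaning up a TX capture like this before analysis, here is a minimal Python sketch of dropping the SPAN copies. It assumes the packets have been exported to simple records, that the two copies of a packet share the same source, IP ID, sequence and acknowledgement numbers, and that they may differ only in destination IP. The `Pkt` type, the `drop_span_duplicates` function and all addresses are hypothetical, not taken from the captures.

```python
from typing import NamedTuple


class Pkt(NamedTuple):
    src: str
    sport: int
    dst: str
    ip_id: int
    seq: int
    ack: int
    length: int


def drop_span_duplicates(packets: list[Pkt]) -> list[Pkt]:
    """Keep the first copy of each packet seen in a SPAN capture."""
    seen = set()
    unique = []
    for p in packets:
        # The destination is deliberately left out of the key (assumption: the
        # copies can differ there). The IP ID stays in the key so that a genuine
        # retransmission, which normally carries a fresh IP ID, is not discarded.
        key = (p.src, p.sport, p.ip_id, p.seq, p.ack, p.length)
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique


# Hypothetical example: one SYN captured twice, followed by a real retransmission.
capture = [
    Pkt("198.51.100.7", 40000, "203.0.113.39", 100, 0, 0, 0),
    Pkt("198.51.100.7", 40000, "192.0.2.10", 100, 0, 0, 0),   # SPAN copy
    Pkt("198.51.100.7", 40000, "192.0.2.10", 131, 0, 0, 0),   # retransmitted SYN
]
print(len(drop_span_duplicates(capture)))
```

Keying on the IP ID matters in this thread in particular, because the retransmissions are exactly what is being investigated and must not be filtered out along with the copies.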

chrhiggi · 11-03-2011

Mario-

I forgot to ask... You noted that with normalization off, the transfer works. I am looking at the traces and I can see that packets from the client are dropped every now and then. The server starts by sending an ACK for the previous frame, indicating that a segment was lost and that it wants the client to retransmit. However, the client kicks out quite a bit more data, 40kb or so, before it sends the retransmission. I find it interesting that the connection allows selective acknowledgement but doesn't use it. Normalization would clear that option by default, but that shouldn't make any practical difference here.
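A rough way to put a number on "how much data before the retransmission" is sketched below in Python. It assumes the client's data segments have been exported, in capture order, as (relative sequence number, payload length) pairs from a capture point where the lost segment never appeared; `bytes_before_retransmit` is a hypothetical helper, not part of any capture tool.

```python
from typing import Optional


def bytes_before_retransmit(segments: list[tuple[int, int]],
                            lost_seq: int) -> Optional[int]:
    """Payload bytes sent beyond the hole before the lost segment finally arrives.

    Returns None if no segment starting at lost_seq appears in the capture.
    """
    total = 0
    for seq, length in segments:
        if seq == lost_seq:
            return total          # first arrival of the missing segment
        if seq > lost_seq:
            total += length       # data sent past the hole while it was still open
    return None


# Hypothetical flow: the segment at 1461 is lost, two more segments follow,
# and only then does the retransmission arrive.
flow = [(1, 1460), (2921, 1460), (4381, 1460), (1461, 1460)]
print(bytes_before_retransmit(flow, 1461))  # 2920
```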

Can you capture a failure for me? I suspect something would be different there; maybe the MSS is being violated, for example.

Regards,

Chris

marioderosa2008 · 11-04-2011

Hi Chris,

The attached are captures of the instant failures that occurred as soon as we turned normalization back on. The client machine is physically located on a private WAN, and I do have some concerns about the number of TCP segments that seem to get lost in our WAN. However, I am unsure whether the load balancers are at fault, or whether the lost TCP segments on our WAN are causing the load balancers or TCP normalization to drop the connection. I also have packet captures that show the receiver asking for the same lost TCP segment over 40 times before eventually receiving it.

We are going to run with TCP normalization off for 48 hours next week to see what impact it has under business-as-usual traffic conditions. Again, this is only for testing purposes. I would say that if the outcome of the tests is successful, that could suggest that the excessive TCP retransmissions on the WAN are just a red herring and that TCP is in fact operating as it should.

If you can let me know once you have had a chance to look at the captures, that would be great.

Thanks for your assistance, Chris.
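Counting how often the receiver repeats the same ACK can be done with a short Python sketch like the one below. It assumes the acknowledgement numbers of the receiver's pure ACKs have been exported from the capture in order; `duplicate_ack_counts` is a hypothetical helper, and the default threshold of 3 is the usual fast-retransmit trigger from RFC 5681.

```python
from collections import Counter


def duplicate_ack_counts(ack_numbers, threshold=3):
    """Map each ACK number to its duplicate count, keeping only heavy repeats.

    The first ACK for a number is not a duplicate, so n occurrences give
    n - 1 duplicates. Only numbers with at least `threshold` duplicates are kept.
    """
    counts = Counter(ack_numbers)
    return {ack: n - 1 for ack, n in counts.items() if n - 1 >= threshold}


# Hypothetical ACK stream: 2921 is acknowledged five times in total.
acks = [1461, 2921, 2921, 2921, 2921, 2921, 4381]
print(duplicate_ack_counts(acks))  # {2921: 4}
```

A count far above the threshold, such as the 40 or more described above, would suggest that the retransmission is either sent very late or is not reaching the receiver.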

Mario

chrhiggi · 11-04-2011

Mario-

Very interesting trace. I would like to keep going with this. Can you get a TAC case opened off of this thread? Attach a showtech from this context and both sets of traces.

Something is definitely bothering normalization in terms of the sequence in which the packets arrive, but I don't see a reason for it in the traces. I suspect there is a trigger, possibly due to the window and how the client retransmits, but I will need to check the CP and how normalization is perceiving the connection in that state.

Regards,

Chris

marioderosa2008 · 11-17-2011

Hi Chris,

Sorry for the delay in my reply; I have been away. Thanks for pursuing this case. For your information, I have already opened a TAC support query through our Cisco Partner, but they have simply come back advising that the applications we are running do not like the TCP sequence numbers that the ACE generates, and that the only resolution is to keep TCP normalization disabled.

I am reluctant to keep it disabled, but I have for now, as there is definitely a vast improvement with normalization turned off.

One thing I have noticed since normalization was turned off: we monitor the security context using SolarWinds Orion, which polls the ACE to display the number of concurrent connections to each server in the server farm. Since normalization was disabled, the number of concurrent connections has at least tripled. I cannot tell whether these are legitimate application connections or just malicious SYN packets.

Anyway, I have attached a show tech from the context in question.

I am really keen to get to the true answer as to why normalization has this bad effect on our application.

Thanks for your help so far, and if there is any other information that you need, please let me know.

thanks

Mario De Rosa