CSS flow management - VIP NATing bug ?

Jan 5th, 2009


We are experiencing very strange behaviour on several CSSs in our webfarm environments.

We're running on a CSS 11506 with WebNS

The flow is very basic and simple:

1. A client connects to a VIP on our CSS.

2. Based on a content rule decision, the CSS forwards the packet to the real server IP

3. The server reply gets back through the load-balancer and the source-IP of the packet gets replaced by the VIP (the original IP the client had connected to)
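The three steps above can be sketched as a tiny flow-table model. This is only an illustration of the expected behaviour, not how WebNS is implemented: the class name, method names, and IP addresses are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class LoadBalancer:
    """Toy model of a VIP load balancer (hypothetical, not the CSS internals)."""

    def __init__(self, vip, real_server):
        self.vip = vip
        self.real_server = real_server
        self.flows = set()  # client IPs that currently have an active flow

    def from_client(self, pkt):
        # Steps 1 and 2: the client hits the VIP, a flow is created,
        # and the packet is forwarded to the real server.
        if pkt.dst != self.vip:
            return pkt
        self.flows.add(pkt.src)
        return Packet(src=pkt.src, dst=self.real_server)

    def from_server(self, pkt):
        # Step 3: if a flow exists, the source is rewritten to the VIP.
        # Without a flow the packet is passed through unchanged.
        if pkt.src == self.real_server and pkt.dst in self.flows:
            return Packet(src=self.vip, dst=pkt.dst)
        return pkt


lb = LoadBalancer(vip="192.0.2.10", real_server="10.0.0.5")
forwarded = lb.from_client(Packet(src="198.51.100.7", dst="192.0.2.10"))
reply = lb.from_server(Packet(src="10.0.0.5", dst="198.51.100.7"))
print(forwarded, reply)
```

In this model the reply always leaves with the VIP as its source as long as the flow entry exists, which is the behaviour we expect for every packet.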

We have network traces and firewall logs showing that this last step (step 3 above) is sometimes skipped for some packets on existing flows (it works 99% of the time). In those cases the CSS does NOT set the VIP as the source IP in the reply packet but keeps the real server IP, which obviously breaks the TCP connection.

The flow is not torn down: the wrong packet is just discarded by our firewall and, thanks to TCP, the reply is retransmitted later and this time properly handled by the CSS.

The configuration is OK (the failure only affects a few packets on an existing, established flow), so it really looks like a bug in flow handling.

The CSS is not overloaded: I checked the CPU, the free memory, and the number of free FCBs (flow control blocks), and we're far from the limits.

Is this a known issue? Any idea what to look at next to investigate further?



Gilles Dufour Mon, 01/05/2009 - 07:41


I'm not aware of such an issue.

I would recommend testing the latest 8.10 image, since we have made quite a lot of modifications in the year since 8.10.107s was released.

If the problem persists, I would suggest opening a service request with the TAC so that we can fix the problem.


arnaud.chiaberge Mon, 01/05/2009 - 07:50

Thanks Gilles for your answer.

I've opened a TAC case for that issue. In the meantime we've made some progress on the investigation by capturing several additional network traces and we found that it's always the same kind of packets that are affected by this bug:

It always seems to happen when the TCP connection closure is initiated by the server. In such cases the last ACK of the TCP session closure, sent by the server to the client (in response to the client's FIN-ACK), is not properly handled by the CSS and doesn't get NATed on its way back to the client.
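For anyone who wants to look for the same symptom in their own traces, here is a rough sketch of the check. It assumes the capture has already been exported to simple records; the field names (`src`, `dst`, `flags`) and all addresses are hypothetical, not the output format of any particular tool.

```python
def find_unnatted_replies(packets, real_server_ips):
    """Return packets captured on the client side whose source is a real server.

    On the client side of the load balancer every reply should carry the VIP
    as its source, so any real server address seen there missed the NAT.
    """
    return [p for p in packets if p["src"] in real_server_ips]


# Hypothetical records from a capture taken on the client side of the CSS.
client_side_capture = [
    {"src": "192.0.2.10", "dst": "198.51.100.7", "flags": "PSH,ACK"},
    {"src": "192.0.2.10", "dst": "198.51.100.7", "flags": "FIN,ACK"},
    {"src": "10.0.0.5", "dst": "198.51.100.7", "flags": "ACK"},
]

leaks = find_unnatted_replies(client_side_capture, real_server_ips={"10.0.0.5"})
for pkt in leaks:
    print(pkt["src"], "->", pkt["dst"], pkt["flags"])
```

Grouping the hits by TCP flags is what showed us that only connection-closure packets were affected.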

Maybe the developers have a clue about this, or know that it has been fixed in a more recent OS release.



pmccubbin Mon, 01/05/2009 - 09:31

Hi Arnaud,

Thanks for providing the follow-up news regarding the TAC case you opened to solve this issue.

Please keep the forum up to date on your progress.

A +5 from NYC for keeping us in the loop.



arnaud.chiaberge Mon, 01/12/2009 - 23:53

An update and closure to this case.

In-depth troubleshooting data, including network traces from both the north and south sides of the CSS, has been provided to Cisco TAC.

It turns out to be a normal behaviour of the CSS (quoting Cisco TAC).

The actual scenario is the following:

As soon as a FIN packet is seen from either side of a connection, the flow remains valid in the CSS only for a certain amount of time, so that the connection can finish properly, before it is torn down for resource reclamation.

For some reason (not yet known), the server never receives the client's final ACK to its FIN-ACK (maybe a problem on the client side).

As a result, the server keeps retransmitting the FIN-ACK packet several times in a row, with a retransmission timeout that increases according to the exponential back-off algorithm (http://www.rhyshaden.com/tcp.htm), which is normal behaviour for a TCP stack.
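To give an idea of how quickly the retransmissions spread out, here is a minimal sketch of a doubling retransmission timeout. The initial timeout, the cap, and the number of retries are assumptions chosen for illustration; real TCP stacks use their own values.

```python
def retransmission_times(initial_rto, retries, max_rto=60.0):
    """Return when each retransmission is sent, in seconds after the first send.

    The timeout doubles after every unanswered attempt, up to max_rto.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto
        times.append(elapsed)
        rto = min(rto * 2, max_rto)
    return times


# Assumed values: 1 second initial timeout, 5 retransmissions.
print(retransmission_times(initial_rto=1.0, retries=5))
# prints [1.0, 3.0, 7.0, 15.0, 31.0]
```

With these assumed values the fifth retransmission leaves the server about half a minute after the original FIN-ACK, long after the first few attempts.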

Because the flow has timed out on the CSS, these last FIN-ACK packets from the server to the client are simply routed by the CSS, without any NATing.
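Putting the two timers side by side shows why only the late retransmissions are affected. This is a sketch under assumed numbers: the 10 second flow lifetime after the first FIN is hypothetical (TAC only said "a certain amount of time"), and so are the send times.

```python
def classify_retransmissions(send_times, fin_seen_at, flow_timeout):
    """Label each retransmission by whether the flow still exists when it is sent.

    send_times   -- seconds at which the server retransmits its FIN-ACK
    fin_seen_at  -- second at which the load balancer first saw a FIN
    flow_timeout -- assumed lifetime of the flow after that first FIN
    """
    expiry = fin_seen_at + flow_timeout
    return [
        (t, "NATed" if t < expiry else "routed without NAT")
        for t in send_times
    ]


# Assumed retransmission times (doubling timeout) and a hypothetical 10 s timeout.
for when, outcome in classify_retransmissions(
    send_times=[1.0, 3.0, 7.0, 15.0, 31.0], fin_seen_at=0.0, flow_timeout=10.0
):
    print(f"t={when:>5.1f}s  {outcome}")
```

With these numbers the first three retransmissions are still NATed and the last two leave with the real server IP, which matches what the firewall was dropping.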

Not sure my explanation is clear. The behaviour of the CSS seems normal according to Cisco (flows have to be garbage collected at some point anyhow), and it explains why some packets don't get properly NATed after that garbage collection has occurred.

The CSS behaviour could be enhanced by waiting for BOTH sides of the connection to have confirmed the FULL closure of the connection before starting its flow resource reclamation timer.
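To make the suggestion concrete, here is a small sketch comparing the two rules for starting the reclamation timer. It is purely my own illustration of the idea: the class and method names are hypothetical and say nothing about how the CSS tracks flows internally.

```python
class CloseTracker:
    """Tracks FINs and their ACKs for one connection (hypothetical model)."""

    SIDES = ("client", "server")

    def __init__(self):
        self.fin_sent = {side: False for side in self.SIDES}
        self.fin_acked = {side: False for side in self.SIDES}

    def on_fin(self, sender):
        self.fin_sent[sender] = True

    def on_ack_of_fin(self, acker):
        # The acker acknowledges the FIN previously sent by its peer.
        peer = "server" if acker == "client" else "client"
        if self.fin_sent[peer]:
            self.fin_acked[peer] = True

    def current_rule_starts_timer(self):
        # Behaviour described by TAC: any FIN from either side starts the timer.
        return any(self.fin_sent.values())

    def proposed_rule_starts_timer(self):
        # Suggested behaviour: wait until both FINs have been acknowledged.
        return all(self.fin_acked.values())


flow = CloseTracker()
flow.on_fin("server")
print(flow.current_rule_starts_timer(), flow.proposed_rule_starts_timer())
# prints True False

flow.on_fin("client")
flow.on_ack_of_fin("client")  # client acknowledges the server's FIN
flow.on_ack_of_fin("server")  # server acknowledges the client's FIN
print(flow.proposed_rule_starts_timer())
# prints True
```

The obvious trade-off is that a flow whose closure never completes would stay in the table until some other idle timeout removes it, so a fallback timer would still be needed.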

