Arrowpoint cookies and state changes

Jan 6th, 2008

We have an 11050 running 6.10 build 4 (soon to be replaced with a 11501) that sets a cookie so we can stick a client to a server. The application also sets a JSESSION cookie. The service keepalive does a HEAD on a specific page to verify the service is up. The service can change state often (say 1000 times in 2 hours), but it is not always marked as down; it may only be marked as down 5 to 10 times in those 2 hours. The users are experiencing slow response and are getting kicked out of the application and sent back to a login screen. My questions are:

1. State change counters. If I go from alive to dying to alive, is that 1 or 2 state changes?

2. If a service is dying and a client connects with the cookie already set, will the CSS send them to the dying server or to the alive server? If it sends them to the alive server, does it reset the cookie?

3. If the service is down, does the CSS send a RST to the client, or does it just overwrite the cookie and send the client to the alive server?

4. Service timeouts. Is it true that the timeout for a service is the frequency minus 1? So if I have a frequency of 5 seconds and the CSS doesn't get a response within 4 seconds, the service would go to the dying state?


Diego Vargas Mon, 01/07/2008 - 06:06


Here are the answers:

1. The CSS logs a state change only when a service goes to the Down or Alive state. Dying is not counted as a state change in the counters.

2. The Dying state shows up when the CSS fails the first keepalive. By default the CSS has a maxfailure of 3, so it will attempt 3 keepalives before changing the service status to Down.

While the service is Dying it continues to get traffic as if it is still Alive (actually it is), so a request arriving with the cookie for the dying service should go to that same service.

3. If the service is already down, the CSS should balance the flow among the alive servers and set a new cookie with the new server string.

4. Well, kind of. As I mentioned before, the service goes to the Dying state when the first keepalive fails. The frequency is the amount of time between keepalives, in other words the interval.

If the frequency is 5 and the server is not able to answer the keepalive in 5 seconds, then the CSS will consider it a failed keepalive and the service will go into the Dying state.

As for the fact that you see many state changes on the server but not that many on the CSS: by default the CSS has to fail 3 keepalives before considering the service down, and the frequency is 5 sec by default, so it will actually take about 15 to 20 sec for the CSS to consider the service down and log a state change.

If the CSS failed one keepalive but the second worked, then there was a period of time where the service was Dying, but when the second keepalive worked the service went back to Alive and the cycle started again. At that point another 3 failed keepalives would be needed for it to go Down.
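The cycle described above can be sketched as a small state machine. This is only a toy model of the behaviour as explained in this thread, not the CSS's actual implementation: the class and method names are made up for illustration, and it assumes a single successful keepalive is enough to bring a Dying or Down service back to Alive.

```python
from enum import Enum


class State(Enum):
    ALIVE = "alive"
    DYING = "dying"
    DOWN = "down"


class ServiceMonitor:
    """Hypothetical model of the keepalive states discussed in the thread."""

    def __init__(self, maxfailure=3):
        self.maxfailure = maxfailure  # failed keepalives needed to go Down
        self.state = State.ALIVE
        self.failures = 0             # consecutive failed keepalives
        self.state_changes = 0        # only Alive <-> Down is counted

    def keepalive(self, ok):
        """Feed one keepalive result (True = answered) and return the new state."""
        if ok:
            # Assumption: one good keepalive restores the service.
            self.failures = 0
            if self.state is State.DOWN:
                self.state_changes += 1   # Down -> Alive is logged
            self.state = State.ALIVE
        elif self.state is not State.DOWN:
            self.failures += 1
            if self.failures >= self.maxfailure:
                self.state = State.DOWN
                self.state_changes += 1   # -> Down is logged
            else:
                self.state = State.DYING  # not counted as a state change
        return self.state


mon = ServiceMonitor()
history = [mon.keepalive(ok).value for ok in (False, True, False, False, False, True)]
print(history, mon.state_changes)
# ['dying', 'alive', 'dying', 'dying', 'down', 'alive'] 2
```

In this run the first Alive to Dying to Alive flap adds nothing to the counter; only the later trip through Down and back adds two.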

Hope it helps!!

Gilles Dufour Mon, 01/07/2008 - 06:40

A dying service is still alive.

So there is no state transition when going from alive to dying or dying to alive.

Connections are still being forwarded to servers in the dying state.

When a service goes down, active connections stay with the down server. No remapping and no new cookie.

New connections will be forwarded to one of the remaining alive services.

Finally, the response timeout is indeed the frequency minus 1.
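To put numbers on this, here is a rough calculation using only the figures quoted in this thread (frequency 5, maxfailure 3, response timeout of frequency minus 1). The function name is made up, and the time-to-Down figure assumes failed keepalives are retried at the same interval, ignoring any separate retry setting the device may apply.

```python
def keepalive_timing(frequency=5, maxfailure=3):
    """Return (response_timeout, approx_seconds_until_down).

    Simplified model based on the numbers in this thread only.
    """
    response_timeout = frequency - 1       # "frequency minus 1"
    time_to_down = frequency * maxfailure  # assumes retries at the same interval
    return response_timeout, time_to_down


print(keepalive_timing())  # (4, 15)
```

With the defaults this gives a 4 second response timeout and roughly 15 seconds of consecutive failures before the service is marked Down, which is the low end of the 15 to 20 sec estimate given earlier.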

This is a strange service that you have.

The CSS was designed on the assumption that all services will be up most of the time and only sometimes go down. Your setup goes against this, with services continuously flapping.

Can't you find a way to keep the service up? Why is it going down so often?

You could maybe use the maxconn command to limit the number of connections going to a service and prevent an overloaded server from flapping.


t.doherty Mon, 01/07/2008 - 07:22

Thanks for the response. According to the Cisco documentation below, when a service is down the client will be directed to the alive server. If clients aren't automatically sent to the alive server, how would they ever get off the down service?

The service isn't strange, it's the app that's strange ;-) Basically they're getting slow response and the clients are getting kicked out of the app. As usual they want to blame everything else but the app.

The increase that I thought I was seeing in the state counters might not be accurate. When I did the show service, it said the counters had been cleared this morning and they were already up to 1300. However, no one had logged into the CSS except our CiscoWorks server. I'm not sure why it said they were cleared this morning unless CW2K is doing it. I cleared the counters and they're back to zero, so I'll monitor it.

---Cisco Doc-------

When a client comes in with a valid cookie request but the sticky server is not available, the CSS uses the sticky-serverdown-failover configuration to handle the request.

By default, the sticky-serverdown-failover is configured as balance. The sticky-serverdown-failover balance method will treat the client's request as an initial request without the ArrowPoint cookie. It uses the load-balancing algorithm to choose a server, and then redirects the request with a generated ArrowPoint cookie.

The other option is a failover type of redirect. In this case, the CSS redirects the request to the specified URL.

The command sticky-no-cookie-found-action should not be configured in an ArrowPoint cookie content rule. Not only will this command not work, it produces many irregularities in the CSS.
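The failover behaviour quoted above, combined with the earlier point that a dying service still receives traffic, can be sketched roughly like this. It is an illustrative model only: the function, the dictionary layout, the "redirect" mode name and the return values are all made up for the example and are not CSS commands or APIs.

```python
import random


def route_request(cookie_server, services, failover="balance", redirect_url=None):
    """Decide where a new request goes.

    cookie_server: server named in the client's cookie, or None if no cookie.
    services: dict of service name -> "alive", "dying" or "down".
    """
    # Dying services still take traffic; only Down ones are excluded.
    usable = sorted(n for n, s in services.items() if s in ("alive", "dying"))

    if cookie_server in usable:
        # Sticky server is available: honour the cookie, no new cookie.
        return {"action": "forward", "server": cookie_server, "set_cookie": None}

    if cookie_server is not None and failover == "redirect":
        # Hypothetical name for the "failover type of redirect" option.
        return {"action": "redirect", "url": redirect_url}

    if not usable:
        return {"action": "no_service"}

    # "balance": treat as an initial request and issue a new cookie.
    chosen = random.choice(usable)  # stand-in for the real balancing algorithm
    return {"action": "forward", "server": chosen, "set_cookie": chosen}


print(route_request("s1", {"s1": "dying", "s2": "alive"}))
print(route_request("s1", {"s1": "down", "s2": "alive"}))
```

The first call stays on s1 with no new cookie; the second is rebalanced to s2 and gets a cookie for s2. Note this only models how a new request is handled, not flows that were already mapped to a server before it went down.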

Diego Vargas Mon, 01/07/2008 - 07:44


About this:

According to the Cisco documentation below, when a service is down the client will be directed to the alive server. If clients aren't automatically sent to the alive server, how would they ever get off the down service?

When a service goes down, new connections will be directed to the alive service, but connections already established cannot be moved because the flow was already mapped to the initial server.

Are you able to test bypassing the CSS? That would be a good way to confirm whether the app has issues even with the CSS not in the middle.

Also, sniffer traces on both sides of the CSS will show whether the CSS is creating delays and causing performance issues.

You might also want to test layer 3 stickiness and see if the clients keep getting kicked off.

