
Sticky problem

michael.e.reid
Level 1

Hi, we have an issue where the sticky tables on our CSS are too large. If a server fails, its connections move to the rest of the farm, and because the sticky table does not time out any time soon, the failed server does not get many connections when it comes back online.

The sticky-inact-timeout command does not help, as it only makes the entries eligible for removal.

The content rule is an L4 rule on port 443. I tried to configure it as an L5 rule with ArrowPoint cookies, but I suspect this does not work because we are using SSL connections that are not terminated on the CSS.

The servers themselves send a cookie. Can I make use of this, or am I stuck because the connection is on port 443?

Any other solutions would be much appreciated.

cheers,

Mike

18 Replies

Gilles Dufour
Cisco Employee

Mike,

this has nothing to do with the sticky table being full.

With just 2 clients and 2 servers, if one server goes down, you will end up with all sticky entries pointing to a single server, and all your traffic will go there.

This is normal.

When a server goes down, you should either have a short sticky timeout or manually clear the sticky table each time a server goes up and down.

There is no other solution.
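
The behaviour described above can be reproduced with a small model. This is only a sketch, not the CSS implementation: the `StickyBalancer` class, the server names and the client names are hypothetical, and it assumes that a client whose server is down is rebalanced and re-stuck to a surviving server.

```python
from collections import Counter


class StickyBalancer:
    """Toy source-IP sticky balancer (illustration only, not CSS code)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.alive = set(servers)
        self.table = {}   # client -> server it is stuck to
        self._next = 0    # round-robin cursor

    def _round_robin(self):
        # Return the next alive server in round-robin order.
        if not self.alive:
            raise RuntimeError("no server is alive")
        while True:
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.alive:
                return server

    def connect(self, client):
        server = self.table.get(client)
        if server not in self.alive:
            # No entry yet, or the stuck-to server is down: rebalance and re-stick.
            server = self._round_robin()
            self.table[client] = server
        return server


lb = StickyBalancer(["A", "B"])
clients = ["c1", "c2", "c3", "c4"]

before = Counter(lb.connect(c) for c in clients)      # both servers up
lb.alive.discard("B")                                 # server B fails
during = Counter(lb.connect(c) for c in clients)
lb.alive.add("B")                                     # server B is back
after = Counter(lb.connect(c) for c in clients)

print(before)  # Counter({'A': 2, 'B': 2})
print(during)  # Counter({'A': 4})
print(after)   # Counter({'A': 4})
```

In this model server B only receives clients that have no sticky entry yet, which is why it stays lightly loaded after recovery until the old entries expire or are cleared.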

Gilles.

Gilles,

The problem is not that the sticky table is full; it is that it takes so long to roll over. 130,000 entries is quite a lot, and I would imagine that in small companies it may take weeks for the sticky table to roll over.

I understand the behaviour is normal, but it is not what we want. It causes uneven load on the servers, which a load balancer is supposed to help avoid, because the sticky table takes days to clear.

Our sticky timeout is short, but when it expires the entry is not removed; it only becomes eligible for removal. If we have no new entries, it will still not be removed from the table.

Purging the table is not an option as our apps are used 24 hours.

cheers,

Mike

Gilles,

Can we use advanced-balance ssl even though we are not using the SSL offloader?

If so, although it still uses the sticky table, the session ID should change every 30 minutes or so, so this may resolve the problem?

cheers,

Mike

Mike,

let me first say that if you configure a sticky-inact-timeout, once the entry times out it is removed IMMEDIATELY. There is no concept of eligibility. [That applies to flows and garbage collection and has nothing to do with stickiness.]

Here is an example

CSS11503-2(debug)# show sticky-table l3-sticky

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
1       c0a81429  5     ACT    9     EGRES  59         2    0    L3    1
Total number of entries found is 1.

L3 Sticky List on Slot 3, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
Total number of entries found is 0.

CSS11503-2(debug)# show sticky-table l3-sticky

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
Total number of entries found is 0.

L3 Sticky List on Slot 3, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
Total number of entries found is 0.

CSS11503-2(debug)# show sticky-table l3-sticky

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
1       c0a81429  5     ACT    14    EGRES  3          1    0    L3    1

As you can see, with a 1 min timeout, after 60 sec the entry is removed, and the next time the client comes in it is sent to a different server which creates a new entry.

So, your problem is either that you do not have the sticky-inact-timeout configured, in which case you need to manually clear the sticky table when a server goes down/up, or that you have the timeout configured but with a value so high that the sticky entry is never removed because it is always refreshed by a new connection.

You can use 'advanced-balance ssl' without the SSL module, but it only works with one type of SSL protocol - SSLv2 [I think] - and for the other protocols it just reverts to sticky-srcip.

So, you should stick with sticky-srcip and just make it work correctly by setting the correct parameters or by clearing the sticky table manually.

Finally, I'd like to say that there is a known issue with sticky-srcip in general.

This is the use of mega-proxies on the Internet.

A lot of people sit behind a proxy and therefore appear with a single IP address on the Internet.

This is known to cause uneven load balancing.

That might be your problem, in which case changing the inact-timeout would have no effect.

This is one of the reasons a lot of people buy the SSL module, so they can use cookies.
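
To illustrate the mega-proxy effect, here is a rough sketch with made-up numbers: 100 users behind one proxy address and 10 users connecting directly. The addresses, the counts and the `connect` helper are all hypothetical, and new source IPs are simply alternated between two servers.

```python
from collections import Counter

servers = ["A", "B"]
sticky = {}        # source IP -> server (one entry per source IP)
load = Counter()   # connections handled per server


def connect(src_ip):
    # A new source IP gets the next server in turn; a known one stays put.
    if src_ip not in sticky:
        sticky[src_ip] = servers[len(sticky) % len(servers)]
    load[sticky[src_ip]] += 1


# 100 users behind one proxy all appear as the same source IP.
for _ in range(100):
    connect("203.0.113.10")

# 10 users connecting directly, each with their own source IP.
for i in range(10):
    connect(f"198.51.100.{i}")

print(load)         # Counter({'A': 105, 'B': 5})
print(len(sticky))  # 11
```

The sticky entries themselves are spread evenly, but the connections are not, and no inactivity timeout changes that as long as the proxy address keeps its entry refreshed.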

Gilles.

Gilles,

I have reported this issue to Cisco TAC in Belgium and was informed that the entry is only removed if the timer has timed out AND there is a new entry that needs that entry in the table. Otherwise the entry will stay in the table.

I will double check tomorrow and post more, but I am convinced that when I connected to one of our content rules with a 31-minute timeout, my entry was still there the following day even though I had not gone back to that content rule.

Thanks for the info though. This has been a long standing issue and I really need to get it solved.

I tested advanced-balance ssl today and it worked well in the test environment, as it is based on the SSL session ID, which is renegotiated every 30 minutes and also every time the browser is closed, I believe.

cheers,

Mike

Mike,

do you have your case number for this issue?

I can check who answered your questions and make sure he gets the right info.

Thanks,

Gilles.

Gilles,

I have reference SR 605081777.

Also, here is my sticky entry for our app that has a 31 min timeout set. I will check again to see if it has disappeared in 31 mins. At the moment it is at 565 seconds (9 mins).

NLAMSDC2CS001# sho sticky-table l3-sticky ipaddress x.x.x.x 255.255.255.255

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
2       911a7e32  13    ACT    12    ALIVE  565        527  0    L3    0
Total number of entries found is 2.

cheers,

Mike

Gilles,

Here is the same sticky entry 42 minutes later; it belongs to rule 13. You can also see the hit count has not increased, as I have not been back to that service.

NLAMSDC2CS001# sho sticky-table l3-sticky ipaddress 145.26.126.50 255.255.255.255

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
2       911a7e32  13    ACT    12    ALIVE  2565       527  0    L3    0
Total number of entries found is 2.
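
A side note on reading this output: in the listings in this thread, the Hash Value column looks like the client address written in hexadecimal (911a7e32 corresponds to the 145.26.126.50 queried above). That is an observation from these outputs with a 255.255.255.255 sticky mask, not a documented guarantee, and `hash_to_ip` below is a hypothetical helper name. A minimal sketch for decoding it:

```python
import ipaddress


def hash_to_ip(hash_value: str) -> str:
    """Interpret an 8-digit hex value as a dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(int(hash_value, 16)))


print(hash_to_ip("911a7e32"))  # 145.26.126.50
print(hash_to_ip("c0a81429"))  # 192.168.20.41
```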

You can see rule 13 is related to:

NLAMSDC2CS001# sho rule-summary | grep 13
x.x.x.x 443 TCP gep-DR.shell-o SAPportal Act 13

And from the content rule you can see the timer is set to 31 minutes. Therefore the sticky-inact-timeout setting is not causing the entry to be removed after 31 mins.

content gep-DR.shell-online.com
  vip address x.x.x.x
  protocol tcp
  port 443
  add service A
  add service B
  add service C
  advanced-balance sticky-srcip
  add service D
  add service E
  sticky-inact-timeout 31
  active

Would be good to get some clarification if this is standard behaviour, or maybe we have a bug. We are using 07.50.1.30 s/w.

cheers,

Mike

Gilles,

My sticky entry is still there 105 mins later with no additional hits.

NLAMSDC2CS001# show sticky-table l3-sticky ipaddress x.x.x.x 255.255.255.255

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
2       911a7e32  13    ACT    12    ALIVE  6306       527  0    L3    0
Total number of entries found is 2.
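
For anyone comparing many such rows, a small parser makes the numbers easier to check against the rule's configured timeout. This is a sketch only: the field names are my own reading of the column headers (in particular, what "Col Cnt" counts is an assumption), and `parse_row` and `timeout_mismatch` are hypothetical helpers, not part of any Cisco tool.

```python
from dataclasses import dataclass


@dataclass
class StickyEntry:
    number: int
    hash_value: str
    rule_index: int
    rule_state: str
    server_index: int
    server_state: str
    elapsed_sec: int      # "Time(Sec) Elapsed" column
    hit_count: int        # "Hit Cnt" column
    col_count: int        # "Col Cnt" column (meaning assumed, not verified)
    elem_type: str
    inact_cfg_min: int    # "Inact Cfg(Min)" column


def parse_row(line: str) -> StickyEntry:
    """Parse one data row of 'show sticky-table l3-sticky' output."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    return StickyEntry(int(f[0]), f[1], int(f[2]), f[3], int(f[4]), f[5],
                       int(f[6]), int(f[7]), int(f[8]), f[9], int(f[10]))


def timeout_mismatch(entry: StickyEntry, rule_timeout_min: int) -> bool:
    """True when the entry's own Inact Cfg differs from the rule's setting."""
    return entry.inact_cfg_min != rule_timeout_min


entry = parse_row("2 911a7e32 13 ACT 12 ALIVE 6306 527 0 L3 0")
print(round(entry.elapsed_sec / 60, 1))  # 105.1
print(entry.inact_cfg_min)               # 0
print(timeout_mismatch(entry, 31))       # True
```

The last line is the interesting one: the row carries its own Inact Cfg(Min) value, and here it does not match the 31 minutes configured on the content rule.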

cheers,

Mike

Mike,

except that your inact timeout when the entry was created was zero - you can see it in the show command. It is the last column.

A zero means no timeout.

In this case, you have to wait for the table to get full to see it disappear.

If you change the inact-timeout after the entry was already created, it has no impact on existing entries.

You need to clear the sticky table.

Or you hit a different content rule which does not have an inact-timeout configured.
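
The rules described in this post can be modelled roughly as follows. This is a sketch under assumptions, not the CSS code: the `StickyTable` class is hypothetical, and which entry is dropped when the table is full (here, the one idle the longest) is my assumption, since the thread only says the entry disappears once the table fills up.

```python
class StickyTable:
    """Toy model: each entry keeps the timeout it was created with."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # key -> {"server", "inact_min", "last_hit"}

    def expire(self, now):
        # Remove entries whose own timeout has elapsed; 0 means never.
        expired = [k for k, e in self.entries.items()
                   if e["inact_min"] > 0
                   and now - e["last_hit"] >= e["inact_min"] * 60]
        for key in expired:
            del self.entries[key]

    def hit(self, key, server, now, rule_inact_min):
        self.expire(now)
        entry = self.entries.get(key)
        if entry is None:
            if len(self.entries) >= self.capacity:
                # Table full: drop the longest-idle entry (assumed policy).
                oldest = min(self.entries,
                             key=lambda k: self.entries[k]["last_hit"])
                del self.entries[oldest]
            # The timeout is copied from the rule at creation time only.
            entry = {"server": server, "inact_min": rule_inact_min}
            self.entries[key] = entry
        entry["last_hit"] = now
        return entry["server"]


table = StickyTable(capacity=2)
table.hit("old-client", "srv12", now=0, rule_inact_min=0)    # created with no timeout
table.hit("new-client", "srv14", now=0, rule_inact_min=31)   # created with 31 min

table.expire(now=105 * 60)      # 105 minutes later, no further hits
print(sorted(table.entries))    # ['old-client']
```

In this model an entry created while the rule had no timeout keeps its 0 regardless of what the rule says later, so it survives until the table is full or the table is cleared.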

Gilles.

Gilles,

"In this case, you have to wait for the table to get full to see it disappear" - this is our problem, the table takes too long to get full!

When I go back into this service, after many hours, I still get sent to the same server (Serv index 12)

NLAMSDC2CS001# sho sticky-table l3-sticky ipaddress 145.26.126.50 255.255.255.255

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
2       911a7e32  13    ACT    12    ALIVE  7          530  0    L3    0
Total number of entries found is 2.

This service has had the timeout set for a few months now.

cheers,

Mike

Still, the entry shows a configured timeout of '0'.

Could you get a 'sho rule-summary', check the rule index [last column], and compare it to the rule index of your sticky entry?

Make sure this is a match.

If yes, you need to clear the sticky table.

Could you try "sticky-purge l3-sticky " to remove your own address?

Gilles.

Hi Gilles,

I did that in my earlier post today; they are both rule index 13.

VIP Address     Port  Prot Url                CntRuleName    OwnerName Stat Idx
--------------- ----- ---- ------------------ -------------- --------- ---- ---
x.x.x.x         443   TCP                     gep-DR.shell-o SAPportal Act  13

NLAMSDC2CS001# show sticky-table l3-sticky ipaddress x.x.x.x 255.255.255.255

L3 Sticky List on Slot 1, subslot 1:
Entries for page 1.
Entry   Hash      Rule  Rule   Srv   Srv    Time(Sec)  Hit  Col  Elem  Inact
Number  Value     Indx  State  Indx  State  Elapsed    Cnt  Cnt  Type  Cfg(Min)
------------------------------------------------------------------------------
2       911a7e32  13    ACT    12    ALIVE  1439       530  0    L3    0
Total number of entries found is 2.

Also, if I look at the sticky info for this rule, you can see the configured timeout.

NLAMSDC2CS001# sho rule SAPportal gep-DR.shell-online.com sticky
Balance: Round Robin
Advanced Balance: Source IP
Sticky Mask: 255.255.255.255
Sticky Inactivity timeout: 31 minutes
Sticky No Cookie Found Action: Balance
Sticky Server Down Failover: Balance
ArrowPoint Cookie Path: /
ArrowPoint Cookie Expiration: Browser Exit
ArrowPoint Cookie CSS Expired
ArrowPoint Cookie Service: Keep Current
ArrowPoint Cookie Name: ARPT
String Match Criteria:
String Range: 1 - 100
String Prefix: ""
String Eos-Char: "" String Ascii-Conversion: Enabled
String Skip-Len: 0 String Process-Len: 0
String Operation: Match-Service-Cookie
Location-Cookie: Not Configured
Location-Cookie Expiration: Browser Exit
Cookie-Domain: Not Configured

cheers,

Mike

Mike,

sorry to ask you this - I know this has been going on for a long time already, but I want to be 100% sure of what we have here.

Could you clear the entry with the command I gave you, then check with a 'show sticky-table' that the entry is not there anymore?

Then open a new connection and check the entry again. Verify whether the timeout is 0 or 31.

Thanks,

Gilles.