Datacenter troubleshooting guide - day 5

Blog

Dec 8, 2010 5:04 AM
Dec 8th, 2010

“Datacenter troubleshooting guide” – a blog by  Gilles Dufour.

Day 4 - Stickyness (2nd part)

Troubleshooting stickyness is not an easy task.

There is no debugging option to know why ACE decided to send a connection to a specific server.

There are a few common errors that we can spot by looking at stats.

Here is the most well known.

1. Whatever the sticky type you selected, when reason to not stick to a server is if the sticky entry timed out or was removed due to lack of sticky resource. So, check your sticky resource allocation with the following command :
AceC6k2 Admin# show np 1 me-stats "-slb -v"

LB Perf stats at address 0x82e05000

LB Perf stats at address 0x82e05000
LB Statistics
--------------
(Context ALL Statistics)
...
Num Stolen For Reuse:                             0             0
Num Active Sticky Entry:                          0             0
Num Active Reverse Sticky Entry:                  0             0
Active Conn Count:                                0             0
Free Sticky Entry Count:                    1794111             0
Num Grp or Timeout Nodes:                         4             0
Static Entry List Count:                          0             0
Num Entry Configured:                       1794111             0
Prev Resources Req:                         1794111             0
Drop Max Remote Stky:                             0             0

The "Num Stolen For Reuse" indicates that you have not enough sticky resource and old sticky entries got deleted in order to save new ones. This is a possible reason for sticky failures. To solve this problem, you can either increment the sticky resource or for sticky source ip, you can change the mask so that one entry can be used for all clients belonging to a same subnet.

"Drop Max Remote Stky" counts the number of sticky lookup that failed because the communication between internal components is too slow to process all the sticky lookup.

2. If you have configured any form of HTTP stickyness (cookie, header, ...), one reason to see sticky failures is http parsing errors. ACE will need to parse the HTTP header to find the information needed to perform stickyness (ie: cookie).  If the HTTP request is malformed, ACE will fail to properly parse the data and will not even try to do a sticky lookup.  Check if there was no any parsing errors with the command :
AceC6k2/Admin# show np 1 me-stats "-shttp -v" | i pars
Second pass parsing:                              0             0
Static parse errors:                              0             0
Max parselen errors:                              0             0
AceC6k2/Admin#

If you see "Static parse errors" that could explain your sticky failure.  Unfortunately, there is no solution here.  With ddts
CSCtj42969 we added a new command to continue parsing http data even when an error was detected.  See previous Blog post for more information.
"Max parselen errors" is the result of http headers being too big and the data we need located beyond our buffer.  You can adjust the size of the parsing buffer with a parameter-map
AceC6k2/Admin(config)# parameter-map type http ParseLen
AceC6k2/Admin(config-parammap-http)# set ?
  content-maxparse-length      Configure content maxparse length
  header-maxparse-length       Configure header maxparse length
  secondary-cookie-delimiters  Configure URL/COOKIE (cookie-in-url-query) delimiter

Clearly, the stats above can't explain all possible sticky issues.  We often have to capture sniffer traces and dump the sticky table regularly in oder to try to understand what is going on.
This is why I have introduced  a new command in version 4.1.x to list connections linked to a particular sticky entry.

CSCti03084: Enh: show conn sticky - display connections linked to a sticky

Here is how to use this new command.

First, list the sticky entries in "detail" mode  (the command is hidden) and exract the internal entry id.

Scimitar1/Admin# show sticky data detail
processor (0/0):             0
results index:               1 of 1
sticky group:                RESERVATIONS_STICKY
sticky type:                 IP
rserver:                     vmware-46
realPort:                    80
timeout (secs):              1200
sticky-entry:                192.168.20.0
internal entry-id:           0x200004
time-to-expire (secs):       1198
stcky-hit-count:             63402
active-conn-count:           1
in-use reference count:      0
static entry:                FALSE
reverse entry:               FALSE
active entry:                TRUE
timeout-active-conns:        FALSE
created-from-HA-peer:        FALSE
HA-replicated-at-least-once: TRUE

With the internal entry-id you can list all connections using that entry.

Scimitar1/Admin# show conn sticky 0x200004

conn-id    np dir proto vlan source                destination           state
----------+--+---+-----+----+---------------------+---------------------+------+
653622     1  in  TCP   20   192.168.20.45:47377   192.168.20.15:80      SYNSEEN
658234     1  out TCP   40   192.168.40.46:80      192.168.20.45:47377   INIT
Scimitar1/Admin#

You can also retrieve the internal entry-id used by a connection.

For that, you need to know the conn-id and the NP to which the connection is associated.

switch/Admin# show np 1 me-stats "-c 242"
Connection ID:seq: 242[0xf2].0
  Other ConnID    : 243[0xf3].0
  Proxy ConnID    : 0[0x0].0
  Next Q    : 0[0x0]

192.168.20.45:44425 -> 192.168.20.15:80 [RX-NextHop: TX] [TX-NextHop: TX]
  Flags:  PAT: No  DynNAT: No  Implicit PAT: No On_Reuse: No
  L3 Protocol     : IPv4                L4 Protocol    : 6
  Inbound Flag    : 1
  Interface Match : Yes
    Interface MatchID: 3
  EncapsID:ver    : 45:0                TCP ACK delta  : 0xc8228d7e
  MSS             : 1460                TOS Stamp       : 0
  Repeat mode     : No          Punt Flag      : No
  TOS Stamp       : No          TCP Window Check: No
  ACE ID          : 98          NAT Policy ID       : 0
  Post NAT hop    : 0
  Packet Count    : 5           Byte Count          : 309
  TCP Information: (State = 3)
    Window size   : 6432                Window scale    : 0
    FIN seen      : No          FIN/ACK seen    : No
    FIN/ACK exp   : No          Close initiator : No
    FIN/ACK expval: 5b40000             Last seq        : 9f50cfc0
   timestamp_delta: 0           Last ack        : 947b3980
    Data/ACK Trigger : 947b3980         Trigger Status   : 1
    Timestamp : 39cd8
  TCP options negotiated:
    Sack:Clear          TS:Clear        Windowscale:  Clear
    Reserved: Allow     Exceed MSS:  Deny       Window var: Allow

 
Sticky Internal Entry-id : 0x200006 

Raw Connection Entry
0000  0x00000000  0xc0a8142d  0xc0a8140f  0x06090003
0010  0xad890050  0xc8228d7e  0x002d0000  0x05b40000
0020  0x000000f3  0x00000000  0x080a2480  0x24450000
0030  0x00000005  0x00000135  0x19200030  0x05b40000
0040  0x9f50cfc0  0x947b3980  0x00039cd8  0x00000000
0050  0x00000062  0x947b3980  0x00039cb1  0x00000000
0060  0x00000000  0x00000000  0x00000000  0x00000000
0070  0x0000bc90  0x00000000  0x00000000  0x00000000
switch/Admin#


[Wrap text]  [Edit this enclosure]
Unit-test: Added 07/26/2010 06:24:00  by gdufour

[Unwrap text]  [Edit this enclosure]
Unit-test: Added 07/26/2010 06:24:00  by gdufour

[Unwrap  text]  [Edit this enclosure]
Release-note: Added 07/26/2010  06:19:54 by gdufour



[Wrap text]  [Edit this enclosure]
Release-note: Added 07/26/2010  06:19:54 by gdufour

[Unwrap text]  [Edit this enclosure]
Release-note: Added 07/26/2010  06:19:54 by gdufour

[Wrap text]  [Edit this enclosure]
Diffs-commit-converge_dev: Added 07/29/2010 04:05:46 by gdufour

Diffs-commit-converge_dev: Added 07/29/2010 04:05:46 by gdufour
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://cdetsweb-prd.cisco.com/apps/dumpcr_att?identifier=CSCti03084&title=Diffs-commit-converge_dev&ext=&type=FILE

Diffs-commit-converge_dev: Added 07/29/2010 04:05:46 by gdufour
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://

[UnWrap text]  [Edit this enclosure]
Diffs-commit-converge_dev: Added 07/29/2010 04:05:46 by gdufour

[Wrap Text]  [Edit this enclosure]
Diffs-commit-converge_dev: Added 07/29/2010 04:05:46 by gdufour

[Wrap text]  [Edit this enclosure]
Diffs-commit-converge_dev.p2: Added 09/06/2010 00:42:03 by gdufour

Diffs-commit-converge_dev.p2: Added 09/06/2010 00:42:03 by gdufour
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://cdetsweb-prd.cisco.com/apps/dumpcr_att?identifier=CSCti03084&title=Diffs-commit-converge_dev.p2&ext=&type=FILE

Diffs-commit-converge_dev.p2: Added 09/06/2010 00:42:03 by gdufour
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://

[UnWrap text]  [Edit this enclosure]
Diffs-commit-converge_dev.p2: Added 09/06/2010 00:42:03 by gdufour

[Wrap Text]  [Edit this enclosure]
Diffs-commit-converge_dev.p2: Added 09/06/2010 00:42:03 by gdufour

[Uwrap text]  [Edit this enclosure]
Scimitar-code-review: Added  10/04/2010 07:12:07 by cms

Scimitar-code-review: Added  10/04/2010 07:12:07 by cms
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://cdetsweb-prd.cisco.com/apps/dumpcr_att?identifier=CSCti03084&title=Scimitar-code-review&ext=&type=FILE

Scimitar-code-review: Added  10/04/2010 07:12:07 by cms
Can not view this . file attachment inline, please click on the following link to view the attachment.
http://

[UnWrap text]  [Edit this enclosure]
Scimitar-code-review: Added  10/04/2010 07:12:07 by cms

[Wrap Text]  [Edit this enclosure]
Scimitar-code-review: Added  10/04/2010 07:12:07 by cms

Another new feature that I introduced in version 4.1.x is CSCth64697: Enh: New show command to display http url/cookie hash result

WhatHow BadWhoVersions
Status   V-Verified
Founddevelopment
Original-foundde-test
ProjectCSC.datacenter
Productace
Componentcp_configmgr
Hardwareall
PriorityNA
Severity   enhancement (Sev6)
DE-priority3
RegressionN
Dev-escapeN
BadCodeFlagN
DE-manager magesh
DTPT-manager rajmishr
Engineer gdufour
Submitter gdufour
Assigner divjain
Code-reviewer bblanco 
Version3.0(0)A4(0.99.109)          
Integrated-releases3.0(0)A90(7.0.12)           3.0(0)A90(4.0.38)          3.0(0)A90(4.0.27)          3.0(0)A4(1.0)           3.0(0)A4(0.99.214)          3.0(0)A4(0.99.180)          
Verified-release3.0(0)A4(1.0.5)          
Component-fixed-versionFUSION
History
Histogram of CSCti03084: Covering 85 days and terminal, first Resolved in 3 days, currently in V-Verified state.

          N -> A [    0]  |
          A -> O [    0] |
          O -> R [    3] -O[3.53%]
          R -> A [   38]    ----------------R---------------[44.68%]
          A -> R [    0]                                    |
          R -> V [   44]                                    ------------------R-----------------[51.79%]
                          <-----------------------------------+----------------------------------->
                    26-Jul-2010                         06-Sep-2010                         19-Oct-2010
cdetsadm11/30/2010  15:15:24  Integrated-releases:-->  3.0(0)A90(7.0.12)
cdetsadm11/01/2010  13:42:20  Integrated-releases:--> 3.0(0)A4(1.0)
gdufour10/19/2010  00:22:06  RNE-Approval-Flg:N   -->
gdufour10/19/2010  00:22:06  Note Title:--> V-comments
gdufour10/19/2010  00:22:05  Verified-on:--> 10/19/2010 00:22:05
gdufour10/19/2010  00:22:05  Verifier:--> gdufour
gdufour10/19/2010  00:21:59  Verified-confidence:--> tested
gdufour10/19/2010  00:20:32  Status:R   --> V
gdufour10/19/2010  00:22:05  Verified-release:--> 3.0(0)A4(1.0.5)
cms10/04/2010  07:12:07  File Name:--> Scimitar-code-review
cdetsadm09/23/2010  10:30:36  Integrated-releases:-->  3.0(0)A90(4.0.38)
cdetsadm09/07/2010  11:35:29  Integrated-releases:-->  3.0(0)A4(0.99.214)
gdufour09/06/2010  00:56:13  RNE-Approval-Flg:N   -->
gdufour09/06/2010  00:56:13  Note Title:--> R-comments
gdufour09/06/2010  00:56:13  Resolved-on:07/29/2010 04:09:36   -->  09/06/2010 00:56:12
gdufour09/06/2010  00:55:57  Status:A   --> R
divjain09/06/2010  00:51:22  RNE-Approval-Flg:N   -->
divjain09/06/2010  00:51:22  Note Title:--> A-comments
divjain09/06/2010  00:51:22  Assigner:gdufour --> divjain
divjain09/06/2010  00:51:22  Assigned Date:07/26/2010 06:19:33   -->  09/06/2010 00:51:22
divjain09/06/2010  00:51:07  Status:R   --> A
gdufour09/06/2010  00:42:03  File Name:--> Diffs-commit-converge_dev.p2
cdetsadm08/25/2010  13:05:01  Integrated-releases:-->  3.0(0)A90(4.0.27)
rahul08/01/2010  23:52:20  Found:de-test   --> development
cdetsadm07/31/2010  07:09:49  Integrated-releases:-->  3.0(0)A4(0.99.180)
kavenkat07/29/2010  14:19:33  Attribute2:Fusion   --> FUSION
gdufour07/29/2010  04:09:36  Resolved-on:--> 07/29/2010 04:09:36
gdufour07/29/2010  04:09:28  Documents-changed:--> none
gdufour07/29/2010  04:09:28  Status:O   --> R
gdufour07/29/2010  04:05:46  File Name:--> Diffs-commit-converge_dev
gdufour07/28/2010  02:03:37  Regression:--> N
gdufour07/28/2010  02:03:47  Code-reviewer:--> bblanco
gdufour07/28/2010  02:02:52  Note:Static-analysis-runinfo --> Static-analysis-runinfo
gdufour07/28/2010  01:59:11  RNE-Approval-Flg:N   -->
gdufour07/28/2010  01:59:11  Note Title:--> Static-analysis-runinfo
gdufour07/26/2010  08:04:54  RNE-Approval-Flg:N   -->
gdufour07/26/2010  08:04:54  Note Title:--> AUT
gdufour07/26/2010  07:21:27  Status:A   --> O
cms07/26/2010  07:17:50  File Name:--> Scimitar-code-review
gdufour07/26/2010  06:24:00  RNE-Approval-Flg:N   -->
gdufour07/26/2010  06:24:00  Note Title:--> Unit-test
gdufour07/26/2010  06:19:54  Note Title:--> Release-note
gdufour07/26/2010  06:19:33  Assigner:--> gdufour
gdufour07/26/2010  06:19:33  Assigned Date:--> 07/26/2010 06:19:33
gdufour07/26/2010  06:19:33  Engineer:--> gdufour
gdufour07/26/2010  06:19:28  Status:N   --> A
gdufour07/26/2010  06:19:18  Defect Created:-->

Here is an example on how to use this command.

A sticky function very often hashes the input data into a 64bits value.  This is the value that we keep in our sticky entry and that we use to match new input data to existing entries.

So, when you display a sticky database, you usually a sticky-entry which is a huge number.

It's impossible to retrieve the original data from the hash value.

ctw1prod001/maxcon# show sticky  database  group group1
sticky group : group1
type         : HTTP-COOKIE
timeout      : 5             timeout-activeconns : TRUE
  sticky-entry          rserver-instance                 time-to-expire flags  
  ---------------------+--------------------------------+--------------+-------+
  18119923635054054134  rser3:0                          293            -
  5834233610907968705   rser2:0                          6              -

Until version 4.1.x it was also impossible to know the hash value from the original data.
This is why the command
show sticky hash is useful.  This command will hash whatever text you provide and return the hash value in hexadecimal and decimal. 

ctw1prod001/maxcon# show sticky hash cisco123
Hash: 0x50f75c8edad064c1 - 5834233610907968705

ctw1prod001/maxcon# show sticky hash testing123
Hash: 0xfb76e68ae89a0af6 - 18119923635054054134
ctw1prod001/maxcon#

In the example above, the text cisco123 is hashed into value 5834233610907968705
This value correspond to the sticky entry pointing to rser2:0

Below is another example with cookie insert.

ctw1prod001/maxcon# sh sticky  cookie-insert group group1
     Cookie   |        HashKey       |           rserver-instance  
  ------------+----------------------+----------------------------------------+
  R876364218 | 8016668853981260554  | s1/rser1:0
  R876365307 | 1555693097661300743  | s1/rser2:0
  R876366396 | 9600904050871858939  | s1/rser3:0
  R876367485 | 3122196043780649991  | s1/rser4:0
  R2830679934 | 17805359266603035579 | s2/rser4:0
  R2830681023 | 81717080676150073    | s2/rser5:0

ctw1prod001/maxcon# sh sticky hash  R876364218
Hash: 0x6f40edd93b7dab0a - 8016668853981260554

ctw1prod001/maxcon# sh sticky hash R2830681023
Hash: 0x12251407dc8ab39 - 81717080676150073
ctw1prod001/maxcon#

And this will end our stickyness review.

Do not hesitate to send me questions or topics you would like me to address.

Thanks,

Gilles Dufour

Average Rating: 0 (0 ratings)

Comments

mspradeep Sat, 12/11/2010 - 15:35

hi Gilles,

This ACE troubleshooting post is excellent.

Could you please also explain the following commands with sample example.

Capture (how to capture and analyse)

resource usage (to check the usage)

show conn address

regards,

PM

Actions

Login or Register to take actions

This Blog

Posted December 8, 2010 at 5:04 AM
Stats:

Related Content

Blogs Leaderboard