Datacenter troubleshooting guide - day 7

Jan 5, 2011 4:21 AM

"Datacenter troubleshooting guide" - a blog by Gilles Dufour.

Day 7 - Understanding me-stats (continued)

Let me start by wishing you all a Happy New Year!

In my previous post I talked about the undocumented but very useful ME-STATS.

Today I'm going to continue the discussion and look at stats for the ICM module.

After the RX module has buffered the packet and Fastpath has decided that it does not match an existing connection, the packet is sent to the Inbound Connection Manager (ICM). ICM performs the L3 rule selection and decides whether the packet should be sent to the LB module for load balancing, to the TCP module if we need to spoof the connection, or to the OCM module if the packet should be NATed.

Since the number of stats for this command is huge, I removed some of them from the list below.

switch/Admin# show np 1 me-stats "-sicm -v"
ICM Statistics (Current)
--------------
Errors:                                           0             0
Frames Received:                             695245             5
Drop [unknown msg]:                               0             0
IPCP Received:                                    5             0
Embryonic Hit Received:                           0             0
Close Receive:                              1362835             4
Close Drop unknown msg:                           0             0
Close Errors:                                     0             0
Close Connection timeout:                      3241             1
Close IPCP send stat:                             0             0
Close IPCP recv stat:                             0             0
Encaps Miss Success stat:                         0             0
Encaps Miss Error stat:                           0             0

Close No interface on connection:                 0             0
Close connection [Interface down]:                0             0

......
Reuse link update conn not on reuse erro          0             0
Reuse conn remove not on head error:              0             0
Drop [Next-Hop queue full]:                       0             0
Close Error not in hash:                          0             0
Invalid reap messages:                            0             0
If lookup error:                                  5             0
...
UDP Chaser sent, conn miss:                       0             0
UDP Chaser sent, partial conn:                    0             0
(Context ALL Statistics)
Transmit -> fastpath:                         13696             3
Transmit -> TCP:                              94582             0
Transmit -> OCM:                               1096             0
Send   -> LB_L4:                             580982             0

Send -> Other IXP:                              202             0
Drop [redundant]:                                 0             0
Drop [ACL deny]:                               4772             2
Drop [Connection RL]:                             0             0
Drop [CP Connection RL]:                          0             0
Drop [Proxy RL]:                                  0             0
Drop [SSL RL]:                                    0             0
Drop [Connection Rate RL]:                        0             0
Drop [Inspect Rate RL]:                           0             0
Drop [IF FT Standby]:                             0             0

Drop [ICMP Hard Error]:                           0             0
Drop [ICMP Redirect]:                             0             0
Drop [ICMP Error IP Mismatch]:                    0             0
Connection [Inserts]:                        688907             2
Connection [Deletes]:                        784574             4
Connection [Modifies]:                            0             0
Proxy [Inserts]:                                  0             0
Proxy [Deletes]:                             772272             0
IPCP Sent:                                        5             0
CP Init Received:                              5925             1
Invalid conn miss TCP flags:                      0             0
RPF check Error:                                  0             0
Route lookup Error:                               0             0
MAC Lookup Error:                                 4             0
My mac check Error:                             108             0
Bridged - My mac Error                            0             0
BVI invalid/down Error                            0             0
Classify Error:                                   0             0
Transmit Encap Miss Msg stat:                   181             0
Drop [Encap Miss Msg stat]:                       0             0
Close Connection with invalid proxy:              0             0
Pinhole deletes:                                  0             0
Tracker Unlinks :                                 0             0
Connection Reuse Add Errors:                      0             0
Connections Removed From Reuse Pools:             0             0
Connections Added To Reuse Pools:                 0             0
Replicate Connection encap lookup error:          0             0
Replicate Connection MAC lookup error:            0             0
Replicate connection sent:                        0             0
Replicate connection msg to other ixp:            0             0
Replicate connection recv L4:                     0             0
Replicate connection recv LB:                     0             0
Replicate connection recv buddy:                  0             0
Drop [Replicate conn buddy - no control           0             0
Close IPCP errors:                                0             0
Close connection tracker not found error          0             0

As you can see, there is a counter for each Send/Transmit destination.  TCP and LB_L4 are considered "slow path" since we do not yet know the final destination of the packet.

Just like inside FastPath, there is a "Drop [Next-Hop queue full]" counter, which indicates that other destinations are too slow at processing their input queue, preventing ICM from transmitting new packets.  Those packets get dropped.
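When the output is this long, it can help to filter it with a script. The following is a minimal sketch, assuming the output has been saved as text and that each counter line ends in two integer columns as shown above. The function names `parse_me_stats` and `nonzero_problem_counters` are made up for this example and are not part of any ACE tooling.

```python
import re

# A counter line is: name, optional colon, 2+ spaces, integer, spaces, integer.
LINE_RE = re.compile(r"^(?P<name>.+?):?\s{2,}(?P<first>\d+)\s+(?P<second>\d+)\s*$")


def parse_me_stats(text):
    """Return {counter name: (first column, second column)}."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name = match.group("name").strip()
            counters[name] = (int(match.group("first")), int(match.group("second")))
    return counters


def nonzero_problem_counters(counters):
    """Keep non-zero counters whose name mentions a drop or an error."""
    return {
        name: values
        for name, values in counters.items()
        if values[0] > 0 and re.search(r"drop|error", name, re.IGNORECASE)
    }


# A few lines copied from the output above.
sample = """\
Frames Received:                             695245             5
Drop [ACL deny]:                               4772             2
Drop [redundant]:                                 0             0
If lookup error:                                  5             0
Bridged - My mac Error                            0             0
"""

counters = parse_me_stats(sample)
for name, values in nonzero_problem_counters(counters).items():
    print(name, values)
```

Matching on the words "drop" and "error" is only a heuristic; counters such as "Close Connection timeout" would need to be added by hand if you want to track them.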

The two encaps counters ("Encaps Miss Success stat" and "Encaps Miss Error stat") are related to an interesting behavior of the ACE platform.

Like its predecessor, the CSM (Content Switching Module), ACE uses "encap ids" internally to reference mac-addresses.

Therefore, internally, a connection entry will be using encap ids which reference specific mac-addresses.

You can see the mapping between an encap id and a mac-address by doing a 'show arp'.

switch/Admin# show arp


Context Admin
================================================================================
IP ADDRESS      MAC-ADDRESS        Interface  Type      Encap  NextArp(s) Status
================================================================================
10.86.213.206   00.07.4f.ce.d6.00  vlan10    LEARNED    30     8999 sec     up
10.86.213.250   00.c0.9f.4f.fe.d1  vlan10    LEARNED    19     8995 sec     up
161.44.248.127  00.0b.fc.fe.1b.64  vlan10    NAT        LOCAL     _         up
10.86.213.1     00.00.0c.07.ac.00  vlan10    GATEWAY    20     297 sec      up
10.86.213.2     00.11.5d.e1.2f.fc  vlan10    LEARNED    22     8995 sec     up
10.86.213.16    00.0a.8a.7d.5f.38  vlan10    LEARNED    43     9631 sec     up
10.86.213.38    00.09.b6.92.36.80  vlan10    LEARNED    11     8994 sec     up
10.86.213.40    00.30.f2.75.f3.f1  vlan10    INTERFACE  LOCAL     _         up
10.86.213.53    00.0b.fc.fe.1b.64  vlan10    VSERVER    LOCAL     _         up
10.86.213.54    00.0b.fc.fe.1b.64  vlan10    NAT        LOCAL     _         up
10.10.20.1      00.0b.fc.fe.1b.64  vlan30    NAT        LOCAL     _         up
- 10.10.20.100
10.1.1.1        00.0b.fc.fe.1b.64  vlan30    NAT        LOCAL     _         up
192.168.30.10   00.1b.24.65.af.66  vlan30    LEARNED    39     9028 sec     up
192.168.30.11   00.1b.24.4d.eb.a6  vlan30    LEARNED    37     9024 sec     up
192.168.30.17   00.e0.81.22.78.ed  vlan30    STATIC     9         _         up

In the output above, we can see that mac-address 00.07.4f.ce.d6.00 is mapped to encap id 30.
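If you need to look up many encap ids, the mapping can be extracted from a saved copy of the output. This is a minimal sketch that assumes the column order shown above (IP, MAC, interface, type, encap); `parse_encap_ids` is a hypothetical helper name, not an ACE command or API.

```python
def parse_encap_ids(show_arp_text):
    """Map MAC address -> encap id for rows that carry a numeric encap id."""
    mapping = {}
    for line in show_arp_text.splitlines():
        fields = line.split()
        # Data rows have at least IP, MAC, interface, type and encap,
        # and the MAC is written as six dot-separated bytes.
        if len(fields) < 5 or fields[1].count(".") != 5:
            continue
        mac, encap = fields[1], fields[4]
        if encap.isdigit():  # LOCAL entries have no numeric encap id
            mapping[mac] = int(encap)
    return mapping


# A few rows copied from the output above.
sample = """\
IP ADDRESS      MAC-ADDRESS        Interface  Type      Encap  NextArp(s) Status
10.86.213.206   00.07.4f.ce.d6.00  vlan10    LEARNED    30     8999 sec     up
161.44.248.127  00.0b.fc.fe.1b.64  vlan10    NAT        LOCAL     _         up
192.168.30.17   00.e0.81.22.78.ed  vlan30    STATIC     9         _         up
"""

encap_ids = parse_encap_ids(sample)
print(encap_ids["00.07.4f.ce.d6.00"])
```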

When a packet comes to ICM from an unknown mac-address, an internal (IPCP) message is sent to the Control Plane so that we can trigger an arp request to populate the arp table and obtain an encap id for the new mac-address.

When this process succeeds, we increment "Encaps Miss Success stat". When it fails, the packet is dropped and the "Encaps Miss Error stat" counter is incremented.

One reason for getting an encap miss error is reaching the mac-miss rate limit.

You can check your current rate and the limit with the following command.

switch/Admin# show resource usage | i mac
  mac-miss rate                 1          5          0        700          0
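The line above is shown without its header row. The sketch below assumes the five numeric columns are, in order, current, peak, min, max and denied; that ordering is an assumption you should check against the full "show resource usage" header on your own device. `ResourceUsage` and `parse_resource_line` are names invented for this example.

```python
from typing import NamedTuple


class ResourceUsage(NamedTuple):
    name: str
    current: int
    peak: int
    minimum: int
    maximum: int
    denied: int


def parse_resource_line(line):
    # ASSUMPTION: columns are current, peak, min, max, denied (header not shown).
    *name_parts, current, peak, minimum, maximum, denied = line.split()
    return ResourceUsage(
        " ".join(name_parts),
        int(current), int(peak), int(minimum), int(maximum), int(denied),
    )


usage = parse_resource_line(
    "  mac-miss rate                 1          5          0        700          0"
)
headroom = usage.maximum - usage.peak
print(usage.name, "peak", usage.peak, "limit", usage.maximum, "headroom", headroom)
```

Under that reading, a non-zero value in the last column would be the one to watch when "Encaps Miss Error stat" is climbing.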

Now, let's examine the DROP counters.

We have a series of RL (Resource Limit) counters for all the resources that we monitor.  This includes "concurrent connections", "SSL connections", "proxy connections", "CP concurrent connections", and so on.

We also drop new connections when we are in standby mode ("Drop [IF FT Standby]") or when traffic matches a deny ACL ("Drop [ACL deny]").

A more interesting counter is the "Drop [redundant]" one.

It has nothing to do with fault tolerance and redundancy.

This counter actually means we received a new connection request for a connection that is already being processed.

This usually happens with very fast UDP traffic.

The first UDP packet requires ICM to create a new connection so that further UDP packets for that same flow can be fast switched.

If the next packet comes in before ICM is done processing the first packet, it is dropped and this counter is incremented.
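The race described above can be illustrated with a toy model. This is only a sketch of the idea, not how ACE implements it: the class `ToyIcm`, its method names, and the flow tuple are all invented for the example, and the real fast-switching decision is made in Fastpath rather than in ICM.

```python
class ToyIcm:
    """Toy model of the redundant-drop race; not the ACE implementation."""

    def __init__(self):
        self.pending = set()      # flows whose connection setup is in progress
        self.established = set()  # flows that can now be fast switched
        self.redundant_drops = 0

    def receive(self, flow):
        if flow in self.established:
            return "fastpath"
        if flow in self.pending:
            # A second packet arrived before setup of the first one finished.
            self.redundant_drops += 1
            return "drop-redundant"
        self.pending.add(flow)
        return "setup-started"

    def setup_done(self, flow):
        self.pending.discard(flow)
        self.established.add(flow)


icm = ToyIcm()
flow = ("10.0.0.1", 5000, "10.0.0.2", 53, "udp")  # hypothetical UDP flow

results = [icm.receive(flow), icm.receive(flow)]  # two back-to-back packets
icm.setup_done(flow)
results.append(icm.receive(flow))                 # a later packet
print(results, icm.redundant_drops)
```

The second packet is dropped only because it arrives while the first is still being processed; once the connection exists, later packets of the same flow are no longer affected.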

For very fast UDP traffic, if you have problems with redundant drops, you should consider enabling UDP Booster.  I'll cover this feature in a future post.

The next module is TCP, but that will be for my next post.

I hope you'll find this information useful.

Thanks.

Gilles Dufour.
