
CSM not load-balancing properly

kasiddiq
Level 1

Folks,

I'm seeing that the CSM is not load-balancing properly across all the servers in the serverfarm.

I simulated a failure of one of the servers (TS05) and then brought it back. After it came back online, the CSM sent no traffic to that server, as shown below:

C7606-1#show mod csm 1 serverfarms name WHTTP detail | i OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2

TS02-RADIO, weight = 8, OPERATIONAL, conns = 4

TS03-RADIO, weight = 8, OPERATIONAL, conns = 2

TS04-RADIO, weight = 8, OPERATIONAL, conns = 4

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 1

C7606-1#show mod csm 1 serverfarms name WHTTP detail | i OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 1

TS02-RADIO, weight = 8, OPERATIONAL, conns = 3

TS03-RADIO, weight = 8, OPERATIONAL, conns = 1

TS04-RADIO, weight = 8, OPERATIONAL, conns = 3

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 1

C7606-1#show mod csm 1 serverfarms name WHTTP detail | i OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2

TS02-RADIO, weight = 8, OPERATIONAL, conns = 4

TS03-RADIO, weight = 8, OPERATIONAL, conns = 1

TS04-RADIO, weight = 8, OPERATIONAL, conns = 3

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 1

C7606-1#show mod csm 1 serverfarms name WHTTP detail | i OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2

TS02-RADIO, weight = 8, OPERATIONAL, conns = 1

TS03-RADIO, weight = 8, OPERATIONAL, conns = 0

TS04-RADIO, weight = 8, OPERATIONAL, conns = 2

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 2

C7606-1#show mod csm 1 serverfarms name WHTTP detail | i OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2

TS02-RADIO, weight = 8, OPERATIONAL, conns = 1

TS03-RADIO, weight = 8, OPERATIONAL, conns = 1

TS04-RADIO, weight = 8, OPERATIONAL, conns = 2

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 3

C7606-1#show mod csm 1 serverfarms name WHTTP detail | i OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2

TS02-RADIO, weight = 8, OPERATIONAL, conns = 1

TS03-RADIO, weight = 8, OPERATIONAL, conns = 1

TS04-RADIO, weight = 8, OPERATIONAL, conns = 2

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 4

--------

My config is attached to this post:

---

Could there be something wrong with my CSM variables?

C7606-1#show mod csm 1 variable | i SLOW

REAL_SLOW_START_ENABLE 1

C7606-1#

9 Replies

Slow-start mode ensures that a newly operational server receives fewer new load-balanced connections than the other servers. This prevents the newly activated server from being overloaded with many consecutive new connections, which would otherwise happen because the least-connections method selects the server with the fewest open connections.

The configurable range for this variable is 0 to 10. A setting of 0 disables the slow-start feature. Values from 1 to 10 specify how fast the newly activated server ramps up; 1 is the slowest ramp-up rate.

I would suggest changing this variable to 2 or 3 (default value).

Syed
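To illustrate the reasoning above, here is a toy Python model of plain least-connections selection with no slow start. It is only a sketch: the function names and the connection counts are made up for illustration, connections never close in this model, and it does not claim to reproduce the CSM's actual algorithm.

```python
def pick_least_conns(open_conns):
    """Return the server with the fewest open connections (first one wins ties)."""
    return min(open_conns, key=open_conns.get)


def simulate_burst(open_conns, new_connections):
    """Assign new connections one at a time; count how many each server receives."""
    conns = dict(open_conns)  # work on a copy so the caller's dict is untouched
    received = {name: 0 for name in conns}
    for _ in range(new_connections):
        target = pick_least_conns(conns)
        conns[target] += 1
        received[target] += 1
    return received


# Hypothetical numbers: three busy servers and one that has just come back.
before = {"TS01": 100, "TS02": 100, "TS03": 100, "TS05": 0}
burst = simulate_burst(before, 90)
print(burst)
```

In this model all 90 new connections land on TS05, because it stays the least-loaded server until it catches up with the others. That burst is what slow start is meant to soften. It is also the opposite of the symptom reported in this thread, where the recovered server receives nothing at all.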

Syed - I will try the change you suggested. However, I didn't see any traffic going to the newly activated server (TS05), even after an hour of testing.

But I'll change the value and see if it resolves the problem.

Thanks!

Kashif

I configured roundrobin for one serverfarm and it seems to work fine....

C7606-1#show mod csm 1 serverfarms name RADIUS detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2280

TS02-RADIO, weight = 8, OPERATIONAL, conns = 2276

TS03-RADIO, weight = 8, OPERATIONAL, conns = 2277

TS04-RADIO, weight = 8, OPERATIONAL, conns = 2273

TS05-RADIO, weight = 8, OPERATIONAL, conns = 2275

TS06-RADIO, weight = 8, OPERATIONAL, conns = 2272

C7606-1#show mod csm 1 serverfarms name RADIUS detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2316

TS02-RADIO, weight = 8, OPERATIONAL, conns = 2312

TS03-RADIO, weight = 8, OPERATIONAL, conns = 2313

TS04-RADIO, weight = 8, OPERATIONAL, conns = 2310

TS05-RADIO, weight = 8, OPERATIONAL, conns = 2312

TS06-RADIO, weight = 8, OPERATIONAL, conns = 2309

But the serverfarm configured with leastconns slowstart 60 is still not distributing traffic evenly.

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 801

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

TS03-RADIO, weight = 8, OPERATIONAL, conns = 0

TS04-RADIO, weight = 8, OPERATIONAL, conns = 0

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 0

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 659

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

TS03-RADIO, weight = 8, OPERATIONAL, conns = 0

TS04-RADIO, weight = 8, OPERATIONAL, conns = 0

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 0

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 739

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

TS03-RADIO, weight = 8, OPERATIONAL, conns = 0

TS04-RADIO, weight = 8, OPERATIONAL, conns = 0

TS05-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 0
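For anyone who wants to check this kind of output mechanically rather than by eye, below is a minimal Python sketch that parses the filtered "show mod csm ... detail | inc OPER" lines and lists the real servers that hold zero connections while the farm as a whole is carrying traffic. The regular expression is written against the line format pasted in this thread only; the function names are hypothetical and other CSM releases may format the output differently.

```python
import re

# Matches lines such as: "TS01-RADIO, weight = 8, OPERATIONAL, conns = 801"
REAL_LINE = re.compile(
    r"^\s*(?P<name>[^,\s]+),\s*weight\s*=\s*(?P<weight>\d+),"
    r"\s*(?P<state>\w+),\s*conns\s*=\s*(?P<conns>\d+)\s*$"
)


def parse_reals(output):
    """Map real-server name -> current connection count; other lines are skipped."""
    reals = {}
    for line in output.splitlines():
        match = REAL_LINE.match(line)
        if match:
            reals[match.group("name")] = int(match.group("conns"))
    return reals


def idle_reals(reals):
    """Names of servers with zero conns while the farm carries traffic."""
    if sum(reals.values()) == 0:
        return []  # no traffic at all, so nothing can be concluded
    return sorted(name for name, conns in reals.items() if conns == 0)


SAMPLE = """\
C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER
TS01-RADIO, weight = 8, OPERATIONAL, conns = 801
TS02-RADIO, weight = 8, OPERATIONAL, conns = 0
TS03-RADIO, weight = 8, OPERATIONAL, conns = 0
TS04-RADIO, weight = 8, OPERATIONAL, conns = 0
TS05-RADIO, weight = 8, OPERATIONAL, conns = 0
TS06-RADIO, weight = 8, OPERATIONAL, conns = 0
"""

reals = parse_reals(SAMPLE)
idle = idle_reals(reals)
print(idle)
```

Run against the first WHTTP sample above, this reports TS02 through TS06 as idle while TS01 holds all 801 connections.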

Which code version are you running on the CSM? There is a similar bug (CSCei26434, leastconn sending all new conns to a single server) that was fixed in 4.2(3a).

Syed Iftekhar Ahmed

I am running 4.2(3)

C7606-1#show mod version

Mod Port Model Serial # Versions

--- ---- ------------------ ----------- --------------------------------------

1 4 WS-X6066-SLB-APC SAD0744050P Hw : 1.7

Fw :

Sw : 4.2(3)

2 48 WS-X6748-GE-TX SAD0902013H Hw : 2.5

Fw : 12.2(14r)S5

Sw : 12.2(18)SXE6b

Sw1: 8.6(0.66)ROC16

WS-F6700-CFC SAD093302Y8 Hw : 2.0

3 48 WS-X6748-GE-TX SAD074803WL Hw : 1.2

Fw : 12.2(14r)S5

Sw : 12.2(18)SXE6b

Sw1: 8.6(0.66)ROC16

WS-F6700-CFC SAD07430896 Hw : 1.1

6 2 WS-SUP720-3BXL SAL09486VL2 Hw : 4.3

Fw : 8.4(2)

Sw : 12.2(18)SXE6b

Sw1: 8.6(0.66)ROC16

WS-SUP720 SAL09497EFG Hw : 2.3

Fw : 12.2(17r)S2

Sw : 12.2(18)SXE6b

WS-F6K-PFC3BXL SAL0949722R Hw : 1.6

So 4.2(3a) is the upgrade for 4.2(3)?

CSM code 4.2(3) was deferred and removed from CCO; 4.2(3a) replaced it at that time.

Syed Iftekhar Ahmed

Folks - I upgraded the code to 4.2(6), expecting bug CSCei26434 to be fixed, but I still see the same problem: the CSM is not balancing properly.

No Traffic Running:

===================

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 0

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 0

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 0

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

Started Traffic

===============

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 60

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 101

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 188

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

The newly ALIVE server passed the probe test:

=============================================

C7606-1#show mod csm 1 probe detail | inc 102

192.168.122.102:1813 RADIUS RADIUS (default) OPERABLE

192.168.122.102:9202 WSP_SCL WSP_SCL (default) OPERABLE

192.168.122.102:9201 WSP_CO WSP_CO (default) OPERABLE

192.168.122.102:9200 WSP_CL WSP_CL (default) OPERABLE

192.168.122.102:8080 WHTTP WHTTP (default) OPERABLE

192.168.122.102:8080 TP_8080 TP_8080 (default) OPERABLE

192.168.122.102:7080 PUSH_WWW PUSH_WWW (default) OPERABLE

Still no connections:

=====================

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 583

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

C7606-1#

Brought another server on-line:

================================

C7606-1#

1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: TCP health probe re-activated server 192.168.122.106:8080 in serverfarm 'WHTTP'

1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: TCP health probe re-activated server 192.168.122.106:8080 in serverfarm 'TP_8080'

1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: TCP health probe re-activated server 192.168.122.106:7080 in serverfarm 'PUSH_WWW'

1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: UDP health probe re-activated server 192.168.122.106:9200 in serverfarm 'WSP_CL'

1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: UDP health probe re-activated server 192.168.122.106:1813 in serverfarm 'RADIUS'

1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: UDP health probe re-activated server 192.168.122.106:9201 in serverfarm 'WSP_CO'

C7606-1#

C7606-1#

1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: UDP health probe re-activated server 192.168.122.106:9202 in serverfarm '
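If it helps to confirm which serverfarms the probes re-activated a server in, the syslog lines above can be parsed with a short Python sketch like the one below. The pattern is written against the message text pasted in this thread only and the function name is hypothetical; a line that is cut off before the closing quote, like the last one above, simply does not match and is skipped rather than guessed at.

```python
import re

# Written against the RSERVERSTATE messages shown in this thread.
REACTIVATED = re.compile(
    r"%CSM_SLB-6-RSERVERSTATE: Module (?P<module>\d+) server state changed: "
    r"SLB-NETMGT: (?P<proto>TCP|UDP) health probe re-activated server "
    r"(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+) in serverfarm '(?P<farm>[^']+)'"
)


def reactivated_events(log_text):
    """Return (ip, port, serverfarm) for every complete re-activation message."""
    events = []
    for line in log_text.splitlines():
        match = REACTIVATED.search(line)
        if match:
            events.append(
                (match.group("ip"), int(match.group("port")), match.group("farm"))
            )
    return events


SAMPLE_LOG = """\
1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: TCP health probe re-activated server 192.168.122.106:8080 in serverfarm 'WHTTP'
1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: UDP health probe re-activated server 192.168.122.106:1813 in serverfarm 'RADIUS'
1w4d: %CSM_SLB-6-RSERVERSTATE: Module 1 server state changed: SLB-NETMGT: UDP health probe re-activated server 192.168.122.106:9202 in serverfarm '
"""

events = reactivated_events(SAMPLE_LOG)
print(events)
```

With the three sample lines, only the WHTTP and RADIUS events are returned; the truncated third line is ignored.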

Still no connections, even to the new server:

======================================

C7606-1#show mod csm 1 serverfarms name WHTTP detail | inc OPER

TS01-RADIO, weight = 8, OPERATIONAL, conns = 2988

TS02-RADIO, weight = 8, OPERATIONAL, conns = 0

TS06-RADIO, weight = 8, OPERATIONAL, conns = 0

SW versions:

============

C7606-1#show mod | inc SLB

1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD0744050P

C7606-1#show mod

Mod Ports Card Type Model Serial No.

--- ----- -------------------------------------- ------------------ -----------

1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD0744050P

2 48 CEF720 48 port 10/100/1000mb Ethernet WS-X6748-GE-TX SAD0902013H

3 48 CEF720 48 port 10/100/1000mb Ethernet WS-X6748-GE-TX SAD074803WL

6 2 Supervisor Engine 720 (Active) WS-SUP720-3BXL SAL09486VL2

Mod MAC addresses Hw Fw Sw Status

--- ---------------------------------- ------ ------------ ------------ -------

1 000d.bcaf.d328 to 000d.bcaf.d32f 1.7 4.2(6) Ok

2 0011.bb2b.67d0 to 0011.bb2b.67ff 2.5 12.2(14r)S5 12.2(18)SXF7 Ok

3 000e.38c6.7d10 to 000e.38c6.7d3f 1.2 12.2(14r)S5 12.2(18)SXF7 Ok

6 0013.c42e.ee90 to 0013.c42e.ee93 4.3 8.4(2) 12.2(18)SXF7 Ok

Mod Sub-Module Model Serial Hw Status

---- --------------------------- ------------------ ----------- ------- -------

2 Centralized Forwarding Card WS-F6700-CFC SAD093302Y8 2.0 Ok

3 Centralized Forwarding Card WS-F6700-CFC SAD07430896 1.1 Ok

6 Policy Feature Card 3 WS-F6K-PFC3BXL SAL0949722R 1.6 Ok

6 MSFC3 Daughterboard WS-SUP720 SAL09497EFG 2.3 Ok

Mod Online Diag Status

---- -------------------

1 Pass

2 Pass

3 Pass

6 Pass

C7606-1#

CSM Variables:

==============

C7606-1#show mod csm 1 var

C7606-1#show mod csm 1 variable

variable value

----------------------------------------------------------------

ARP_INTERVAL 300

ARP_LEARNED_INTERVAL 14400

ARP_GRATUITOUS_INTERVAL 15

ARP_RATE 10

ARP_RETRIES 3

ARP_LEARN_MODE 1

ARP_REPLY_FOR_NO_INSERVICE_VIP 0

ADVERTISE_RHI_FREQ 10

AGGREGATE_BACKUP_SF_STATE_TO_VS 0

COOKIE_INSERT_EXPIRATION_DATE Fri, 1 Jan 2010 01:01:50 GMT

DEST_UNREACHABLE_MASK 0xffff

FT_FLOW_REFRESH_INT 0

FTP_CLOSE_DATA_CONN 0

GSLB_LICENSE_KEY (no valid license)

HTTP_CASE_SENSITIVE_MATCHING 1

HTTP_URL_COOKIE_DELIMITERS /?&#+

INBAND_STATE_CHANGED_MSG_RATE 4

INFINITE_IDLE_TIME_MAXCONNS 1024

MAX_PARSE_LEN_MULTIPLIER 1

NAT_CLIENT_HASH_SOURCE_PORT 0

variable value

----------------------------------------------------------------

NO_RESET_UNIDIRECTIONAL_FLOWS 0

REAL_SLOW_START_ENABLE 5

ROUTE_UNKNOWN_FLOW_PKTS 0

CSM_FAST_FIN_TIMEOUT 10

SASP_CSM_UNIQUE_ID Cisco-CSM

SASP_FIRST_BIND_ID 65520

SASP_GWM_BIND_ID_MAX 1

SASP_SCALE_WEIGHTS 0

SSL_DEFAULT_STICKY 0

SWITCHOVER_RP_ACTION 0

SWITCHOVER_SP_ACTION 0

SYN_COOKIE_INTERVAL 3

SYN_COOKIE_THRESHOLD 5000

TCP_ACCEPT_RST_EQU_NEXT_GET_SEQ 0

TCP_MSS_OPTION 1460

TCP_WND_SIZE_OPTION 8192

VSERVER_ICMP_ALWAYS_RESPOND false

XML_CONFIG_AUTH_TYPE Basic

MSTS_RDP_VIP_LIST

MAX_VSERVERS_PER_VIP 10

SECURE_HTTP_PORT 443

SECURE_HTTP_SSL_METHOD 0

SECURE_HTTP_TFTP_HOST_IPADDRESS

variable value

----------------------------------------------------------------

SECURE_HTTP_SERVER_CERTIFICATE

SECURE_HTTP_PRIV_KEY_FILE

SECURE_SASP_ENABLE 0

SECURE_SASP_SSL_METHOD 0

SECURE_SASP_TFTP_HOST_IPADDRESS

SECURE_SASP_SERVER_CERTIFICATE

SECURE_SASP_PRIV_KEY_FILE

NO_TIMEOUT_IP_STICKY_ENTRIES 0

MAX_COOKIE_SIZE 0

SASP_RETRY_COUNT 8

C7606-1#
