CSM not loadbalancing correctly

scott-goodwin
Level 1

Hi Guys,

The CSM seems to be ignoring one of my servers.

ALONSO has only 13 connections, yet SCHUMACHER has 3200. The predictor is least connections, yet ALONSO does not seem to be used.

Any ideas?

Config below:

module ContentSwitchingModule 2
 ft group 1 vlan 108
  priority 20 alt 10
  preempt
 !
 vlan 106 client
  ip address 10.x.x.6 255.255.255.0 alt 10.x.x.7 255.255.255.0
  gateway 10.x.x.1
 !
 natpool CSM-PR2-USERS 10.x.x.10 10.x.x.18 netmask 255.255.255.0
 !
 probe ZEN-SERVER1 tcp
  interval 10
  failed 60
  port 524
 !
 real ALONSO
  address 10.x.x.61
  inservice
 real SCHUMACHER
  address 10.x.x.34
  inservice
 serverfarm ZEN-SERVER1
  nat server
  nat client CSM-PR2-USERS
  predictor leastconns
  real name ALONSO
   inservice
  real name SCHUMACHER
   inservice
  probe ZEN-SERVER1
 !
 !
 vserver VIP-ZENSERVER1
  virtual 10.x.x.22 tcp 524
  serverfarm ZEN-SERVER1
  idle 43200
  replicate csrp sticky
  replicate csrp connection
  persistent rebalance
  inservice

Thanks in advance

Scott

Gilles Dufour
Cisco Employee

Scott,

If ALONSO went down, it enters the slow-start algorithm when it comes back up and only gradually receives connections.

You may want to configure leastconns with a maximum time for the slow-start algorithm so that the server does not stay in this mode too long.

The timer is configured like this:

gdufour-cat6k-2(config-slb-sfarm)#predictor leastconns slowstart ?
  <1-65535>  maximum slow-start expiry timer in secs
gdufour-cat6k-2(config-slb-sfarm)#predictor leastconns slowstart

To get back to normal, you can configure 'predictor roundrobin' and then immediately 'predictor leastconns slowstart' followed by the timer value.

Regards,

Gilles.

Hi Gilles,

The server has been up for over 12 hours now and still has only 9 connections. Is this normal?

What would you suggest as a suitable timeout value?

Thanks for the speedy reply

Scott

Scott,

This is normal.

Since no timeout is configured, the server can stay in this state forever.

The algorithm takes the average number of connections across all the servers and uses it as a weight. So your server ALONSO might have a weight of 1 and the other a weight of 1000.

What you can do is set a timeout of 1 hour and also change the speed factor.

To change the speed factor, change the value of the variable "REAL_SLOW_START_ENABLE".

The default is 3; you could lower it to 2 or 1 to change how fast new connections are sent to the new server.

Gilles.
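
For contrast, here is a minimal sketch of what a plain least-connections pick would do with the connection counts from the original post if no slow-start ramp were applied. This is a toy model and not the CSM's actual algorithm: the function names are made up for illustration, and it assumes no existing connection closes during the run.

```python
def pick_least_connections(conns):
    """Return the name of the server with the fewest open connections."""
    return min(conns, key=conns.get)


def simulate(conns, new_connections):
    """Assign new connections one by one, assuming none of them ever closes."""
    conns = dict(conns)  # work on a copy so the caller's dict is untouched
    for _ in range(new_connections):
        conns[pick_least_connections(conns)] += 1
    return conns


# Counts taken from the original post; the 1000 new connections are hypothetical.
start = {"ALONSO": 13, "SCHUMACHER": 3200}
print(simulate(start, 1000))
```

In this toy model every one of the 1000 new connections lands on ALONSO, because it always has the lower count. A server returning to service would take that whole burst, which is the situation a slow-start ramp is meant to avoid.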

Another place to look is the server itself. We had a very similar situation, and it turned out that the under-utilized server was actually forwarding connections back to the VIP. It took a week to figure this out, and it was like pulling teeth to get the developer to look at his server configuration.

Hi Gilles,

Yesterday the CSM was tested with the following configuration :

serverfarm PROXY_lot1
 nat server
 no nat client
 predictor hash address source
 real name PROXY01_lot1
  inservice
 real name PROXY02_lot1
  inservice
 probe PROXY_L4
 probe PROXY_ICMP
serverfarm PROXY_lot2
 nat server
 no nat client
 predictor hash address source
 real name PROXY03_lot2
  inservice
 real name PROXY04_lot2
  inservice
 real name PROXY05_lot2
  inservice
 probe PROXY_L4
 probe PROXY_ICMP

It turned out that the serverfarm PROXY_lot1 was not balancing connections as expected.

The first server had 0 connections while the second server had 3.

We stopped the second server and the first server immediately picked up 2 connections. But after the second server rebooted, it stayed at 0 connections. The two seem to behave as if they were in an active/standby relationship.

We changed the predictor to "roundrobin", but one server still remained at 0 connections. We put the predictor back to "hash" and nothing changed.

Reading your post, it seems that this might not be an anomaly.

How can I demonstrate that the load-balancing algorithm is actually working correctly?

Thanks for your help.

Francois

Francois,

With a hash algorithm and few clients, all your connections could be sent to the same server.

Think of a hash as a basic function like even/odd (it's more complex on the CSM, but it serves as an example): all even IP addresses go to one server and all odd IP addresses to another.

So if you are using 3 clients with even IP addresses, they will all go to one server.

Anyway, this is just an example.

If you want to test load balancing, you need many clients opening many connections, more than 100.

With 3 connections opened (by how many clients?), you can't really say whether there is an issue or not.

Gilles.
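
To put a rough number on "few clients", here is a small sketch that computes how often every client would share one server. It assumes the hash behaves like an independent, uniform random choice per source address; that is only a modelling assumption, since the CSM's real hash function is not described in this thread, and the function name is made up for illustration.

```python
def prob_all_on_one_server(clients, servers):
    """Chance that every client maps to the same server.

    Assumes each source address lands on a uniformly random server,
    independently of the others (a modelling assumption only).
    Expects clients >= 1 and servers >= 1.
    """
    return servers * (1 / servers) ** clients


for n in (3, 4, 10, 100):
    print(n, "clients on 2 servers:", prob_all_on_one_server(n, 2))
```

Under that assumption, 3 clients on 2 servers all end up on one server 25% of the time, and 4 clients 12.5% of the time, so a single idle server in a small test is not evidence of a fault. With 100 clients the chance is negligible.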

Thanks Gilles,

My working situation is the following:

My CSM is directly connected on one side to 3 clients, each generating 100 TCP connections/s, and on the other side to a serverfarm of 2 servers handling those 300 connections.

The requirement is to configure the "hash address source" algorithm between the CSM and the servers.

Assuming that 2 clients are configured with odd addresses and one client with an even address, the load on the 2 servers of the serverfarm should be in the proportion of 1/3 and 2/3.

I'm working on capacity planning to identify what the CSM's limiting factor would be in the following situations:

- if the total number of connections generated by the clients increases from 300 without bound?

- if one of the servers in the serverfarm goes down?

This study would help me identify:

- which SNMP traps should be monitored?

- what the impact would be if the algorithm were reconfigured to "round robin"?

I would greatly appreciate your comments.

Do you have suggestions about where I could find detailed explanations of the following subjects, which (correctly or not) I've identified as possible limiting factors:

- max number of TCP sockets held by the CSM?

- configuration of the timers?

- detailed explanation of the different algorithms?

- SNMP traps indicating the number of connections/s on each server of the serverfarm

Thanks for your help

Francois

Francois,

As I said, with 3 clients and hash address source, you may end up with all connections going to the same server.

The number of connections does not matter with hash address source. The CSM looks only at the source IP address.

For your tests, you should either increase the number of clients or change to roundrobin.

The CSM can handle approximately 1 million concurrent connections.

There are not many SNMP traps on the CSM.

If you want to monitor the number of connections, there are SNMP OIDs that can be used. I believe the question has already been asked, so you should be able to find it on this forum. The MIBs are also available on this website.

Gilles.
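
The point that only the source address matters can be sketched with the even/odd toy hash from earlier in the thread. The client addresses, server names, and function names below are hypothetical, and the modulo is a stand-in, not the CSM's real hash.

```python
import ipaddress
from collections import Counter


def toy_server_for(source_ip, servers):
    """Even/odd-style stand-in hash: the result depends on the source IP only."""
    return servers[int(ipaddress.IPv4Address(source_ip)) % len(servers)]


def distribute(connections_per_client, servers):
    """Sum each client's connections onto the single server its address maps to."""
    load = Counter({name: 0 for name in servers})
    for source_ip, count in connections_per_client.items():
        load[toy_server_for(source_ip, servers)] += count
    return dict(load)


servers = ["server-A", "server-B"]  # hypothetical names
# Two odd addresses and one even address, 100 connections each (hypothetical).
clients = {"192.0.2.1": 100, "192.0.2.3": 100, "192.0.2.2": 100}
print(distribute(clients, servers))
print(distribute({ip: n * 10 for ip, n in clients.items()}, servers))
```

In this toy model the split is 100 to 200, matching the 1/3 and 2/3 proportion from the question, and multiplying every client's connection count by ten leaves the proportion unchanged. Only adding clients with different addresses changes which servers get used.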

Thanks Gilles,

Is there a way to simulate the "hash address source" algorithm?

My source addresses are:

10.64.5.138

10.64.5.144

10.64.5.150

Thanks for your help

Francois

Francois,

Unfortunately, there is no tool or easy function to compute the hash result.

Try increasing one of the IP addresses by 1 and see if it gets assigned to a different server.

Gilles.
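
As an illustration of the "increase by 1" test, the sketch below applies the even/odd toy hash to the three source addresses from the question. It is not the CSM's real hash, so the actual result on the device may differ; the helper names are made up for illustration.

```python
import ipaddress


def toy_bucket(source_ip, server_count=2):
    """Toy even/odd hash: address value modulo the number of servers."""
    return int(ipaddress.IPv4Address(source_ip)) % server_count


def bump(source_ip):
    """Return the next IPv4 address, as a string."""
    return str(ipaddress.IPv4Address(source_ip) + 1)


sources = ["10.64.5.138", "10.64.5.144", "10.64.5.150"]
for ip in sources:
    print(ip, "-> bucket", toy_bucket(ip))

changed = bump(sources[0])
print(changed, "-> bucket", toy_bucket(changed))
```

All three original addresses are even, so this toy hash puts them in the same bucket, while the bumped address 10.64.5.139 lands in the other one. Whether the same happens on the CSM can only be confirmed by trying it on the device.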

Hi Gilles,

I ran new tests using the following client source addresses:

10.143.146.21

10.143.146.27

10.143.146.30

10.143.146.12

The result is that the CSM (configured with "hash address source") targeted a SINGLE server in the serverfarm.

I triggered a total of 36 connections during the test period, with a maximum of 4 simultaneous connections (I have no tool to generate more simultaneous connections).

For every one of the 36 connections, the CSM targeted the same server in the serverfarm.

Therefore, I suspect that the number of SIMULTANEOUS connections could influence the CSM load-balancing decision in addition to the source IP address.

Tonight, my company migrates production traffic to the CSM (until now it has only been in a lab test).

The number of SIMULTANEOUS connections will increase to an as yet undefined number (high or low?).

Whatever that number is (high or low), is it reasonable to expect the load balancing to target both servers of the serverfarm as the number of SIMULTANEOUS connections increases (knowing, as I said in a previous reply, that there will be only 3 source addresses between the CSM and the serverfarm)?

Thanks for your reply.

Francois
