10-17-2006 02:32 AM
Hi Guys,
The CSM seems to be ignoring one of my servers.
ALONSO has only 13 connections, yet SCHUMACHER has 3200. The predictor is least connections, yet ALONSO does not seem to be used.
Any ideas?
Config below:
module ContentSwitchingModule 2
ft group 1 vlan 108
priority 20 alt 10
preempt
!
vlan 106 client
ip address 10.x.x.6 255.255.255.0 alt 10.x.x.7 255.255.255.0
gateway 10.x.x.1
!
natpool CSM-PR2-USERS 10.x.x.10 10.x.x.18 netmask 255.255.255.0
!
probe ZEN-SERVER1 tcp
interval 10
failed 60
port 524
!
real ALONSO
address 10.x.x.61
inservice
real SCHUMACHER
address 10.x.x.34
inservice
serverfarm ZEN-SERVER1
nat server
nat client CSM-PR2-USERS
predictor leastconns
real name ALONSO
inservice
real name SCHUMACHER
inservice
probe ZEN-SERVER1
!
!
vserver VIP-ZENSERVER1
virtual 10.x.x.22 tcp 524
serverfarm ZEN-SERVER1
idle 43200
replicate csrp sticky
replicate csrp connection
persistent rebalance
inservice
Thanks in advance
Scott
10-17-2006 02:37 AM
Scott,
If ALONSO went down, it enters the slow-start algorithm when it comes back up and only slowly receives connections.
You may want to configure leastconns with a maximum time for the slow-start algorithm so the server does not stay in this mode too long.
The timer is configured like this
gdufour-cat6k-2(config-slb-sfarm)#predictor leastconns slowstart ?
<1-65535> maximum slow-start expiry timer in secs
gdufour-cat6k-2(config-slb-sfarm)#predictor leastconns slowstart
To get back to normal, you can configure 'predictor roundrobin' and then immediately 'predictor leastconns slowstart' followed by the timer value.
Regards,
Gilles.
10-17-2006 03:13 AM
Hi Gilles,
The server has been up for over 12 hours now and still only has 9 connections. Is this normal?
What would you suggest was a suitable timeout value?
Thanks for the speedy reply
Scott
10-17-2006 06:52 AM
Scott,
this is normal.
Since there is no timeout, you can stay in this state forever.
The algorithm takes the average number of connections for all the servers and uses it as a weight. So, your server Alonso might have a weight of 1 and the other a weight of 1000.
What you can do is set a slow-start timeout of 1 hour (3600 seconds, since the timer is in seconds) and also change the speed factor.
To do this, change the value of the variable "REAL_SLOW_START_ENABLE".
The default is 3; you could move it to 2 or 1 to change how fast new connections are sent to the new server.
Gilles.
10-17-2006 02:24 PM
Another place to look is the server itself. We had a very similar situation, and it turned out that the under-utilized server was actually forwarding connections back to the VIP. It took a week to figure this out, and it was like pulling teeth to get the developer to look at his server configuration.
02-14-2007 02:19 AM
Hi Gilles,
Yesterday the CSM was tested with the following configuration :
serverfarm PROXY_lot1
nat server
no nat client
predictor hash address source
real name PROXY01_lot1
inservice
real name PROXY02_lot1
inservice
probe PROXY_L4
probe PROXY_ICMP
serverfarm PROXY_lot2
nat server
no nat client
predictor hash address source
real name PROXY03_lot2
inservice
real name PROXY04_lot2
inservice
real name PROXY05_lot2
inservice
probe PROXY_L4
probe PROXY_ICMP
It turned out that the serverfarm PROXY_lot1 was not balancing connections as expected.
The first server got 0 connections while the second server got 3 connections.
We stopped the second server and the first server automatically got 2 connections. But when the second server rebooted, it stayed at 0 connections. The two seem to behave as if in an active/standby relationship.
We changed the predictor to "roundrobin", but one server still remained at 0 connections. We put the predictor back to "hash" and nothing changed.
Reading your reply, it seems that this might not be an anomaly.
How can I demonstrate that the load-balancing algorithm is actually working correctly?
Thanks for your help.
Francois
02-14-2007 02:32 AM
Francois,
with a hash algorithm and few clients, you could have all your connections sent to the same server.
A hash is a basic function like even/odd [it's more complex on the CSM, but it serves as an example]. So, all even IP addresses go to one server and all odd IP addresses go to the other server.
So, if you are using 3 clients with even IP addresses, they will all go to the same server.
Anyway this is just an example.
If you want to test loadbalancing, you need many clients opening many connections. Like more than 100.
With 3 connections opened [by how many clients ?] you can't really say if there is an issue or not.
Gilles.
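The even/odd example above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: `toy_hash_pick` is a hypothetical stand-in hash (the address as an integer, modulo the number of servers), not the CSM's real hash function, and the client addresses are made up. It shows why three clients can all land on one server while a couple of hundred clients spread out.

```python
import ipaddress

SERVERS = ["PROXY01_lot1", "PROXY02_lot1"]

def toy_hash_pick(src_ip, servers=SERVERS):
    """Choose a server from the source address only.

    Assumption: a stand-in hash (address as an integer, modulo the number
    of servers), i.e. the even/odd example. Not the CSM's real function.
    """
    return servers[int(ipaddress.ip_address(src_ip)) % len(servers)]

def distribute(clients, conns_per_client=10):
    """Count connections per server when every client opens the same number."""
    counts = {name: 0 for name in SERVERS}
    for ip in clients:
        counts[toy_hash_pick(ip)] += conns_per_client
    return counts

# Hypothetical test clients: three even addresses versus 200 consecutive ones.
few_clients = ["192.0.2.10", "192.0.2.12", "192.0.2.14"]
base = ipaddress.ip_address("198.51.100.0")
many_clients = [str(base + i) for i in range(200)]

few_result = distribute(few_clients)
many_result = distribute(many_clients)
print(few_result)   # every connection lands on one server
print(many_result)  # connections split across both servers
```

With only a handful of source addresses, an empty server says nothing about whether the predictor is working; the split only becomes visible with many distinct sources.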
02-15-2007 07:53 PM
Thanks Gilles,
My working situation is the following one :
My CSM is directly connected on one side to 3 clients generating 100 tcp connections/s each, and on the other side, to the serverfarm composed of 2 servers holding those 300 connections.
The requirement is to configure the algorithm "hash address source" between the CSM and the servers.
Assuming that 2 clients are configured with odd addresses, and one client is configured with even address, the load on the 2 servers of the serverfarm should be in the proportion of 1/3 and 2/3.
I'm working on the capacity planning to identify what would be the CSM limiting factor in the following situations :
- If the total number of connections generated by the clients increases from 300 without limit?
- If one of the servers of the serverfarm goes down?
This study would help me to identify
- what snmp traps should be monitored ?
- what would be the impact if the algorithm was reconfigured to "round robin" ?
I would greatly appreciate your comments.
Do you have suggestions regarding where
I could get detailed explanations about the following subjects that (correctly or not...) I've identified as possible limiting factors:
- max number of TCP sockets held by the CSM?
- configuration of the timers?
- detailed explanation of the different algorithms?
- snmp traps indicating the number of connections/s on each server of the serverfarm
Thanks for your help
Francois
02-16-2007 12:27 AM
Francois,
as I said, with 3 clients and hash address source, you may end up with all connections going to the same server.
The number of connections does not matter with hash address source. The CSM only looks at the source IP address.
For your tests, you should either increase the number of clients or change to roundrobin.
The number of connections the CSM can handle is approximately 1 million concurrent connections.
There are not many SNMP traps on the CSM.
If you want to monitor the number of connections, there are SNMP OIDs that can be used. I believe the question has been asked already, so you should be able to find it on this forum. The MIBs are also available on this website.
Gilles.
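A rough way to see why the connection count does not matter for a source hash, and what switching to round robin changes, is the Python sketch below. Everything in it is an assumption for illustration: the server names and client addresses are hypothetical, and the hash is a toy even/odd function rather than the CSM's own. It uses two odd and one even client address, as in the question above.

```python
import ipaddress
from itertools import cycle

SERVERS = ["server_a", "server_b"]  # hypothetical names

def by_source_hash(connections):
    """Toy source hash: the server depends on the source address only."""
    counts = {s: 0 for s in SERVERS}
    for src_ip in connections:
        idx = int(ipaddress.ip_address(src_ip)) % len(SERVERS)
        counts[SERVERS[idx]] += 1
    return counts

def by_round_robin(connections):
    """Round robin: each new connection goes to the next server in turn."""
    counts = {s: 0 for s in SERVERS}
    for _, server in zip(connections, cycle(SERVERS)):
        counts[server] += 1
    return counts

def make_connections(clients, per_client):
    """Build a flat list with one source address per connection."""
    return [ip for ip in clients for _ in range(per_client)]

clients = ["192.0.2.11", "192.0.2.13", "192.0.2.20"]  # two odd, one even

hash_100 = by_source_hash(make_connections(clients, 100))
hash_1000 = by_source_hash(make_connections(clients, 1000))
rr_100 = by_round_robin(make_connections(clients, 100))

print(hash_100)   # split follows the addresses: 1/3 and 2/3
print(hash_1000)  # same proportions with ten times the connections
print(rr_100)     # even split regardless of the source addresses
```

In this toy model the hash split stays at 1/3 and 2/3 however many connections are opened, while round robin evens it out. Whether the real hash splits three given addresses at all depends on the CSM's function, which is why more clients are needed for a meaningful test.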
02-17-2007 11:05 PM
Thanks Gilles,
Is there a way to simulate the algorithm "hash source address" ?
My sources addresses are :
10.64.5.138
10.64.5.144
10.64.5.150
Thanks for your help
Francois
02-20-2007 12:07 PM
Francois,
Unfortunately, there is no tool or easy function to compute the hash result.
Try increasing one of the IP addresses by 1 and see if it gets assigned to a different server.
Gilles.
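The "increase by 1" test can be reasoned about with a toy model. The Python sketch below assumes an even/odd style hash (`toy_bucket`, a hypothetical helper), which is not the CSM's real function and cannot predict its result; it only shows why neighbouring addresses are a sensible probe. The three source addresses are the ones listed in the question.

```python
import ipaddress

def toy_bucket(src_ip, n_servers=2):
    """Toy hash bucket: address as an integer, modulo the server count.

    Assumption for illustration only; the CSM's hash is more complex.
    """
    return int(ipaddress.ip_address(src_ip)) % n_servers

def next_address(src_ip, step=1):
    """Return the address `step` above src_ip, as a string."""
    return str(ipaddress.ip_address(src_ip) + step)

sources = ["10.64.5.138", "10.64.5.144", "10.64.5.150"]

# All three addresses are even, so the toy hash puts them in one bucket.
before = {ip: toy_bucket(ip) for ip in sources}

# Bumping one address by 1 moves it to the other bucket in this model.
bumped = next_address("10.64.5.138")
after = toy_bucket(bumped)

print(before)
print(bumped, after)
```

On the real CSM the outcome has to be observed rather than computed: change one client address, open a connection, and check which real server receives it.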
02-21-2007 01:54 AM
Hi Gilles,
The new tests I made using the following client source addresses :
10.143.146.21
10.143.146.27
10.143.146.30
10.143.146.12
... result in a SINGLE server in the serverfarm targeted by the CSM (configured with "hash source address").
I triggered a total of 36 connections during the test period, resulting in a maximum of 4 simultaneous connections (I have no tool to generate more simultaneous connections).
For any of the 36 connections, the same server in the serverfarm was targeted by the CSM.
Therefore, I suspect that the number of SIMULTANEOUS connections could influence the CSM load-balancing decision, as well as the source IP address.
Tonight, my company launches a traffic migration to the CSM in production (so far, it has only been in a lab test).
The number of SIMULTANEOUS connections will increase to an undefined number (high or low?).
Whatever that number is (high or low), is it reasonable to expect the load balancing to target both servers of the serverfarm as the number of SIMULTANEOUS connections increases (knowing, as I said in a previous reply, that there will be only 3 source addresses between the CSM and the serverfarm)?
Thanks for your reply.
Francois