CSS11503 keepalive always up?


Hi all,

I have one CSS11503 at the main datacentre and another at the standby datacentre (i.e., a D/R scenario). I have configured the standby datacentre CSS so that if any DNS queries hit this site while the main datacentre service is up, it prefers the main datacentre (as opposed to the standby).

When testing, I suspended the service on the CSS in the main datacentre, but the standby datacentre CSS still saw this service as 'alive' and therefore would not take over responsibility for it.

I placed a sniffer on the standby datacentre CSS's customer-facing (APP port) VLAN and could see keepalives being sent from the standby CSS to the main CSS service, and the remote service was still responding (even though I could not ping the main CSS service, because I had suspended it).

I then suspended the content rule on the main datacentre CSS, and the backup CSS still saw the service as alive and still got responses back from the main CSS service.

I have attached a config subset of both CSSs (i.e., one at each datacentre). Please note: I have configured VRRP because at some stage we may have two at each datacentre.

It looks like a bug to me, but I am really struggling, so I would appreciate some help if anybody has any ideas.

Thanks in advance

regards

Mark

joe.arnstein Fri, 03/30/2007 - 07:14

Hi Mark,

In my CSS cluster, when changes are made to the primary CSS, they do not take effect on the secondary until you run

commit_VipRedundantConfig "local remote "

Is this how yours works?

Joe

joquesada Sat, 03/31/2007 - 00:03

Hi Mark,

Is there any other way to reach this site (http://webgeneral.nffs.ea.gov/general_webserver/images/home.gif) besides going through the content rule at the main site?

Can you change the configuration of the service General_hacked_redirect to make it look like this and try again?

service General_hacked_redirect
  ip address 10.8.149.31
  protocol tcp
  port 7003
  keepalive type http
  keepalive method head
  keepalive port 7003
  active
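To check by hand what the keepalive above asks of the server, a minimal Python sketch is shown below. It assumes the keepalive is a plain HTTP request (HEAD by default) to the service port and that only a 200 response counts as alive; `http_keepalive` and `is_alive` are hypothetical helper names, not CSS commands, and this is not how the CSS itself is implemented.

```python
import http.client
from typing import Optional


def http_keepalive(host: str, port: int, method: str = "HEAD",
                   uri: str = "/", timeout: float = 2.0) -> Optional[int]:
    """Send one keepalive-style HTTP request and return the status code.

    Returns None when the TCP connect or the HTTP exchange fails, which a
    load balancer would also treat as a failed keepalive.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, uri)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def is_alive(status: Optional[int]) -> bool:
    # Assumption: only 200 OK counts as alive, so an error page such as
    # 404/500, or no answer at all, counts as down.
    return status == 200
```

For example, `is_alive(http_keepalive("10.8.149.31", 7003))` probes the General_hacked_redirect server on its service port; pointing the same call at a port where nothing is listening returns None, which is roughly what a keepalive sent to the wrong port would see.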

Let me know how it goes. Thanks!

Regards,

Jose Quesada.

Hi Jose and Joe/Rich, thanks for the responses; much appreciated.

I have tried your config and it worked fine.

But we have two services running off the same web server (i.e., one server at the main datacentre and another at the standby site for D/R).

At each site, both services are accessed via the same real server IP address on ports 7002 and 7003 (i.e., one IP address per site).

The above solution worked fine for port 7003, but when I used a similar config for port 7002, the service 'Forecaster_hacked_redirect' at the backup site showed as 'down' (while the service was up at the primary site).

I tried with and without a URI, and the sniffer showed the server was returning error codes 404 or 500.

When I use the keepalive config below, the service at the standby site is always 'alive', even when I suspend the service at the primary site.

I don't understand what's happening!

The relevant config at the backup site is:

service Forecaster_hacked_redirect
  ip address 10.8.149.41
  protocol tcp
  keepalive port 7002
  port 7002
  keepalive method get
  keepalive type http
  keepalive uri "webserver/loginpage.do"
  active

The relevant config at the primary site is:

service NFFS_Webforecaster
  ip address w.x.y.z
  protocol tcp
  port 7002
  keepalive type http
  keepalive port 7002
  keepalive uri "webserver/loginpage.do"
  keepalive method get
  active

owner NFFS_Forecaster
  dns both

  content NFFS_Webforecaster
    vip address 10.8.149.41
    add service NFFS_Webforecaster
    dnsbalance preferlocal
    protocol tcp
    port 7002
    add dns webforecaster.nffs.ea.gov 5
    active

To recap: with a config similar to the other 'General' service, Forecaster_hacked_redirect always shows as 'down', but with the config above it always shows as 'alive'. In this state I placed a sniffer on the standby site's client port (APP port). After suspending the service at the primary site I could no longer ping the VIP address (as I would expect), but somehow a conversation still takes place between the CSSs, using as its source address the VIP of the content rule that I can no longer ping (i.e., 10.8.149.41).

The sniffer shows that the keepalive responses from the primary web server are current (the server responds with the current date/time), and that the source address at the primary site is the VIP of the content rule, which should be down once I suspend the only real server associated with that rule. Why or how this happens, I have no idea.

Have you any ideas why?

regards

mark

joe.arnstein Sat, 03/31/2007 - 08:47

Sorry, I misunderstood what you were asking, and it was my first post to the board too.

I agree with the last poster that you should try setting the keepalive port to match the service port. You can look in your packet sniff to verify whether keepalives are hitting port 80 or 7003.

Not all servers support HEAD, so maybe make the keepalive port change first and see if that works.

joquesada Tue, 04/03/2007 - 15:31

Hi Mark,

For the server at the backup site running on port 7002, you might want to try the HEAD method in the keepalive. The URI you are requesting seems to be a dynamic web page, and with the GET method the keepalive would fail because the hash the CSS computes over the page changes whenever the page changes.
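The failure mode described above can be illustrated with a small simulation. This is a sketch under assumptions: it models a GET keepalive that records a hash of the first good page and reports 'down' whenever a later response hashes differently, as the post describes for the CSS; MD5 and the names `GetHashKeepalive` and `head_check` are illustrative choices, not the CSS implementation. A HEAD-style check only looks at the status code, so a page whose content changes on every request does not affect it.

```python
import hashlib
from typing import Optional


class GetHashKeepalive:
    """Toy model of a GET keepalive that compares page hashes between polls."""

    def __init__(self) -> None:
        self.reference: Optional[str] = None  # hash of the first good page

    def check(self, status: int, body: bytes) -> bool:
        if status != 200:
            return False  # error pages such as 404/500 fail outright
        digest = hashlib.md5(body).hexdigest()
        if self.reference is None:
            self.reference = digest  # first successful poll sets the baseline
            return True
        return digest == self.reference  # any content change counts as down


def head_check(status: int) -> bool:
    """A HEAD-style check ignores the body and only looks at the status."""
    return status == 200


if __name__ == "__main__":
    ka = GetHashKeepalive()
    # A dynamic page that embeds the current time, like a login page might.
    print(ka.check(200, b"<html>login 10:00:01</html>"))  # True (baseline)
    print(ka.check(200, b"<html>login 10:00:06</html>"))  # False (hash changed)
    print(head_check(200))                                # True
```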

Thanks & Regards,

Jose.
