
LocalDirector 417 with HTTP probes fails to recover

I'm having some trouble setting up a LocalDirector 417 that uses HTTP probes to test the functionality of an underlying database.

The symptoms are as follows:

Setup: 2 real servers and 1 virtual server, with HTTP probes sent at a 20-second interval. If one real server is taken down, the LD keeps probing it every 20 seconds and brings it back into service if/when the probe succeeds.

However, if BOTH real servers are taken down, the virtual server is also marked as down within a few seconds (naturally). But here is the problem: once the virtual server is marked down, the LD stops sending HTTP probes to the real servers, which makes automatic recovery impossible.

In this state I can of course bring one of the real servers up manually with the "in-service" command. The virtual server then comes up, and the LD starts sending probes to BOTH servers again.
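A toy Python model may make the reported behavior easier to see. It is not LocalDirector code: the state names (IS, EFAILED, TESTING, FAILED) come from the "show real" / "show virtual" output in this post, while the transition rules and the hypothetical "probe_when_virtual_failed" switch are assumptions made only to contrast "stop probing when the virtual server fails" with "keep probing regardless". The model also marks the virtual server failed immediately, whereas the real LD took a few seconds.

```python
from dataclasses import dataclass

# Toy model only: state names mirror the LocalDirector "show real"/"show virtual"
# output quoted in this thread; the transition rules are assumptions based on
# the behavior described in the post, not on LocalDirector internals.


@dataclass
class Real:
    name: str
    state: str = "IS"  # "IS", "EFAILED" or "TESTING"


@dataclass
class Virtual:
    reals: list
    state: str = "IS"  # "IS" or "FAILED"

    def update(self) -> None:
        # The virtual server is marked FAILED once no bound real is in service.
        in_service = any(r.state == "IS" for r in self.reals)
        self.state = "IS" if in_service else "FAILED"


def probe_cycle(virtual: Virtual, healthy: dict, probe_when_virtual_failed: bool) -> None:
    """Run one probe interval; healthy maps real name -> whether its HTTP probe succeeds."""
    if virtual.state == "FAILED" and not probe_when_virtual_failed:
        return  # the symptom reported on 4.2.1: no probes are sent at all
    for r in virtual.reals:
        ok = healthy[r.name]
        if r.state == "IS" and not ok:
            r.state = "EFAILED"
        elif r.state == "EFAILED" and ok:
            r.state = "TESTING"
        elif r.state == "TESTING":
            r.state = "IS" if ok else "EFAILED"
    virtual.update()


def simulate(probe_when_virtual_failed: bool, recovery_cycles: int = 3) -> tuple:
    """Take both reals down, bring them back, and report the final states."""
    reals = [Real("real1"), Real("real2")]
    vip = Virtual(reals)
    # Both servers go down (e.g. for backup): probes fail, the virtual fails.
    probe_cycle(vip, {"real1": False, "real2": False}, probe_when_virtual_failed)
    # Both servers are back up; see whether probing ever notices.
    for _ in range(recovery_cycles):
        probe_cycle(vip, {"real1": True, "real2": True}, probe_when_virtual_failed)
    return vip.state, [r.state for r in reals]


print(simulate(probe_when_virtual_failed=False))  # ('FAILED', ['EFAILED', 'EFAILED'])
print(simulate(probe_when_virtual_failed=True))   # ('IS', ['IS', 'IS'])
```

With the switch off, the model stays stuck in EFAILED just as the debug output shows; with it on, both reals pass through TESTING and the virtual server returns to service without a manual in-service command.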

So to summarize my problem: using HTTP probes, how can I get automatic recovery when both of my real servers are taken down (let's say for backup)?

Relevant config and debug below.

Best regards

Johan S

ip address 10.46.x.121 255.255.255.0
arp timeout 30
no rip passive
rip version 1
failover ip address 0.0.0.0
no failover
failover hellotime 30
password xxxxxxxxxxxxxxxx encrypted
virtual 10.46.x.120:0:0:tcp is
real 10.46.x.12:0:0:tcp is
real 10.46.x.13:0:0:tcp is
name 10.46.x.12 real1
name 10.46.x.13 real2
name 10.46.x.120 virtual
bind 10.46.x.120:0:0:tcp 10.46.x.12:0:0:tcp
bind 10.46.x.120:0:0:tcp 10.46.x.13:0:0:tcp
probe virtual 10.46.x.120:0:0:tcp http 2
probe real 10.46.x.12:0:0:tcp http 2
probe real 10.46.x.13:0:0:tcp http 2
probehttp virtual 10.46.x.120:0:0:tcp file /probe/probedb.html
probehttp real 10.46.x.12:0:0:tcp file /probe/probedb.html
probehttp real 10.46.x.13:0:0:tcp file /probe/probedb.html
probeconfig http 1

Some Debug:

Real Machines:
                                        No Answer TCP Reset DataIn
Machine         Connect State    Thresh Reassigns Reassigns Conns
real1:0:0:tcp   0       IS       8      0         0         0
real2:0:0:tcp   2       IS       8      0         0         0

BOTH TAKEN DOWN

localdirector(config)# <163> July 8 17:17:02 LD-ERR Real machine 'real1:0:0:tcp': edited from In service to Failed (External) -- Probe.
<163> July 8 17:17:02 LD-ERR Real machine 'real2:0:0:tcp': edited from In service to Failed (External) -- Probe.
sh real

Real Machines:
                                        No Answer TCP Reset DataIn
Machine         Connect State    Thresh Reassigns Reassigns Conns
real1:0:0:tcp   0       EFAILED  8      0         0         0
real2:0:0:tcp   1       EFAILED  8      0         0         0

A FEW SECONDS LATER THE VIRTUAL SERVER FAILS - NO MORE HTTP PROBES

localdirector(config)# <162> July 8 17:17:21 LD-CRIT Virtual machine 'virtual:0:0:tcp': Failed.

localdirector(config)# sh real

Real Machines:
                                        No Answer TCP Reset DataIn
Machine         Connect State    Thresh Reassigns Reassigns Conns
real1:0:0:tcp   0       EFAILED  8      0         0         0
real2:0:0:tcp   0       EFAILED  8      0         0         0

localdirector(config)# sh virt

Machines:
Machine Mode State Connect Sticky Predictor Slowstart
virtual:0:0:tcp directed local FAILED 0 0 leastconns* roundrobin

NOW THE SERVERS ARE UP, BUT NO PROBES ARE SENT. THE REAL SERVERS REMAIN IN EFAILED (NOT IN "TESTING")

BRING ONE SERVER UP MANUALLY: THE PROBES CONTINUE AND SERVICE IS RESTORED

localdirector(config)# is real real1
<165> July 8 17:20:18 LD-NOTICE Real machine 'real1:0:0:tcp': Edited from Failed (External) to In Service.
<162> July 8 17:20:18 LD-CRIT Switching '10.46.x.120:0:0:tcp' from 'leastconns' to 'slowstart'
localdirector(config)# <162> July 8 17:20:19 LD-CRIT Switching '10.46.x.120:0:0:tcp' from 'slowstart' to 'leastconns'
<162> July 8 17:20:19 LD-CRIT Virtual machine 'virtual:0:0:tcp': Brought into service.
<163> July 8 17:20:42 LD-ERR Real machine 'real2:0:0:tcp': edited from Failed (External) to Testing - HTTP probes.
<163> July 8 17:21:02 LD-ERR Real machine 'real2:0:0:tcp': edited from Testing to In Service - HTTP probes.

1 Reply

I found the answer myself, so I'm posting it here to help others who get stuck on the same problem.

Turns out this is a bug in version 4.2.1 and is resolved in 4.2.3.

Cheers

Johan
