Database replication and CSAuth service

latintrpt · ‎03-31-2010

Hello everyone,

We had a scare yesterday. We had a wireless outage and found out it was because the CSAuth service was not running on either of our ACS servers. New users and users trying to re-authenticate were unable to. We have two ACS servers that authenticate users simultaneously. The primary server replicates certain components to the secondary server every 60 minutes, so the primary sends and the secondary receives. The primary is set to send every 60 minutes and the secondary is set to receive manually. Could this be causing a problem? Should I set replication on the primary to a longer interval, and also set the secondary to a specific number of minutes?

When I replicate, does CSAuth stop on both the sender and the receiver at the same time? I'm running Release 4.1(1) Build 23 on both ACS servers. I just don't know why both were stopped at the same time and didn't come back up. I have a feeling it's the replication, but I can't pinpoint whether it's a settings issue.

Thank You

darpotter · ‎04-01-2010

Here's how replication works...

On the master, the CSAuth service locks out request traffic from its local RADIUS, TACACS+ and admin services. It then collates the fileset ready for replication and starts responding to requests again. After that it works its way through the list of slaves in the background, first sending the replication fileset and then instructing the slave to accept the replication.

On the slave, the replication fileset is received as a background task, and only when it's complete will the slave lock out the RADIUS, TACACS+ and admin services while it copies the fileset into place.

So the master and slave are never out of operation at the same time.

The master will only lock out once, regardless of how many slaves it replicates to.

The lockout period is only as long as it takes to physically copy and compress the replication fileset.
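
To make the sequencing above easier to picture, here is a minimal Python sketch of one replication cycle as a timeline of lockout windows. It is only a model, not ACS code: the names (Lockout, replication_timeline, overlaps) and all the durations are made up for illustration, and it assumes the master finishes with one slave before moving on to the next.

```python
from dataclasses import dataclass


@dataclass
class Lockout:
    server: str   # which server is refusing requests
    start: float  # seconds from the start of the replication cycle
    end: float


def replication_timeline(slaves, collate_secs, transfer_secs, apply_secs):
    """Model the lockout windows for one replication cycle.

    All durations are hypothetical inputs, not values taken from ACS.
    """
    # The master locks out exactly once, while it collates the fileset.
    windows = [Lockout("master", 0.0, collate_secs)]
    t = collate_secs
    for name in slaves:
        # Background transfer: neither side is locked out during this.
        t += transfer_secs
        # The slave locks out only while it copies the fileset into place.
        windows.append(Lockout(name, t, t + apply_secs))
        # Assumption: slaves are handled one after another.
        t += apply_secs
    return windows


def overlaps(a, b):
    """True if two lockout windows share any time."""
    return a.start < b.end and b.start < a.end


if __name__ == "__main__":
    for w in replication_timeline(["slave1", "slave2"], 5, 20, 8):
        print(f"{w.server}: locked from {w.start}s to {w.end}s")
```

In this model the master's single window always ends before any slave window begins, which is the "never out of operation at the same time" property described above.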

Hourly replication seems quite aggressive; I think I really intended it for daily use when it was written.

Failover timeouts on the AAA clients should be 30 seconds or more. There's always the danger that a device times out on the master and flips over to the slave just as the slave is applying the replication. Perhaps you can check the replication logs on the master and slave to see how long replication is taking. This might give you some idea about timings on failover.
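
As a rough way to turn those log timings into a number, the sketch below takes the lockout durations you read off the replication logs and suggests a failover timeout that is at least 30 seconds and also longer than the longest lockout seen. The function name, the 5 second margin and the sample durations are assumptions for illustration; ACS does not provide anything like this, and you would enter the durations by hand.

```python
def suggested_failover_timeout(lockout_secs, floor_secs=30.0, margin_secs=5.0):
    """Suggest an AAA client failover timeout in seconds.

    lockout_secs: lockout durations taken by hand from the replication
                  logs (hypothetical input, in seconds).
    floor_secs:   the 30 second minimum recommended in this thread.
    margin_secs:  extra headroom; an arbitrary assumption.
    """
    longest = max(lockout_secs, default=0.0)
    # Never go below the floor, and stay above the longest lockout seen
    # so a client waiting on a locked server is less likely to fail over.
    return max(floor_secs, longest + margin_secs)


if __name__ == "__main__":
    # Example durations only; replace with values from your own logs.
    print(suggested_failover_timeout([4, 12, 9]))
    print(suggested_failover_timeout([40]))
```

If the lockouts in your logs are all short, the 30 second floor is what applies; the result only rises above it when a lockout gets close to or exceeds 30 seconds.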