ACS upgrade: replication fails - shared secret mismatch?


Hi All,

Just did an upgrade from ACS 4.1.4.13 patch 17 to 4.1.4.13 patch 18. No problems on patch 17 using replication. After upgrade everything seemed to work fine until I checked internal database replication. Apparently replication now fails... The replication log on the slave doesn't mention anything about the replication attempt. However, on the master it starts with these messages:

05/11/2009 15:43:07 BARAHIR INFO Outbound replication cycle completed

05/11/2009 15:43:07 BARAHIR ERROR Database replication to ACS 'aredhel' aborted - shared secret mismatch

05/11/2009 15:43:06 BARAHIR INFO Component 'Logging Reports (Enable/Disable Settings)' was updated - being replicated to slave(s)

05/11/2009 15:43:06 BARAHIR INFO Component 'Password validation settings' was updated - being replicated to slave(s)

05/11/2009 15:43:06 BARAHIR INFO Component 'Interface Security Settings' was updated - being replicated to slave(s)

05/11/2009 15:43:06 BARAHIR INFO Component 'Interface Configuration' was updated - being replicated to slave(s)

05/11/2009 15:43:05 BARAHIR INFO Component 'Distribution Table' was updated - being replicated to slave(s)

05/11/2009 15:43:05 BARAHIR INFO Component 'Network Configuration Device tables' was updated - being replicated to slave(s)

05/11/2009 15:43:05 BARAHIR INFO Component 'User and Group Database' was updated - being replicated to slave(s)

05/11/2009 15:43:03 BARAHIR INFO Outbound replication cycle starting...
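
For anyone who wants to pull the failure out of a longer replication log, here is a minimal Python sketch. It assumes the lines look exactly as pasted above (space-separated, DD/MM/YYYY dates, newest entry first); the real "Database replication active.csv" on disk may use a different delimiter, so the regular expression is an assumption, and the function names are made up for this example.

```python
import re
from datetime import datetime

# Assumed layout, taken from the pasted log: "DD/MM/YYYY HH:MM:SS HOST LEVEL message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (\S+) (INFO|ERROR|WARNING) (.*)$"
)

def parse_replication_log(text):
    """Return (timestamp, host, level, message) tuples, oldest entry first."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        stamp, host, level, message = match.groups()
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        entries.append((when, host, level, message))
    # The log is written newest-first; reversing keeps the original order
    # of entries that share the same second.
    entries.reverse()
    return entries

def errors(entries):
    """Keep only the ERROR entries."""
    return [entry for entry in entries if entry[2] == "ERROR"]

SAMPLE = """\
05/11/2009 15:43:07 BARAHIR INFO Outbound replication cycle completed
05/11/2009 15:43:07 BARAHIR ERROR Database replication to ACS 'aredhel' aborted - shared secret mismatch
05/11/2009 15:43:05 BARAHIR INFO Component 'User and Group Database' was updated - being replicated to slave(s)
05/11/2009 15:43:03 BARAHIR INFO Outbound replication cycle starting...
"""

entries = parse_replication_log(SAMPLE)
for when, host, level, message in errors(entries):
    print(when.isoformat(), host, message)
```

Reading the entries oldest-first makes it easier to see that every component is queued for replication before the cycle aborts on the shared secret check.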

Again, before the patch install the whole replication process worked just fine. No typos in shared secrets, no wrong self IP addresses etc. Has anyone seen this before? What could be wrong, and what has changed in the replication process from patch 17 to 18?

Thanks in advance!

Jatin Katyal Thu, 11/05/2009 - 07:10

Hi,

I haven't seen this before, but since the log reports a shared secret mismatch, I think we should re-type the shared secret key :(

Also, could you please get the auth.log file, with logging set to full, covering the time stamps of the failed replication, so that we can investigate the issue further?

HTH

JK

Plz rate helpful posts-

Jatin Katyal Thu, 11/05/2009 - 07:28

Hi,

Is this ACS SE or ACS for Windows?

If this is ACS for Windows, get the active auth.log file from the path below:

C:\Program Files\CiscoSecure ACS v4.1\CSAuth\Logs

Otherwise, with the ACS Solution Engine, we need to generate the whole package.cab file:

Go to System Configuration > Service Control and, under Services Log File Configuration, set the Level of detail to Full. Then go to System Configuration > Support.

In Details to collect, check the box for Collect log files only and click Run Support Now. Then save the package.cab and attach it here.

HTH

JK

Plz rate helpful posts-

Tried it again with logging to 'full'. Ignore previous post, have been troubleshooting too long today. *sigh*

This is the corresponding "Database replication active.csv":

05/11/2009 16:26:46 BARAHIR INFO Outbound replication cycle completed

05/11/2009 16:26:45 BARAHIR ERROR Database replication to ACS 'aredhel' aborted - shared secret mismatch

05/11/2009 16:26:44 BARAHIR INFO Component 'Logging Reports (Enable/Disable Settings)' was updated - being replicated to slave(s)

05/11/2009 16:26:44 BARAHIR INFO Component 'Password validation settings' was updated - being replicated to slave(s)

05/11/2009 16:26:44 BARAHIR INFO Component 'Interface Security Settings' was updated - being replicated to slave(s)

05/11/2009 16:26:44 BARAHIR INFO Component 'Interface Configuration' was updated - being replicated to slave(s)

05/11/2009 16:26:44 BARAHIR INFO Component 'Distribution Table' was updated - being replicated to slave(s)

05/11/2009 16:26:44 BARAHIR INFO Component 'Network Configuration Device tables' was updated - being replicated to slave(s)

05/11/2009 16:26:44 BARAHIR INFO Component 'User and Group Database' was updated - being replicated to slave(s)

05/11/2009 16:26:43 BARAHIR INFO Outbound replication cycle starting...

The AUTH.log is attached to this post. Thanks again.

Jatin Katyal Thu, 11/05/2009 - 07:47

Hi,

Going through the auth.log file, I found this:

=======================================================

-attempting to sync with host aredhel

-using source ip address '145.24.16.40' to sync with ACS aredhel

-detected source ip address and localhost ip address are same, using host entry barahir

========================================================

Conclusion

Primary server : barahir, ip: 145.24.16.40

secondary server : aredhel

Possibilities of failed replication:

There could be two server entries for the primary server on the primary server itself.

Go to Network Configuration and, under the AAA Servers field, check how many servers are listed. If there are two entries for the same server under different NDGs, delete the one that is not in use.

"We tested ACS on computers that have only one network interface card."

Please make sure that the server on which ACS is installed has only one physical network card enabled. Please disable all other network cards.

Then do the replication.
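
The duplicate-entry check described above can be sketched in Python, assuming the AAA server table has been copied out of the GUI by hand into (name, IP, NDG) tuples. The entry names "barahir-old", the NDG labels, and the 192.0.2.10 address are hypothetical placeholders for illustration only; only barahir's address comes from the auth.log.

```python
from collections import defaultdict

def find_duplicate_servers(entries):
    """Group AAA server entries by IP and return the IPs listed more than once."""
    by_ip = defaultdict(list)
    for name, ip, ndg in entries:
        by_ip[ip].append((name, ndg))
    return {ip: owners for ip, owners in by_ip.items() if len(owners) > 1}

# Hypothetical table, transcribed by hand from Network Configuration.
aaa_servers = [
    ("barahir", "145.24.16.40", "NDG-A"),      # address seen in auth.log
    ("barahir-old", "145.24.16.40", "NDG-B"),  # made-up stale duplicate
    ("aredhel", "192.0.2.10", "NDG-A"),        # placeholder address
]

duplicates = find_duplicate_servers(aaa_servers)
for ip, owners in duplicates.items():
    print(ip, "is listed more than once:", owners)
```

An empty result would suggest this particular possibility can be ruled out.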

HTH

Jk

Plz rate helpful posts-

Hi! Thanks for the quick reply. I've made two screenshots. Please see the attachments. Barahir is indeed the primary. Aredhel is secondary. I have only one NIC on both machines with static IP addresses. Both machines are in the same subnet.

Both are running on a VMware ESX server. The OS is Windows server 2003, SP2, latest MS updates.

Jatin Katyal Thu, 11/05/2009 - 08:33

Hi,

The server entries look perfect (strange behaviour).

Do we have any antivirus software installed on the secondary ACS server, i.e. aredhel?

If yes, please disable the antivirus software. If that is not possible, please exclude the following directory from any kind of antivirus scanning, online or offline:

C:\Program Files\CiscoSecure ACS v4.1\

Do we have Hyper-Threading or multiprocessing enabled on the server where ACS is installed?

If yes, please disable it and make sure that the ACS server is running on a single processor. Then test the replication.

HTH

JK

Plz rate helpful posts-

Hi,

There is no antivirus software installed on either machine. We've had problems with that before. It is actually a dedicated installation for ACS. No other software is installed, apart from the HP Dataprotector backup agent.

VMWare ESX configuration for this host is for one CPU only, no HT enabled...

It is the strangest thing, after installation of patch 18, replication stopped working. I also tried installing patch 19 over patch 18 to see if that would help. Unfortunately, no luck there. Do you have any other options??

Cheers,

Vincent

Jatin Katyal Thu, 11/05/2009 - 09:09

Hi Vincent,

I think this needs TAC treatment.

HTH

JK

Plz rate helpful posts-

Hi again,

Just got off the phone with TAC. A TAC engineer logged into our ACS systems and did some troubleshooting. His final conclusion is that our installation is 'unsupported'. According to TAC, we should be running ESX version 3.0.x, as ACS has been tested on this platform.

We run ESX 3.5 and consequently our installation is unsupported. TAC suggested we downgrade to ESX 3.0.x... I find this answer very disappointing because we haven't had any problems before with ACS (pre-patch 18) on ESX 3.5.

Formally Cisco is absolutely right. However, you can't expect customers to be running a 3+ year old version of ESX anymore. Especially when nearly every sales/marketing pitch of Cisco contains the word virtualization...

Personally, I think we're running into a bug in ACS and Cisco is hiding behind an excuse, for whatever reason.

Jatin Katyal Fri, 11/06/2009 - 06:10

Hi,

No, there is no known bug with ACS. ESX 3.5.x is supported BUT NOT TESTED.

VMware ESX Server Support

ACS 4.1.4 has been tested on the VMware ESX server with the following configuration:

• VMware ESX Server 3.0.0

• 16 GB of RAM

• AMD Opteron Dual Core processor

• 300 GB hard drive

• Four virtual machines

• Windows 2003 Standard Edition

• 3 GB of RAM for the guest operating system

The following versions of VMware ESX are supported.

• ESX 3.0.x (tested)

• ESX 3.5.x (not tested)

• ESX 3.5i (not tested)

HTH

JK

Plz rate helpful posts-
