DBReplication Failure

Oct 12th, 2008

I checked the DBReplication status of my 5.1.1 cluster (1 pub and 2 subs), and although it took a long time to create the file, there was no output in it. I then performed a "utils dbreplication stop" on the sub and a "utils dbreplication reset <subscriber2>" and let it run through. Ever since then I have been getting this alert:

"IDSReplicationFailure class_id : CDR DEFINE SERVER command failed on the subscriber class_msg : replstate = 3 specific_msg : We are in the svc routine of ReplTask trying to setup replication on Subscriber AppID : Cisco Database Layer Monitor ClusterID : UofLCCM NodeID : <subscriber2> "

If someone can tell me what I have done wrong and how to get replication working again across my cluster, that would be awesome. This all started with an attempt to upgrade from 5.1.1 to 5.1.3: the publisher could see the patch file, but the subscribers couldn't. I had read that a replication failure can cause this to happen. Any help would be greatly appreciated.

allan.thomas Sun, 10/12/2008 - 07:59

The 'utils dbreplication reset' should be run on the Publisher node, not the Subscriber.


Run the same process again, as follows:


Initiate 'utils dbreplication stop' on each Subscriber before the Publisher node. Ensure that the stop completes on one Subscriber before proceeding to the next.


Once 'utils dbreplication stop' has been carried out on all Subscribers, initiate the same on the Publisher.


Only when the stop has completed on the Publisher should you run 'utils dbreplication reset all'.
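If it helps to keep the order straight across several nodes, the sequence above can be written down as a small Python planning helper. This is only a sketch: it lists which command to type on which node, it does not connect to anything, and the node names (pub01, sub01, sub02) are hypothetical. You still have to wait for each stop to finish before starting the next one.

```python
from __future__ import annotations


def replication_reset_plan(publisher: str, subscribers: list[str]) -> list[tuple[str, str]]:
    """Return (node, command) pairs in the order described in this thread."""
    plan: list[tuple[str, str]] = []
    # 1. Stop replication on each Subscriber, one at a time.
    for sub in subscribers:
        plan.append((sub, "utils dbreplication stop"))
    # 2. Only then stop replication on the Publisher.
    plan.append((publisher, "utils dbreplication stop"))
    # 3. Finally reset replication for all nodes, run from the Publisher.
    plan.append((publisher, "utils dbreplication reset all"))
    return plan


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for step, (node, command) in enumerate(
            replication_reset_plan("pub01", ["sub01", "sub02"]), start=1):
        print(f"{step}. on {node}: {command}  (wait for it to complete)")
```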


HTH.

Allan.

keithknowles Sun, 10/12/2008 - 08:32

I have followed that procedure, but now I am getting that error for both subscribers. Is it possible I will keep getting it until replication has finished?

allan.thomas Sun, 10/12/2008 - 09:47

The CDR DEFINE SERVER error you are seeing on both subscribers definitely suggests a dbreplication failure.


In this instance there is an additional step that you should try before initiating a dbreplication reset from the publisher.


After stopping dbreplication on both Subs and the Pub as before, run 'utils dbreplication dropadmindb' in the same manner: wait for it to complete on each Sub before moving to the next, and then finally run it on the Pub.


Once this stage is complete on the Pub, run 'utils dbreplication reset all' from the Pub.
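The extended sequence with dropadmindb can also be sketched as a hypothetical Python planning helper with made-up node names. It only lists the commands in the order described above: stop everywhere, then dropadmindb everywhere (Subscribers before the Publisher in both passes), then the reset from the Publisher. It does not run anything on the cluster.

```python
from __future__ import annotations


def dropadmindb_reset_plan(publisher: str, subscribers: list[str]) -> list[tuple[str, str]]:
    """Return (node, command) pairs: stop, then dropadmindb, then reset all."""
    # Subscribers first, Publisher last, for both the stop and dropadmindb passes.
    nodes_in_order = [*subscribers, publisher]
    plan = [(node, "utils dbreplication stop") for node in nodes_in_order]
    plan += [(node, "utils dbreplication dropadmindb") for node in nodes_in_order]
    # The reset is run once, from the Publisher, after everything above completes.
    plan.append((publisher, "utils dbreplication reset all"))
    return plan


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for node, command in dropadmindb_reset_plan("pub01", ["sub01", "sub02"]):
        print(f"{node}: {command}")
```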


If you receive the error "Enterprise Replication not active (62)" after running the reset, this is expected. Replication can take up to 30 minutes to complete afterwards.
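Because the reset can take up to 30 minutes, it is worth polling the replication state rather than judging it straight away. A minimal Python sketch, assuming you have some way of reading the current Replicate_State value: the read_state callable is hypothetical (for example, something that parses output you have captured from the CLI). The 30-minute limit comes from the post above; the 60-second poll interval is an arbitrary choice. A value of 2 is treated as good, as elsewhere in this thread.

```python
import time
from typing import Callable

GOOD_STATE = 2  # per this thread, Replicate_State 2 means replication is good


def wait_for_replication(read_state: Callable[[], int],
                         timeout_s: float = 30 * 60,
                         interval_s: float = 60,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll read_state() until it reports GOOD_STATE or timeout_s passes.

    read_state is a hypothetical callable you supply; clock and sleep are
    injectable so the loop can be exercised without actually waiting.
    Returns True if the good state was seen, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if read_state() == GOOD_STATE:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With the defaults this checks about once a minute for up to 30 minutes. A False result is the point at which the advice in this thread moves on to the sqlhosts file and TAC.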


If this still fails to correct the issue, there may be an underlying problem with the sqlhosts file, which can only be changed via root access by TAC.


HTH.

Allan.

keithknowles Sun, 10/12/2008 - 11:56

I have tried the dropadmindb, but I am still getting the same error. I have also noticed that the publisher seems to think it's the only node in the cluster; it tells me so when I run "utils dbreplication status".

Correct Answer
allan.thomas Sun, 10/12/2008 - 12:28

The upgrade from 5.1.1 to 5.1.3 needs to be carried out on the Publisher before any Subscriber. Once the Publisher has been upgraded, the Subscribers should then be able to see the upgrade image.


Can you confirm what the DB status is within RTMT? Does it show the status of all nodes or only the Publisher, and what are the states?


The sqlhosts file is present on each server and contains a reference for each Cisco Unified Communications Manager node in the cluster.


If those sqlhosts files are out of sync, SQL replication fails. Use the 'show tech dbstateinfo' CLI command on each subscriber to check the local sqlhosts at the bottom of the output for any mismatch between nodes.
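The comparison described above is essentially a set difference of sqlhosts entries between nodes. A minimal Python sketch, assuming you have already copied each node's sqlhosts section into a string, and that entries use the usual Informix layout of whitespace-separated fields with '#' comments. The Publisher's copy is used as the reference; the node names and entry values (including "onsoctcp" and "dbsvc") in the example are hypothetical.

```python
from __future__ import annotations


def parse_sqlhosts(text: str) -> set[tuple[str, ...]]:
    """Turn sqlhosts text into a set of entries (one tuple of fields per line).

    Blank lines and anything after '#' are ignored; fields are split on whitespace.
    """
    entries: set[tuple[str, ...]] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            entries.add(tuple(line.split()))
    return entries


def sqlhosts_mismatches(reference: str,
                        per_node: dict[str, str]) -> dict[str, dict[str, set[tuple[str, ...]]]]:
    """Compare every node's sqlhosts with the reference node's copy.

    Returns {node: {"missing": ..., "extra": ...}} only for nodes that differ.
    """
    ref_entries = parse_sqlhosts(per_node[reference])
    report: dict[str, dict[str, set[tuple[str, ...]]]] = {}
    for node, text in per_node.items():
        entries = parse_sqlhosts(text)
        missing = ref_entries - entries  # on the reference, absent on this node
        extra = entries - ref_entries    # on this node, absent on the reference
        if missing or extra:
            report[node] = {"missing": missing, "extra": extra}
    return report


if __name__ == "__main__":
    # Hypothetical contents; real entries come from each node's output.
    pub = "g_pub01_ccm onsoctcp pub01 dbsvc\ng_sub01_ccm onsoctcp sub01 dbsvc\n"
    sub = "# sub01 copy\ng_pub01_ccm onsoctcp pub01 dbsvc\n"
    print(sqlhosts_mismatches("pub01", {"pub01": pub, "sub01": sub}))
```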


If there are mismatches within this file, these can only be modified through root access by TAC.


Rgds

Allan

keithknowles Sun, 10/12/2008 - 12:33

My version of 5.1 doesn't have "show tech dbstateinfo"; is there an older version of this command? This is the output of "utils dbreplication status":

SERVER          ID  STATE   STATUS  QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_batman_ccm    2   Active  Local   0

Status cannot be reported for a cluster with a single active node; aborting status check operation

allan.thomas Sun, 10/12/2008 - 13:16

Try the command 'show perf query class "Number of Replicates Created and State of Replication"'.


This will only show the status of the replication, as RTMT would. I assume it will only return one entry.


Can you confirm that the Publisher is able to reach either Subscriber, both by IP address and by hostname?


You can verify this through the CLI using 'utils network host' as below, and 'show tech network hosts' CLI commands on all the cluster nodes:


'utils network host
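If you also want to confirm name resolution from an admin workstation on the same network, a short Python sketch follows. Note the assumption: this runs on your own machine, not on the CUCM nodes, so it only shows what that machine's DNS or hosts file returns; 'utils network host' on the Publisher is still the check that matters for replication. socket.gethostbyname returns IPv4 addresses only, and the hostname list in the example is a placeholder.

```python
from __future__ import annotations

import socket
from typing import Callable


def check_resolution(hostnames: list[str],
                     resolve: Callable[[str], str] = socket.gethostbyname
                     ) -> dict[str, str | None]:
    """Map each hostname to the IPv4 address it resolves to, or None if it fails."""
    results: dict[str, str | None] = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = None
    return results


if __name__ == "__main__":
    # Placeholder list; replace with your own subscriber hostnames.
    for host, addr in check_resolution(["localhost"]).items():
        print(f"{host}: {addr or 'DOES NOT RESOLVE'}")
```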


Can you also post the output from the following command:-


'run sql select name,nodeid from ProcessNode'


It simply appears that the subscribers were not added when they were first installed. I assume these servers are listed in the admin pages under System > Server?
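When the ProcessNode output comes back, one way to check it against the servers you expect is sketched below in Python. It assumes the query output has been pasted in as text, with a '====' separator row and one 'name nodeid' row per node; the expected server names are hypothetical, and the EnterpriseWideData row is simply treated like any other row.

```python
from __future__ import annotations


def parse_processnode(output: str) -> list[tuple[str, int]]:
    """Parse 'name nodeid' rows from the query output into (name, nodeid) pairs.

    Blank lines, separator rows made of '=' and a header row are skipped.
    A row holding only a number is kept with an empty name.
    """
    rows: list[tuple[str, int]] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields or set(line.strip()) <= {"=", " "}:
            continue  # blank line or '====== ======' separator
        if not fields[-1].isdigit():
            continue  # header such as 'name nodeid'
        rows.append((" ".join(fields[:-1]), int(fields[-1])))
    return rows


def missing_servers(output: str, expected: list[str]) -> list[str]:
    """Return the expected server names that do not appear in the output."""
    present = {name.lower() for name, _ in parse_processnode(output)}
    return [server for server in expected if server.lower() not in present]


if __name__ == "__main__":
    # Hypothetical server names and output, for illustration only.
    sample = "================== ======\nEnterpriseWideData 1\npub01 2\nsub01 3\n"
    print(missing_servers(sample, ["pub01", "sub01", "sub02"]))  # ['sub02']
```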


Allan.

keithknowles Sun, 10/12/2008 - 13:25

==>query class :

- Perf class (Number of Replicates Created and State of Replication) has instances and values:
ReplicateCount -> Number of Replicates Created = 342
ReplicateCount -> Replicate_State = 2

'run sql select name,nodeid from ProcessNode'
================== ======
EnterpriseWideData 1
2
3
4

allan.thomas Sun, 10/12/2008 - 13:45

Curious, the replicate state for Publisher 2 and Subscriber1 shows a status of 2, which is good. It seems that only Subscriber2 has broken replication?
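Reading the state out of the perf query output can be sketched in Python as follows. The only interpretation built in is the one from this thread: 2 means replication is good, and anything else (0 and 3 were both seen while replication was broken) is treated as not healthy. It parses text you have already captured; it does not query the server.

```python
from __future__ import annotations

import re

HEALTHY_STATE = 2  # per this thread: 2 is good; 0 and 3 were seen while broken

_STATE_RE = re.compile(r"Replicate_State\s*=\s*(\d+)")


def replicate_states(perf_output: str) -> list[int]:
    """Extract every Replicate_State value from captured perf query output."""
    return [int(value) for value in _STATE_RE.findall(perf_output)]


def replication_healthy(perf_output: str) -> bool:
    """True only if at least one state was found and all of them are 2."""
    states = replicate_states(perf_output)
    return bool(states) and all(state == HEALTHY_STATE for state in states)


if __name__ == "__main__":
    # Output as posted earlier in this thread.
    sample = ("ReplicateCount -> Number of Replicates Created = 342\n"
              "ReplicateCount -> Replicate_State = 2\n")
    print(replicate_states(sample), replication_healthy(sample))
```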


Does the dbreplication status command still only return the Publisher node? I would expect to see both Subs. Remember the reset could take up to 30 minutes.


Allan.

keithknowles Sun, 10/12/2008 - 14:08

I am still getting that error for Sub 1 too, even though its dbReplication status is 2?

allan.thomas Sun, 10/12/2008 - 14:23

Were you able to ping and resolve the hostname of each Subscriber from the Publisher?

keithknowles Sun, 10/12/2008 - 14:25

Absolutely. I am getting to my wits' end here :) I should note that the Replication Status of Sub 2 now shows 0.

allan.thomas Sun, 10/12/2008 - 14:53

The symptoms you describe are similar to those of a previous replication failure, in which the status would alternate between 0 and 3.


As previously mentioned, in that case it was the sqlhosts files that were inconsistent and had to be rectified by TAC.


Unfortunately, as far as I know there is no way of checking the sqlhosts contents in CUCM 5.1.1. From 5.1.3 onwards there are additional reports in CallManager that provide the same information shown by 'show tech dbstateinfo'.


There are a number of replication issues documented in 5.1.1. Please advise what your exact version is.


Allan.

allan.thomas Sun, 10/12/2008 - 15:33

I have taken a look at all the documented dbreplication caveats with symptoms similar to the ones you described, and there don't appear to be any exact matches for 5.1.1.3107-2 or earlier.


In my opinion, the only option available before raising a TAC case is to first attempt a 'utils dbreplication reset ' from each subscriber.


If this still fails to kickstart replication, then perhaps consider restarting the cluster, and raise a TAC case if the symptoms still persist.


Rgds

Allan.

keithknowles Sun, 10/12/2008 - 15:37

If I run 'utils dbreplication reset ' from the sub, I get:

"You have entered nodename as publisher node. You must enter subscriber node name.
Executed command unsuccessfully
/usr/local/cm/db/commands/utilsDB.sh: line 54: [-reset: command not found"

even though I am entering a subscriber node name.

allan.thomas Tue, 10/14/2008 - 11:51

Keith, has TAC been able to resolve this issue for you?


Rgds

Allan.

allan.thomas Tue, 10/14/2008 - 11:55

Could you share any feedback from Cisco if they are able to resolve the issue? It would be interesting to find out what the root cause was in the end.


Rgds

Allan.

a.gooding Thu, 04/15/2010 - 16:17

Was this ever resolved? I have a customer experiencing the exact same issue. I'm working with TAC, but it's taking a little longer than expected.


thanks

Tommer Catlin Mon, 01/12/2009 - 21:57

Awesome, that worked for me. Remember to watch RTMT for the Replication status to change from 0 to 2 on all servers.


thanks!
