Cisco Support Community
Community Member

HDS Replication Issue

I am working on a UCCE 7.2.5 environment with a Central Controller that is split across two sites. Each site has a Rogger and a Distributor AW/HDS. One of the HDS servers is not getting any half-hour Call Type data, and the Replication window on the Rogger shows the following message: "The Historical records may have been deleted in the Server database." Does anyone have any ideas?

Green

Re: HDS Replication Issue

We need to do some analysis before we can sort out the problem. Let's make sure I understand what you have. AW/HDS1 has the preferred side set to side A and replicates from Rogger A. AW/HDS2 has the preferred side set to side B and replicates from Rogger B. Correct?

Use the ICMDBA tool on each Rogger and on each AW/HDS. Open the "Space Used" tab: on the Loggers, select the "side" database; on the AW/HDSs, select the "hds" database.

We are not interested in the space used - we are interested in those timestamps.

The loggers should be in sync - if not, you have another problem. Most likely they will be fine.

Based on what you describe, compare the last update time of the half-hour records on each HDS, and also look at the last update time of the Termination Call Detail table.

Post back with your findings. Do you see one HDS showing what look like correct timestamps, while the other shows timestamps that haven't updated for a while?
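A minimal Python sketch of the comparison being asked for, assuming you have copied each HDS's last-update times off ICMDBA into a dict by hand. The table names follow the ICM schema, but the timestamps are made up, and the 60-minute tolerance (an allowance for half-hour rows trailing the TCD rows) is an assumption, not an ICM setting.

```python
from datetime import datetime, timedelta


def stale_tables(snapshot, reference_table="Termination_Call_Detail",
                 tolerance=timedelta(minutes=60)):
    """Return the tables whose last update lags the reference table by more than `tolerance`.

    `snapshot` maps table name -> last update time, as read off ICMDBA's Space Used tab.
    """
    reference = snapshot[reference_table]
    return sorted(
        name
        for name, updated in snapshot.items()
        if name != reference_table and reference - updated > tolerance
    )


# Made-up timestamps for illustration: both HDSs have current TCD rows,
# but HDS2's half-hour tables stopped updating days earlier.
hds1 = {
    "Termination_Call_Detail": datetime(2000, 1, 3, 15, 50),
    "Call_Type_Half_Hour": datetime(2000, 1, 3, 15, 0),
    "Skill_Group_Half_Hour": datetime(2000, 1, 3, 15, 0),
}
hds2 = {
    "Termination_Call_Detail": datetime(2000, 1, 3, 15, 50),
    "Call_Type_Half_Hour": datetime(2000, 1, 1, 9, 30),
    "Skill_Group_Half_Hour": datetime(2000, 1, 1, 9, 30),
}

print("HDS1 stale:", stale_tables(hds1))
print("HDS2 stale:", stale_tables(hds2))
```

Using the TCD table as the reference mirrors the symptom in this thread: TCD rows keep arriving while the half-hour tables fall behind.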

Regards,

Geoff

Community Member

Re: HDS Replication Issue

You are correct. We have a Rogger and an AW/HDS at each site, and in both cases the AW/HDS replicates from its local Rogger. Looking through the timestamps on the Loggers, everything appeared to match up correctly. I have attached the results from the HDS servers, as that is where the inconsistency is. I noticed that not only are the timestamps mismatched, but some tables also appear to be missing. Thank you very much for your quick response. I'd like to get this fixed ASAP, so any help you can provide would be greatly appreciated.

Community Member

Re: HDS Replication Issue

One other thing to note...I just found the following article: http://www.cisco.com/en/US/products/sw/custcosw/ps1001/products_tech_note09186a00801357ff.shtml.

I realize that this article covers an older version of ICM; however, the switch listed in the article was not present on the HDS server we are having problems with. I did find this switch on the HDS server that is currently working. Is this switch relevant? If so, do I need to rebuild the HDS database?

Green

Re: HDS Replication Issue (Accepted Solution)

OK, looking at the pictures it seems like you took these screenshots at about 3:50pm - the TCD and Event table show these time stamps.

The half-hour tables on side A look correct: all have been updated with the 3:00pm records. Side B looks incorrect. Its TCD table seems up to date, but its half-hour tables are pretty old.

You can put trace on Replication on Logger B (use 0xff) and trace on Replication on AW/HDS2 (use 0xffff) and watch the trace. You should see that the recovery keys are out of whack. The HDS is asking for data that the logger does not yet have, or something like that.
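To picture what "recovery keys out of whack" can mean, here is a toy Python model, not the actual Replication protocol: it assumes the logger holds a contiguous range of recovery keys per table and the HDS asks to resume from a single key. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoggerRange:
    """Oldest and newest recovery keys a logger still holds for one table (toy model)."""
    oldest_key: int
    newest_key: int


def classify_request(requested_key: int, held: LoggerRange) -> str:
    """Classify the recovery key an HDS asks to resume from.

    'ahead'  - the HDS wants rows the logger does not have yet
    'purged' - the rows were on the logger once but have aged out
    'ok'     - the logger can satisfy the request
    """
    if requested_key > held.newest_key:
        return "ahead"
    if requested_key < held.oldest_key:
        return "purged"
    return "ok"


held = LoggerRange(oldest_key=1000, newest_key=5000)
for key in (6000, 500, 3000):
    print(key, classify_request(key, held))
```

Only the "ok" case lets replication proceed; the other two correspond to the HDS and logger disagreeing about where to start.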

The normal solution is to truncate the Recovery table on each of the HDS and the Logger after stopping the processes. This will allow replication to proceed correctly. No data is lost - it's all in the Loggers and they are sync'd - you just need to get the data over.

There are two approaches - the first may work, and the second may not be needed.

1. On AW/HDS2, stop the Distributor

2. From SQL Query Analyzer, issue "truncate table Recovery" against the xx_hds database

3. Start the Distributor

4. Watch the trace of the Replication process on Logger B. You will see the requests come over. In the Replication trace on the Distributor, you will see trace about OPERATIONS (inserts etc.).

Use ICMDBA and look at those timestamps - hit refresh each time and you should see the tables come up to date.
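If you would rather not keep hitting refresh, the watch-and-refresh step can be sketched as a polling loop. This is a hypothetical Python helper: `read_timestamps` stands in for however you obtain the last-update times (ICMDBA itself is a GUI, so in practice this might be a query you write yourself against the HDS), and the poll interval and limit are arbitrary.

```python
import time
from datetime import datetime


def wait_until_caught_up(read_timestamps, target, poll_seconds=30.0, max_polls=20,
                         sleep=time.sleep):
    """Poll until every table's last update time has reached `target`.

    read_timestamps: callable returning {table name: last update datetime}.
    Returns the tables still behind after the last poll (an empty list means caught up).
    """
    behind = []
    for attempt in range(max_polls):
        snapshot = read_timestamps()
        behind = sorted(name for name, updated in snapshot.items() if updated < target)
        if not behind:
            return []
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before the next "refresh"
    return behind


# Usage with a fake reader that replays three refreshes; timestamps are made up.
start, target = datetime(2000, 1, 3, 9, 0), datetime(2000, 1, 3, 15, 0)
refreshes = iter([
    {"Call_Type_Half_Hour": start, "Skill_Group_Half_Hour": start},
    {"Call_Type_Half_Hour": target, "Skill_Group_Half_Hour": start},
    {"Call_Type_Half_Hour": target, "Skill_Group_Half_Hour": target},
])
print(wait_until_caught_up(lambda: next(refreshes), target, sleep=lambda s: None))
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting.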

If the trace shows this is not working, you need to truncate the Recovery table on both the Distributor and the Logger. It's very important that you truncate the table AFTER stopping the processes, or it will not work.

1. On AW/HDS2, stop the Distributor

2. On Rogger B, stop the Logger B

3. From SQL Query Analyzer, issue "truncate table Recovery" on logger B (against the xxx_sideB database)

4. On AW/HDS2, issue the same command against the xx_hds database

5. Start the logger

6. Start the distributor

7. Watch the trace of the Replication process on Logger B and the trace of the Replication process on the Distributor. You will see all the operations occurring.

Again, use ICMDBA to watch as the HDS comes back into line.
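The ordering rule (truncate only AFTER stopping the process) can be checked mechanically. A small Python sketch, assuming a plan written as (action, process) pairs with placeholder process names; it only validates the sequence and does not stop services or run anything against SQL Server.

```python
def check_order(steps):
    """Check a maintenance plan of (action, process) pairs.

    action is "stop", "start" or "truncate"; "truncate" means truncating the
    Recovery table in the database that process uses. Every process is assumed
    to be running before the plan starts. Returns a list of problems found.
    """
    running = {}
    problems = []
    for number, (action, process) in enumerate(steps, start=1):
        if action == "stop":
            running[process] = False
        elif action == "start":
            running[process] = True
        elif action == "truncate":
            if running.get(process, True):
                problems.append(f"step {number}: Recovery truncated while {process} is running")
        else:
            raise ValueError(f"step {number}: unknown action {action!r}")
    for process in sorted(p for p, up in running.items() if not up):
        problems.append(f"{process} is left stopped at the end")
    return problems


# The two-sided procedure from the post, with placeholder process names.
both_sides = [
    ("stop", "Distributor (AW/HDS2)"),
    ("stop", "Logger B"),
    ("truncate", "Logger B"),
    ("truncate", "Distributor (AW/HDS2)"),
    ("start", "Logger B"),
    ("start", "Distributor (AW/HDS2)"),
]
print(check_order(both_sides))

# Truncating before stopping is exactly what the post warns against.
wrong = [
    ("truncate", "Distributor (AW/HDS2)"),
    ("stop", "Distributor (AW/HDS2)"),
    ("start", "Distributor (AW/HDS2)"),
]
print(check_order(wrong))
```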

Regards,

Geoff

Green

Re: HDS Replication Issue

Oh, you should probably do this through a TAC case. Just to be sure.

Regards,

Geoff

Community Member

Re: HDS Replication Issue

Thank you very much for the fast response again. You have been very helpful. I'm not sure if you saw the article I sent but it was regarding the following registry key and its value:

KEY: HKEY_LOCAL_MACHINE\SOFTWARE\GeoTel\ICR\\Distributor\NodeManager\CurrentVersion\Processes\rpl\ImageArgs

Value:

/db /client/name /replicationport40026/recoveryport40028/all

The HDS I was having problems with was missing the /all switch at the end. I added the switch and then followed your process. I am now collecting half-hour data going forward; however, I am still having problems because the Replication process is trying to find a matching Recovery Point and can't. I believe this is because the /all switch needs to be present when the ICM services are first started, which means my last option is to blow away the HDS database and rebuild it with the /all switch already in place. If you have a second, let me know your thoughts.
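A quick way to compare the rpl ImageArgs value on both HDSs is to test for the /all token. This Python sketch works only on the string; reading the value itself (for example with Python's standard `winreg` module on the HDS) is left out because the key path above appears to be missing its instance-name segment. Splitting on "/" rather than whitespace is an assumption chosen to cope with the run-together form quoted above.

```python
def has_switch(image_args: str, switch: str = "all") -> bool:
    """Return True if `switch` appears as its own /-delimited token in an ImageArgs value.

    Splitting on "/" (not whitespace) also copes with run-together text such as
    "/recoveryport40028/all".
    """
    tokens = [token.strip().lower() for token in image_args.split("/")]
    return switch.lower() in tokens


working = "/db /client/name /replicationport40026/recoveryport40028/all"
broken = "/db /client/name /replicationport40026/recoveryport40028"
print(has_switch(working), has_switch(broken))
```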

Green

Re: HDS Replication Issue

Not sure why the /all switch is missing. I just checked an HDS here, and it certainly has it. Did your HDS install go perfectly? Is this a new setup? 7.2(5) is a modern code base.

It was my understanding that if you truncated the Recovery table, the Replication peer on the Distributor would not specify a recovery point - it passes over -1. The logger gives it everything it thinks it should, and the HDS sorts out what it needs to insert. If the Recovery table is truncated on both Logger and Distributor, I'm not quite sure how it knows what to add.
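Geoff's understanding of the -1 case can be written down as a toy Python model. This is a sketch of the behaviour as described in the post, not confirmed ICM internals; the function name and the use of plain integer keys are assumptions.

```python
def keys_inserted(request_key, logger_keys, hds_keys):
    """Toy model of one replication pass for a single table.

    request_key: recovery key the HDS asks to resume after; -1 means no recovery point.
    logger_keys: recovery keys of the rows the logger still holds.
    hds_keys:    recovery keys already present in the HDS.
    Returns the keys the HDS actually inserts, in ascending order.
    """
    if request_key == -1:
        offered = sorted(logger_keys)  # logger sends everything it holds
    else:
        offered = sorted(k for k in logger_keys if k > request_key)
    return [k for k in offered if k not in hds_keys]  # HDS skips rows it already has


logger_keys = set(range(1, 7))
print(keys_inserted(-1, logger_keys, hds_keys={1, 2, 3}))
print(keys_inserted(4, logger_keys, hds_keys=set()))
```

In this model, truncating only the HDS side is safe because the HDS filters out duplicates itself; it says nothing about what happens when both Recovery tables are emptied, which is the open question in the post.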

I did tackle a similar problem on 7.2(4) recently, so your post is interesting.

Regards,

Geoff

Community Member

Re: HDS Replication Issue

I'm with you... I have no idea why the /all switch would be missing. The /all switch was present on the Side A AW/HDS, which is working as expected. This is a brand new 7.2.5 installation that just went live yesterday. The HDS installation was clean, which makes this whole thing even more confusing. It looks like we are capturing data going forward; however, the Replication process on the Logger keeps showing messages that say "No MATCHING Recovery Request for table ..." The message is constantly flashing in the window. It's as if the system can't figure out how to get itself caught up, which is why I'm thinking my only option is to delete the HDS database and recreate it. From there, my hope is that the AW/HDS and Logger can sync up a Recovery Key and get caught up.

Green

Re: HDS Replication Issue

I just thought I'd bump this to the top to see if you are watching the forum and have anything to add.

Regards,

Geoff

Community Member

Re: HDS Replication Issue

Thanks for following up, Geoff. I did try deleting the database and recreating it; however, it did not fix the problem. The Replication process on the Logger is showing the following error message: "The Historical records may have been deleted in the Server database." I have never seen anything like this before. One thing that did come up is that the client made multiple changes to their WAN, and I have now lost the Private connection between Side A and Side B. I need the client to resolve this before I can continue troubleshooting, so unfortunately I am in a holding pattern for now. Thanks again for checking up on this, Geoff. You have been extremely helpful, and I really appreciate it. I will keep you posted on this issue.

Green

Re: HDS Replication Issue

Hey John, it makes a bit more sense now that you say "now lost the Private connection between Side A and Side B" because, like you, I have never seen that scary message before. ;-)

Obviously, getting the synchronization all up to snuff is the number 1 thing to solve.

Regards,

Geoff

Community Member

Re: HDS Replication Issue

Hey Geoff, just wanted to let you know that once the client got the Private network all straightened out, everything fell back into place. Thank you so much for your help on this.

Green

Re: HDS Replication Issue

Good work mate.

The private network is the biggest problem UCCE customers have - they often will not obey the requirements for a number of reasons.

I've even seen them ask CCIE Routing and Switching Cisco engineers whether they need a totally separate private network, and they don't understand either.

Regards,

Geoff

Community Member

Re: HDS Replication Issue

I'm having the same problem: the distributor (configlogger) reports that dbnextrow failed... Do you have any idea where to look?

Community Member

Re: HDS Replication Issue

I would start by making sure that the central controllers are talking to each other. One of the issues we ran into was that the Roggers were unable to communicate with each other. There was also an issue where the HDS was having problems communicating with the Roggers. That would be the best place to start.

Cisco Employee

Re: HDS Replication Issue

Hi John,

Just to clarify the message you were receiving: most likely the loggers and the HDS had different retention periods set, as they should.

The HDS was polling for data that the logger no longer held because it had already been purged.

Happy it is all back fine now.

Regards,

Riccardo

Community Member

Re: HDS Replication Issue

I am seeing exactly the same message, "The Historical records may have been deleted in the Server database", in an ICM 6 -> 7 upgrade. I tried "truncate table Recovery" with the processes stopped, as described in this thread, but I still see the issue. What was the actual fix for this issue? After reading through all the posts, it's a little unclear whether it was the truncate command or the /all switch in the ImageArgs that fixed the problem.

Carlos

Community Member

Re: HDS Replication Issue

Carlos,

We had an issue where the HDS lost communication with the Logger it was configured to talk to for an extended period of time, and when it came back online, the HDS was polling for data that the Logger no longer had. Our Loggers were set to retain 14 days' worth of data, while the HDS was set to retain 365 days. I believe that once the HDS had finished pulling down all of the available data, the error message went away. I don't believe any of the fixes we tried actually resolved the issue. Hope this helps. Please let me know if there is anything else I can do to help you.
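John's explanation comes down to simple date arithmetic. A hypothetical Python helper, assuming the logger keeps exactly `logger_retention_days` of data and the HDS resumes from its last replicated time; the dates are illustrative, and only the 14-day retention comes from the post.

```python
from datetime import datetime, timedelta


def purged_gap(last_replicated, now, logger_retention_days):
    """Return how much of the data the HDS still needs has already been purged from the logger.

    A zero timedelta means everything the HDS needs is still on the logger.
    In this model the HDS's own retention period does not enter into it.
    """
    oldest_on_logger = now - timedelta(days=logger_retention_days)
    return max(timedelta(0), oldest_on_logger - last_replicated)


now = datetime(2000, 1, 31)
# Hypothetical outage: the HDS last replicated 30 days ago; the logger keeps 14 days.
print(purged_gap(datetime(2000, 1, 1), now, logger_retention_days=14))
# A shorter, 6-day outage stays inside the logger's window.
print(purged_gap(datetime(2000, 1, 25), now, logger_retention_days=14))
```

A non-zero result is the window of history the HDS can no longer get from the logger, which fits the warning persisting until the HDS had pulled everything that was still available.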

John

Green

Re: HDS Replication Issue

I see. That sounds like the right explanation.

Regards,

Geoff

Community Member

Re: HDS Replication Issue

John,

I know this is an old thread, but I figured it would still be useful for others who run into the issue. I think some of the things you did helped, because I tried Geoff's recommendation again and this time it worked. Maybe I forgot to stop the services the first time through (it was late). After examining the live system and the new system, it makes sense. In the live system, the Recovery tables on the HDS and the Rogger had matching recovery keys. In the new system, the recovery keys did not match, which is why the RGR was showing "The Historical records may have been deleted in the Server database". This indicates that the Distributor can't find a matching recovery key on the RGR to start the replication process.

So Geoff's fix shown below worked for me:

1. On AW/HDS2, stop the Distributor

2. From SQL Query Analyzer, issue "truncate table Recovery" against the xx_hds database

3. Start the Distributor

4. Watch the trace of the Replication process on Logger B. You will see the requests come over. In the Replication trace on the Distributor, you will see trace about OPERATIONS (inserts etc.).

Here is a screenshot of the RGR Replication process as the Distributor services are started after truncating the Recovery table on the Distributor.

Carlos

Community Member

Re: HDS Replication Issue

Great to hear that your issue is resolved, Carlos. Thanks for posting your experience... I went back and updated my own personal notes on this issue, and hopefully it will help someone else out as well. Credit to Geoff for supplying the proper fix. Nice job as always, Geoff!
