HDS Replication Issue

john.miccio
Level 1

I am working on a UCCE 7.2.5 environment with a Central Controller that is split across two sites. Each site has a Rogger and a Distributor AW/HDS. I am not getting any Half Hour Call Type data on one of the HDS servers, and the Replication window on the Rogger shows the following message: "The Historical records may have been deleted in the Server database." Does anyone have any ideas?

Accepted Solutions

OK, looking at the pictures it seems like you took these screenshots at about 3:50pm - the TCD and Event table show these time stamps.

On side A, the half-hour tables look correct - all updated with the 3:00pm records. Side B looks incorrect: the TCD table seems up to date, but the half-hour tables are pretty old.

You can put trace on Replication on Logger B (use 0xff) and trace on Replication on AW/HDS2 (use 0xffff) and watch the trace. You should see that the recovery keys are out of whack. The HDS is asking for data that the logger does not yet have, or something like that.

The normal solution is to truncate the Recovery table on each of the HDS and the Logger after stopping the processes. This will allow replication to proceed correctly. No data is lost - it's all in the Loggers and they are sync'd - you just need to get the data over.

There are two approaches. Try the first one; the second is only needed if the first does not work.

1. On AW/HDS2, stop the Distributor

2. From SQL Query Analyzer, issue "truncate table Recovery" against the xx_hds database

3. Start the Distributor

4. Watch the trace of the Replication process on Logger B. You will see the requests come over. On the Distributor, the Replication trace will show OPERATIONS (inserts etc).

Use ICMDBA and look at those timestamps - hit refresh each time and you should see the tables come up to date.

If the trace shows this is not working, you need to do both distributor and logger. It's very important that you truncate the table AFTER stopping the process, or it will not work.

1. On AW/HDS2, stop the Distributor

2. On Rogger B, stop the Logger B

3. From SQL Query Analyzer, issue "truncate table Recovery" on logger B (against the xxx_sideB database)

4. On AW/HDS2, issue the same command against the xx_hds database

5. Start the logger

6. Start the distributor

7. Watch the trace of the Replication process on Logger B and the trace of the Replication process on the Distributor. You will see all the operations occurring.

Again, use ICMDBA to watch as the HDS comes back into line.
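The ordering rule in both procedures (truncate the Recovery table only AFTER the owning process is stopped) can be written down as a small check. This is only a sketch for reasoning about a runbook, not a tool that touches ICM: the step tuples, the `OWNER` mapping and the names `hds` and `side_b` are illustrative labels invented here, standing in for the xx_hds and xxx_sideB databases mentioned above.

```python
# Hypothetical labels: which process owns which database's Recovery table.
OWNER = {"hds": "distributor", "side_b": "logger"}


def unsafe_truncates(steps):
    """Return the databases that would be truncated while their owning process is still running."""
    running = {"distributor", "logger"}  # assume both processes are up at the start
    unsafe = []
    for action, target in steps:
        if action == "stop":
            running.discard(target)
        elif action == "start":
            running.add(target)
        elif action == "truncate" and OWNER[target] in running:
            unsafe.append(target)
    return unsafe


# First approach: Distributor only.
first_approach = [
    ("stop", "distributor"),
    ("truncate", "hds"),
    ("start", "distributor"),
]

# Second approach: both Logger B and the Distributor.
second_approach = [
    ("stop", "distributor"),
    ("stop", "logger"),
    ("truncate", "side_b"),
    ("truncate", "hds"),
    ("start", "logger"),
    ("start", "distributor"),
]

# A wrong ordering: the table is truncated before the process is stopped.
wrong_order = [("truncate", "hds"), ("stop", "distributor")]
```

Running `unsafe_truncates` over a planned sequence before executing it by hand is a cheap way to catch a truncate that was scheduled too early.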

Regards,

Geoff



geoff
Level 10

We need to do some analysis before we can sort out the problem. Let's make sure I understand what you have. AW/HDS1 has the preferred side set to side A and replicates from Rogger A. AW/HDS2 has the preferred side set to side B and replicates from Rogger B. Correct?

Use the icmdba tool on each Rogger and on each AW/HDS. Open the "Space Used" tab - on the loggers, select the "side" DB; on the AW/HDSs, select the "hds" databases.

We are not interested in the space used - we are interested in those timestamps.

The loggers should be in sync - if not, you have another problem. Most likely they will be fine.

From what you are saying, compare the last update time on the half hour records on each HDS. Look at the last update time for the Termination Call Detail tables.

Post back with your findings. Are you seeing one HDS showing what looks like correct timestamps, while the other shows incorrect timestamps that haven't updated for a while?
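The comparison described above can be sketched in a few lines once the last-update times have been copied out of ICMDBA by hand. Everything in this example is an assumption made for illustration: the table names, the sample timestamps and the 30-minute tolerance are invented, and nothing here queries ICM itself.

```python
from datetime import datetime, timedelta


def lagging_tables(reference, candidate, tolerance=timedelta(minutes=30)):
    """Return tables whose newest record in `candidate` trails `reference` by more than `tolerance`."""
    lagging = {}
    for table, ref_time in reference.items():
        cand_time = candidate.get(table)
        if cand_time is None:
            lagging[table] = None  # table not listed on the candidate side at all
        elif ref_time - cand_time > tolerance:
            lagging[table] = ref_time - cand_time
    return lagging


# Invented sample values, as if read from the "Space Used" tab on each HDS.
hds1 = {
    "Termination_Call_Detail": datetime(2010, 1, 1, 15, 50),
    "Call_Type_Half_Hour": datetime(2010, 1, 1, 15, 0),
}
hds2 = {
    "Termination_Call_Detail": datetime(2010, 1, 1, 15, 50),
    "Call_Type_Half_Hour": datetime(2010, 1, 1, 9, 0),
}

for table, lag in lagging_tables(hds1, hds2).items():
    print(table, "missing" if lag is None else f"behind by {lag}")
```

A table that shows up with a large lag on one HDS while the other HDS and both loggers are current points at replication to that HDS rather than at the loggers.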

Regards,

Geoff

You are correct. We have a Rogger and an AW/HDS in each site, and in both cases the AW/HDS replicates from its local Rogger. Looking through the timestamps on the Loggers, everything appeared to match up correctly. I have attached the results from the HDS servers, as that is where the inconsistency is. I noticed that not only are the timestamps mismatched, but it also appears that some tables are missing. Thank you very much for your quick response. I'd like to get this fixed ASAP, so any help you can provide would be greatly appreciated.

One other thing to note: I just found the following article: http://www.cisco.com/en/US/products/sw/custcosw/ps1001/products_tech_note09186a00801357ff.shtml.

I realize that this article is from an older version of ICM. However, the switch listed in the article was not present on the HDS server we are having problems with, while it is present on the HDS server that is currently working. Is this switch relevant? If so, do I need to rebuild the HDS database?

Oh, you should probably do this through a TAC case. Just to be sure.

Regards,

Geoff

Thank you very much for the fast response again. You have been very helpful. I'm not sure if you saw the article I sent but it was regarding the following registry key and its value:

KEY: HKEY_LOCAL_MACHINE\SOFTWARE\GeoTel\ICR\\Distributor\NodeManager\CurrentVersion\Processes\rpl\ImageArgs

Value:

/db /client/name /replicationport40026/recoveryport40028/all

The HDS I was having problems with was missing the /all switch at the end. I added the switch and then followed your process. I am collecting Half Hour data going forward; however, I am still having problems because it is trying to find a matching Recovery Point and can't. I believe this is because the /all switch needs to be present when the ICM services are initially started, which means my last choice is to blow away the HDS database and then rebuild it with the /all switch already in place. If you have a second, let me know your thoughts.
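For comparing the ImageArgs value between the two HDS servers, a small string check can help avoid eyeballing a long argument list. This is a sketch under assumptions: it only parses a value that has already been copied out of the registry, it treats anything of the form `/name` as a switch, and the helper names are made up for this example.

```python
import re


def switches(image_args):
    """Return the set of switch names (text after each '/') found in an ImageArgs string."""
    return {name.lower() for name in re.findall(r"/([A-Za-z]+)", image_args)}


def has_all_switch(image_args):
    """True if the argument string contains the /all switch."""
    return "all" in switches(image_args)


# Value as posted above (working HDS) and the same value with /all removed (problem HDS).
working = "/db /client/name /replicationport40026/recoveryport40028/all"
broken = "/db /client/name /replicationport40026/recoveryport40028"

print(has_all_switch(working), has_all_switch(broken))
print(switches(working) - switches(broken))
```

The set difference on the last line lists any switch present on one server but not the other, which is the comparison that exposed the missing /all here.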

Not sure why the /all switch is missing. I just checked an HDS here, and I certainly have it. Did your HDS install work perfectly? Is this a new setup? 7.2(5) is a modern code base.

It was my understanding that if you truncated the Recovery table, the Replication peer on the Distributor would not specify a recovery point - it passes over -1. The logger gives it everything it thinks it should, and the HDS sorts out what it needs to insert. If the Recovery table is truncated on both Logger and Distributor, I'm not quite sure how it knows what to add.

I did tackle a similar problem on 7.2(4) recently, so your post is interesting.

Regards,

Geoff

I'm with you. I have no idea why the /all switch would be missing. The /all switch was present on the Side A AW/HDS, which is working as expected. This is a brand new 7.2.5 installation which just went live yesterday. The HDS installation was clean, which makes this whole thing even more confusing. It looks like we are capturing data going forward; however, the Replication process on the Logger keeps showing messages that state, "No MATCHING Recovery Request for table ..." The message is constantly flashing in the window. It's like the system can't figure out how to get itself caught up, which is why I'm thinking my only option is to delete the HDS database and recreate it. From there, my hope would be that the AW/HDS and Logger can sync up a Recovery Key and get all caught up.

I just thought I'd bump this to the top to see if you are watching the forum and have anything to add.

Regards,

Geoff

Thanks for following up, Geoff. I did try to delete the database and recreate it; however, it did not fix the problem. If I look at the Replication process on the Logger, I am getting the following error message: "The Historical records may have been deleted in the Server database." I have never seen anything like this before. One thing that did come up was that the client made multiple changes to their WAN, and I have now lost the Private connection between Side A and Side B. I need the client to resolve this issue before I can continue to troubleshoot, so unfortunately I am currently in a holding pattern. Thanks again for checking up on this, Geoff. You have been extremely helpful and I really appreciate all of your help. I will keep you posted on this issue going forward.

Hey John, it makes a bit more sense now that you say "now lost the Private connection between Side A and Side B" because, like you, I have never seen that scary message before. ;-)

Obviously, getting the synchronization all up to snuff is the number 1 thing to solve.

Regards,

Geoff

Hey Geoff, just wanted to let you know that once the client got the Private network all straightened out, everything fell back into place. Thank you so much for your help on this.

Good work mate.

The private network is the biggest problem UCCE customers have - they often will not obey the requirements for a number of reasons.

I've even seen them ask CCIE Routing and Switching Cisco engineers whether they need a totally separate private network, and they don't understand either.

Regards,

Geoff

I'm having the same problem: the distributor (configlogger) is reporting that dbnextrow failed. Do you have any idea where to look?
