How to restore a Unity Connection publisher?

Answered Question
Mar 15th, 2010

I found a typo in the name of the publisher, so I changed it in OS Administration. After the server rebooted, I could no longer log into the Administration of the unit. I get database error messages when I try to log in. The subscriber is operating as primary right now.

So I guess I need to reload the machine completely. My question is: once I start the install process and bring this machine back online, will it take all of the database information from the subscriber, or is there something I need to do during the install to make that happen? I really do not want to recreate everything that is there, even though these units have not gone into production yet.

There are also no backups, since it is not in production. I know that during the install I get asked if this is the first server in the cluster. Do I answer no, even though eventually I want this server to be the publisher again?

Thanx, any help is greatly appreciated.

Seth

David Hailey Mon, 03/15/2010 - 15:29

Ok - you can't log into the "Administration" so I'm assuming you mean the CU Admin interface.  Can you log into the CLI on the box?  If so, you may be able to mend what's broken.  Let me know if you've tried getting into the CLI or not.

David Hailey Mon, 03/15/2010 - 15:59

SSH into the CLI.  You typically do this in the reverse order by changing host name via CLI and then in OS administration.

First take a look at "show network cluster" and see if the hostname on the CLI matches the new hostname that you set in OS admin.  If it does not, then you'll need to look at the "set" commands.  You can change the hostname via CLI using "set hostname cluster publisher <hostname>" where <hostname> matches what you set in OS admin.

If, by chance the name in the CLI does match up then I have to ask if you rebooted the publisher after changing the hostname?  And, if so - what does the status of DB replication look like for the 2 cluster nodes in RTMT (or via command line)?  You'll need to look at the Replicate State to gather that info.

Hailey

Please rate helpful posts!

srosenthal Mon, 03/15/2010 - 17:57

The name is correct in the cli - here is the output

admin:show network cluster
172.30.56.203 tu-voip-unit1.henrico.lib.va.us tu-voip-unit1 Publisher
172.30.56.204 tu-voip-unity2.henrico.lib.va.us tu-voip-unity2 Subscriber

RTMT is not connecting to the publisher.  Here is what is on the web page for the subscriber

Error: Communication is not functioning correctly between the servers in the Cisco Unity Connection cluster. To review server status for the cluster, go to the Tools > Cluster Management page of Cisco Unity Connection Serviceability.

Warning: The Cisco Unity Connection cluster subscriber server has changed to Primary Status (failover has occurred). To review server status for the cluster, go to the Tools > Cluster Management page of Cisco Unity Connection Serviceability.

I do not know the cli command to view the replication status.

Also, the server automatically rebooted when I changed the name and then rebooted again when I tried to change it back.

Seth

David Hailey Mon, 03/15/2010 - 18:22

OK.  So, I have some CLI info to provide but first I wanted to inquire about the following:

I noticed that the Subscriber is tu-voip-unity2; however, the Publisher is tu-voip-unit1 (notice the missing "y" in the Publisher host name).  Maybe this was intentional - I can't say as I don't know your environment.  But, if you are relying on DNS for communication and you misnamed the host on the CUC Publisher then this would be a cause for problems.
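The naming mismatch noted above can also be spotted mechanically. The following is a minimal Python sketch, assuming the four-column layout of the "show network cluster" output pasted earlier in this thread; the helper names (parse_cluster, name_stem) are hypothetical and not part of any Cisco tool.

```python
import re

def parse_cluster(output):
    """Parse 'show network cluster' rows into dicts (ip, fqdn, host, role)."""
    nodes = []
    for line in output.strip().splitlines():
        parts = line.split()
        if len(parts) == 4:  # assumed layout: IP, FQDN, short name, role
            ip, fqdn, host, role = parts
            nodes.append({"ip": ip, "fqdn": fqdn, "host": host, "role": role})
    return nodes

def name_stem(host):
    """Strip trailing digits so 'tu-voip-unity2' becomes 'tu-voip-unity'."""
    return re.sub(r"\d+$", "", host.lower())

# Output copied from the post above.
sample = """
172.30.56.203 tu-voip-unit1.henrico.lib.va.us tu-voip-unit1 Publisher
172.30.56.204 tu-voip-unity2.henrico.lib.va.us tu-voip-unity2 Subscriber
"""
nodes = parse_cluster(sample)
stems = {name_stem(n["host"]) for n in nodes}

# More than one stem hints at a naming typo on one of the nodes.
print(sorted(stems))  # ['tu-voip-unit', 'tu-voip-unity']
```

Two different stems do not prove a misconfiguration (the names may be intentional), but they are a cheap prompt to double-check DNS and the hostname on each node.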

On to the CLI:

I know that on CUCM DB replication is stopped after you change the hostname of the Publisher.  So, assuming this is still a non-production system (I think you indicated that in your first post), here are some things I'd look at.

On the subscriber, run "show tech network hosts" from the CLI and see what its host table looks like.  You should also be able to set the new hostname of the Publisher via the CLI as well.  On the Subscriber, you'd need to run "set network cluster publisher hostname <hostname>" where <hostname> is the correct hostname of the Publisher.

To check replication, you need to run the following command on both servers:

show perf query class "Number of Replicates Created and State of Replication"

You will see a value on both servers of 0 - 4. A value of 2 indicates replication is good. 0 is not started, 1 indicates an issue with replication counts, 3 means replication is bad, and 4 means replication did not succeed.
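For anyone saving the CLI output to a file, here is a minimal Python sketch that pulls the Replicate_State value out of the text and translates it using the meanings listed above. The sample text follows the output format that appears later in this thread; the function name is hypothetical.

```python
import re

# Meanings as described in the post above.
REPLICATE_STATE = {
    0: "not started",
    1: "issue with replication counts",
    2: "good",
    3: "bad",
    4: "did not succeed",
}

def parse_replicate_state(output):
    """Extract the Replicate_State integer from 'show perf query class' output."""
    match = re.search(r"Replicate_State\s*=\s*(\d+)", output)
    if match is None:
        raise ValueError("Replicate_State not found in output")
    return int(match.group(1))

sample = """
- Perf class (Number of Replicates Created and State of Replication) has instances and values:
    ReplicateCount  -> Number of Replicates Created   = 427
    ReplicateCount  -> Replicate_State                = 2
"""
state = parse_replicate_state(sample)
print(state, REPLICATE_STATE[state])  # 2 good
```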

Replication is likely to be hosed up (if functioning at all).  However, if you can get the 2 servers to be cognizant of each other again then you could attempt to reset replication from the Publisher CLI using "utils dbreplication reset all".

Hailey

Please rate helpful posts!

srosenthal Mon, 03/15/2010 - 18:45

Well the naming is the root of my problem.  It was tu-voip-unit1 and I changed it to tu-voip-unity1.  Then after the reboot it came back up with licensing errors so I thought I could just change it back to tu-voip-unit1 and that is where I am at now.

I would like it to be tu-voip-unity1 if possible as that is what I was asked to name it.

Here is the output from show tech network hosts

127.0.0.1 localhost
::1 localhost
172.30.56.203 TU-VOIP-UNIT1.henrico.lib.va.us TU-VOIP-UNIT1
172.30.56.204 TU-VOIP-UNITY2.henrico.lib.va.us TU-VOIP-UNITY2

I changed the name on the pub to tu-voip-unity1.  Does case matter?

I also changed the name on the subscriber to match that using the command you listed.

Here is the output of the command you gave.

admin:show perf query class "Number of Replicates Created and State of Replication"
==>query class :

- Perf class (Number of Replicates Created and State of Replication) has instances and values:
    ReplicateCount  -> Number of Replicates Created   = 427
    ReplicateCount  -> Replicate_State                = 2

The pub is rebooting (and was already rebooting when I captured the output above), so I will update when it's done.

Seth

David Hailey Mon, 03/15/2010 - 18:49

Case doesn't matter.  Let me know what happens when the cluster reboots.
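Because host names are compared without regard to case, a check of the host table should lowercase both sides first. A minimal Python sketch, assuming the hosts-file style output of "show tech network hosts" pasted above; the function names and the expected-name list are illustrative assumptions.

```python
def parse_hosts(output):
    """Map lowercased host names to IP addresses from hosts-file style lines."""
    table = {}
    for line in output.strip().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        ip, names = parts[0], parts[1:]
        for name in names:
            table.setdefault(name.lower(), ip)
    return table

def missing_hosts(output, expected):
    """Return the expected names (compared case-insensitively) that are absent."""
    table = parse_hosts(output)
    return [h for h in expected if h.lower() not in table]

# Output copied from the post above.
sample = """
127.0.0.1 localhost
::1 localhost
172.30.56.203 TU-VOIP-UNIT1.henrico.lib.va.us TU-VOIP-UNIT1
172.30.56.204 TU-VOIP-UNITY2.henrico.lib.va.us TU-VOIP-UNITY2
"""
# The intended publisher name (with the "y") is not in the table yet.
print(missing_hosts(sample, ["localhost", "tu-voip-unity1", "tu-voip-unity2"]))
# ['tu-voip-unity1']
```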

srosenthal Mon, 03/15/2010 - 19:14

Ok, the pub is back up and I can web into it, a great improvement from before.

Here is what it says after login.

Warning: The Cisco Unity Connection cluster subscriber server has changed to Primary Status (failover has occurred). To review server status for the cluster, go to the Tools > Cluster Management page of Cisco Unity Connection Serviceability.

So I guess the question is what next and how do I make the pub primary again?

Dude, thank you so much for all the help.  I owe you a cold one!

Here is the output from the pub -

admin:show perf query class "Number of Replicates Created and State of Replication"
==>query class :

- Perf class (Number of Replicates Created and State of Replication) has instances and values:
    ReplicateCount  -> Number of Replicates Created   = 0
    ReplicateCount  -> Replicate_State                = 0

Here is from the subscriber -

admin:show perf query class "Number of Replicates Created and State of Replication"
==>query class :

- Perf class (Number of Replicates Created and State of Replication) has instances and values:
    ReplicateCount  -> Number of Replicates Created   = 427
    ReplicateCount  -> Replicate_State                = 2

Seth

David Hailey Mon, 03/15/2010 - 19:26

No problem, brother.  By the name of the servers, I don't think it would be impossible to buy me a cold one at some point...not too far away.  So, go to Cisco Unity Connection Serviceability > Tools > Cluster Management and you'll see controls on how to change the server status between the cluster servers.  BEFORE you do that, you should log in to the CLI of BOTH servers and run the DB replication command I sent you before.  Make sure the replicate state is 2 on both servers.  If so, swap primary status and then do some shakeout tests (add a user on one server, make sure it replicates to the other then delete it from the second server and make sure it's deleted from the first, etc).  You know the drill from there.

Let me know what comes of it.

Hailey

Please rate helpful posts!

srosenthal Mon, 03/15/2010 - 19:44

Ok,

I did the command on both and the subscriber told me that it had to be done from the pub.

Here is the pub output.

admin:utils dbreplication reset all
This command will try to start Replication reset and will return in 1-2 minutes.
Background repair of replication will continue after that for 1 hour.
Please watch RTMT replication state. It should go from 0 to 2. When all subs
have an RTMT Replicate State of 2, replication is complete.
If Sub replication state becomes 4 or 1, there is an error in replication setup.
Monitor the RTMT counters on all subs to determine when replication is complete.
Error details if found will be listed below
OK [172.30.56.204]
admin:
admin:show perf query class "Number of Replicates Created and State of Replication"
==>query class  :

- Perf class (Number of Replicates Created and State of Replication) has instances and values:
    ReplicateCount  -> Number of Replicates Created   = 0
    ReplicateCount  -> Replicate_State                = 0

I will let it adjust over night and check the show perf command tomorrow.  As per your instructions I will wait until it shows 2 before making the pub primary.

Seth
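The reset output above describes a simple rule: the state should move from 0 to 2, and a state of 1 or 4 means an error in replication setup. A minimal Python sketch of that polling logic follows. It assumes a caller-supplied get_state function (hypothetical; how the value is actually read from RTMT or the CLI is left out), and the one-hour default mirrors the background repair window mentioned in the output.

```python
import time

def wait_for_replication(get_state, timeout_s=3600, interval_s=60, sleep=time.sleep):
    """Poll get_state() until 2 (complete), 1 or 4 (setup error), or timeout."""
    waited = 0
    while True:
        state = get_state()
        if state == 2:
            return "complete"
        if state in (1, 4):
            return "error"
        if waited >= timeout_s:
            return "timeout"
        sleep(interval_s)
        waited += interval_s

# Demo with canned readings instead of a real server; sleep is stubbed out.
readings = iter([0, 0, 2])
result = wait_for_replication(lambda: next(readings), sleep=lambda s: None)
print(result)  # complete
```

A state that stays at 0 past the timeout, as happened here overnight, falls into the "timeout" branch and is worth treating as a problem rather than waiting longer.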

David Hailey Mon, 03/15/2010 - 19:50

Ah, I actually was referring to the command to check the DB replication state. You are resetting the cluster replication, which is why it has to be done from the pub...but that's OK, it's probably not a bad idea after all the name changes, etc. Typically, you should do that after resetting the Publisher's host name anyway (at least with CUCM, that's the case). But yeah, let it run and see that everything returns OK. If not, then we can probably deal with that too.

Hailey

Please rate all helpful posts!

srosenthal Tue, 03/16/2010 - 04:21

Ok, the pub still shows 0 for the replicate state.

I am not sure which other command you are talking about as I looked back and did not see it.  Could be a combination of either too late at night or too early this morning.

I eagerly await your next command!

David Hailey Tue, 03/16/2010 - 07:18

OK.  So, given the scenario here is what I'd do:

Manually fail back to the Publisher to make it primary.  Then use the following procedures in the attached document (Page 2-2 - Manually Change a Server from Secondary to Primary Status) to check that everything there is taken care of.

From there, I'd run the following command on each server to see if you get the same output for replication status (but opposite of what you saw last night, i.e. - see if you get 2 on the Pub and 0 on the Sub.  If so, this may be expected due to the HA setup of the cluster and how the SRM behaves when failover occurs).  The command to check DB replication status is: show perf query class "Number of Replicates Created and State of Replication".  Run it on both servers.

If switching back to the primary doesn't initiate replication on that node then I would assume there is still an underlying problem.  The replication status of that node should be 2.  There are 2 options as I see it from here.  You could attempt to restart the replication like you did last night (worth a shot) OR you can fail back to the Subscriber and go the TAC case route.  If the Publisher has been hosed up, you can't just build a new server and tell it the Subscriber is the Publisher.  You will have to rebuild both nodes from scratch.  Luckily for you, these aren't in production yet so you have the opportunity to do so without impacting anyone if it absolutely needs to be done.

Now, let's assume the Publisher is good and replication status is 2 but the Subscriber is initially at a status of 0.  You'll need to go to CU Admin on the Publisher and create a new user w/mailbox.  Then log in to the CU Admin of the Subscriber and verify that the new user shows up there as well.  While you're in the Subscriber, delete that new user and then verify that it is removed from the CU Admin on the Pub.  You should run RTMT the whole time.  Use the Perf counters in RTMT to watch the Replicate State and Replicates Created counters during this test.  When you create the new user w/mailbox on the Publisher, if the cluster is behaving normally then the status of replication for BOTH nodes should be 2.  In other words, the Subscriber should go from 0 to 2.  If it does not, then you are back to where you should probably go the TAC case route.  While it would really suck to have to rebuild both nodes from scratch, you'd need someone from TAC to verify what is wrong within the cluster and if it can be fixed with the servers as-built.  If it cannot, I'd assume your obligation to the client is to deliver a healthy, working system and the only way to be positive of that after these issues may be to rebuild from scratch.

If what I've told you helps you out and you think the cluster is functioning normally again, you need to test the hell out of it.  I'd test manual failovers and failbacks, introduced failovers (i.e., shutdown the Primary), test calls in every scenario, and monitor via RTMT as you go.

Let me know what's up.

Hailey

Please rate helpful posts!

srosenthal Tue, 03/16/2010 - 17:26

Hailey,

You said: "If so, this may be expected due to the HA setup of the cluster and how the SRM behaves when failover occurs."

I am unfamiliar with HA but this came up in discussion with a co-worker today.  I think that my licensing is so that only one server is active at a time and not load shared between both.  He said something about that my ports should show up as split half on one server and half on the other.  Is this correct?

Seth

David Hailey Tue, 03/16/2010 - 17:49

Well, with Unity Connection clusters - at least in 7.1, they are set up to provide HA by having 2 servers configured in a cluster.  The servers are Active-Active in that if one fails, the other can automatically take over.  The SRM (Server Role Manager) is the primary piece of how that works.  How licensing typically works is that with clustering you need a license for each server.  You install one license on the Pub (with MAC address of its Eth0 interface) and a license on the Sub (with MAC address of its Eth0 interface).  As far as ports go, the Unity Connection clusters are intended to provide 100% failover (i.e., overprovisioning is technically possible in some scenarios but ideally if you need 72 ports for all your messaging traffic then you should provision 72 ports on each server).  So, the servers share the same phone system integration but each server has its own set of ports that will register with CUCM.  From there, the CUCM line group configurations control how you distribute calls between the servers.  Ideally, you treat a CUC Cluster like a CUCM cluster in that the Subscriber should handle the majority of all the calls.  If there is a burst past its capacity, you can roll calls to the Publisher via the line group configs.  You also treat the Publisher like a Publisher in that it should be the primary server for handling web traffic (e.g., CU Admin, Cisco PCA, IMAP, etc).

With all that said, there is also a concept of maintaining a 7x warm-standby model.  Basically, you install a spare server somewhere, maintain the DRS backups of the active server somewhere that they can be accessed in the event of emergency/necessity, and then you restore that server with the DRS backup in an emergency.  You can actually maintain this type of setup for a single Connection server or a Connection cluster.  Licensing in this model is a little different.  1) You can purchase a dedicated license 2) You can use the license from the server that has failed and is in the DRS backup.  The MAC addresses won't match up so you have to reboot the standby server every 24 hours because the server is in violation of the license. 3) You can get a replacement license for the server that failed but assign it to the standby server (i.e., with MAC address of the standby's Eth0 interface).

From what you've described, I've assumed you set up a standard Unity Connection cluster.  When you first changed the host name of the Publisher, the Subscriber took over active status, right?  If that's the case, you're doing a normal Unity Connection cluster and what I said in the first paragraph applies.  If you want to send me a private message, I can look at your license files for you if you would like but that's up to you.  I don't know that I really need to based on what I think you're working on.

Hailey

Please rate helpful posts!

srosenthal Tue, 03/16/2010 - 18:11

I was looking over the quote and do see UNITYCN7-HA-24 listed so I must have missed that somewhere and did not enter the PAK on Cisco to get the license file. I will look for it but it might not be until Thursday.

I created a user on the subscriber and it replicated to the publisher right away.  I ran the show perf query again on the pub and it still says "0".

I then deleted the user from the pub and it was gone from the sub.

I thought all license files get installed on the pub and would be automatically applied to the subscriber like the call manager does.

Btw, what brought all this on was that I was trying to troubleshoot MWI not working and not being able to call a phone to record prompts.  I can of course force MWI on and off by dialing the numbers from the phone.

Just thought I would throw that in.

Seth

David Hailey Tue, 03/16/2010 - 18:23

So you have HA for up to 24 ports, that's what that product code is. There is a command to attempt to repair replication on a node; however, I'm not sure how much good that will do currently. So, let's proceed as follows:

A few questions for you to answer:

1) Do you have RTMT loaded and do you know how to look at the replication state in RTMT? If not, I can tell you.

2) Which server is active in the cluster? Go to CU Serviceability > Tools > Cluster Mgmt and it will tell you.

3) For clusters, this is how licensing works: When a Cisco Unity Connection cluster (high availability) is configured, two licenses are required. The license that has the MAC address of the publisher server must be installed on the publisher server. The license that has the MAC address of the subscriber server must be installed on the subscriber server. So, there is likely an issue here that you need to straighten out with TAC regarding the licensing on the cluster. I would do that first and foremost if you can.

4) Let's get the cluster working properly and then I can help you with MWI. Is this a SCCP or SIP integration with CUCM? (I recommend SCCP if you're not already using it.)

5) Are there any other PBX's involved here where you're using TIMG or PIMG for integration?

Hailey

Please rate helpful posts!

srosenthal Tue, 03/16/2010 - 19:45

I do have RTMT loaded but could not find where to look at the replication state.

Under CU Serviceability - Tools - there is no Cluster Mgmt option.

I think the connection to CUCM is SCCP; I don't remember choosing SIP when we set the ports up.  No other PBX's, just the call manager.

I do notice in RTMT I am getting a lot of SyslogSeverityMatch messages.

I will be out on the road again tomorrow most of the day.  Thanx again for all the help.  I guess I will have to open a TAC case on Thursday morning.

David Hailey Tue, 03/16/2010 - 19:54

Ouch. No cluster management option is a bad indicator. I hate to say that. There are still some things I could try to help you with but you should get the TAC case open for sure. Can you still web into the Publisher? If so, under Cluster do you have the Subscriber defined as a second node? You wouldn't have been able to install the second node without it but check again. In the interim, hit me up when you get back to working on this. In the end, even if you have to rebuild from scratch I can guide you with the right procedures to ensure the 2nd time is a charm. You're welcome for the help...that's what the forums are for anyways.

Hailey

Please rate helpful posts!


srosenthal Wed, 03/17/2010 - 16:52

Ok, my bad, I was looking under Unified Serviceability and not Unity Connection Serviceability.

So I tried to make the pub primary and it did not work.  It looked like it was making the switch but in the end the subscriber took over as primary again.

I do have full access into both boxes, via web or cli.

I really need to have these boxes stable by the end of the week so I am seriously thinking of just going ahead and re-installing the OS on both units, making certain I get the spelling right.  They are going to need to go live on Monday.

My guess is that all I will then have to tackle is the MWI on/off issue, and I am trying to locate the license for the HA as it was ordered but I have not received it yet.

Any tips on doing the reload would be much appreciated.

Thanx, Seth

Correct Answer
David Hailey Wed, 03/17/2010 - 17:17

Well, if you want to go the reload route then this is the way to do it:

1) Shutdown the Subscriber.

2) Rebuild the Publisher via DVD which will overwrite the hard drive. The DVD comes with the server. Do not try to use the "Recovery" CD, just the straight up application CD - CUCM version (whatever you had shipped).

3) Before you install the Subscriber, make sure you verify NTP sync on the Pub (critical).

4) Once you have the Publisher rebuilt, add the Subscriber in CU Admin as part of the cluster just as you normally would. Use the IP address of the server (recommended).

5) Rebuild the Subscriber from DVD as you did the Publisher. It should be added as the second node in the cluster so make sure you have DNS and IP connectivity to/from all resources needed BEFORE STEP 2 ever occurs.

6) Once both servers are up and running, check CUC Serviceability and look at the cluster status.

7) Then, go to the CLI on both servers and run "show tech network hosts". The /etc/hosts file should have a loopback address and both cluster servers included in the file.

8) Make a test user on the Pub, verify it replicates to the Sub. Delete it from the Sub, make sure it deletes from the Pub.

9) Test failover then test failback by going to CUC Serviceability and putting the Publisher back as active.

10) You can also look at the cluster status on both servers from the CLI - show cuc cluster status is the command to do that.

11) If all that checks out, get your license files loaded and do your configurations.

12) Make sure you set up DRS backups on a schedule (for both servers) and perform an initial manual DRS backup of both as well.

13) I would also recommend that you upgrade both systems to the latest SU for 7.1.3 which is 7.1(3b)SU2. You can download it from CCO.
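The ordering in the steps above matters, in particular that the Subscriber is not rebuilt until the Publisher is up, NTP is verified, and the Subscriber has been added in CU Admin. A minimal Python sketch of that ordering as a checklist follows. The step names are paraphrased from the list, and the prerequisite links are one reading of it rather than anything Cisco publishes.

```python
# Steps paraphrased from the procedure above; prerequisites are an interpretation.
STEPS = [
    ("shutdown_subscriber", []),
    ("rebuild_publisher", ["shutdown_subscriber"]),
    ("verify_ntp_on_publisher", ["rebuild_publisher"]),
    ("add_subscriber_in_cu_admin", ["rebuild_publisher"]),
    ("rebuild_subscriber", ["verify_ntp_on_publisher", "add_subscriber_in_cu_admin"]),
    ("check_cluster_status", ["rebuild_subscriber"]),
    ("test_replication", ["check_cluster_status"]),
    ("test_failover_failback", ["test_replication"]),
    ("load_licenses", ["test_failover_failback"]),
    ("schedule_drs_backups", ["load_licenses"]),
]

def ready_steps(done):
    """Return steps not yet done whose prerequisites are all done, in list order."""
    done = set(done)
    return [name for name, needs in STEPS
            if name not in done and all(n in done for n in needs)]

# With the Publisher rebuilt, the Subscriber rebuild is not yet allowed.
print(ready_steps(["shutdown_subscriber", "rebuild_publisher"]))
# ['verify_ntp_on_publisher', 'add_subscriber_in_cu_admin']
```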

As for MWI, there is a specific service that enables MWI to work properly. If that service didn't start on the Sub for whatever reason when it failed over, then that may be the cause. Get the servers rebuilt and let's go from there. Sound good?

Hailey

Please rate helpful posts!

srosenthal Thu, 03/18/2010 - 14:36

Oh happy day!  It looks like third time is a charm!  I reloaded both machines and all is working, even the MWI.

Hailey, thank you so much for all the help!

I am in Richmond, VA, if you are close I still want to buy you that cold one.

One last question, as I am too tired to look it up right now.  What is the web address for end users to log on to their personal mailbox in Unity?  For example, I know for CM it is www.callmanager/ccmuser; what about Unity Connection?

Seth

David Hailey Thu, 03/18/2010 - 14:52

That would be Personal Communications Assistant - PCA: https://<server>/ciscopca

No problem, man. Glad I could help. I'm in DC but have friends in Richmond and come down often. I'll hit you up next time I'm down and we can grab a cold one (or 10)...but you don’t have to buy. Like I said, just glad you got it worked out.

Hailey

Please rate helpful posts!

srosenthal Mon, 03/22/2010 - 15:08

Ok, here is a question regarding the HA license.

I did install it earlier (I did not realize that's what it was), but I think I did it incorrectly.  I registered the PAK to the Publisher and installed the license there.  The screen shows the right number of mailboxes but twice the number of ports, with half being used by the publisher and none being used by the subscriber.

Should I have registered the PAK with the subscriber?

Seth

David Hailey Mon, 03/22/2010 - 15:13

You should definitely have a separate license on the Subscriber. What does it show?

srosenthal Mon, 03/22/2010 - 15:43

The subscriber only shows the demo license.

I guess I need to call Cisco to get them to re-issue the license with the subscriber's MAC address.

Seth

David Hailey Mon, 03/22/2010 - 15:55

You need a license on each server homed to each individual NIC (eth0 only).
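The licensing rule above (one license per server, each tied to that server's eth0 MAC) can be expressed as a small check. This is a minimal Python sketch under stated assumptions: the MAC values are the placeholder addresses from the sample email further down this thread, and the function names are hypothetical. It does not read real license files.

```python
def normalize_mac(mac):
    """Uppercase a MAC address and drop ':', '-' and '.' separators."""
    return "".join(ch for ch in mac.upper() if ch not in ":-.")

def license_problems(servers, licenses):
    """servers: {name: eth0 MAC}; licenses: MACs that the license files are tied to.

    Returns the server names that have no license homed to their eth0 MAC.
    """
    licensed = {normalize_mac(m) for m in licenses}
    return [name for name, mac in servers.items()
            if normalize_mac(mac) not in licensed]

# Placeholder MACs only; a single license was registered to the Publisher.
servers = {"publisher": "CC:CD:DD:AA:AB:BB", "subscriber": "AA:AB:BB:CC:CD:DD"}
licenses = ["CCCDDDAAABBB"]
print(license_problems(servers, licenses))  # ['subscriber']
```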


rob.huffman Tue, 03/16/2010 - 11:28

Hi Seth,

Sorry to hear you're having problems with this, but at least you've got Hailey with lots of great tips (+5 Hailey!)

I thought you might find these docs helpful:

Replacing Cisco Unity Connection 7.x Servers

http://www.cisco.com/en/US/docs/voice_ip_comm/connection/7x/upgrade/guide/7xcucrug040.html#wp1052015

Renaming Cisco Unity Connection 7.x Servers

http://www.cisco.com/en/US/docs/voice_ip_comm/connection/7x/upgrade/guide/7xcucrug060.html#wp1053641

Cheers!

Huff

rob.huffman Tue, 03/23/2010 - 05:57

Hey Seth,

Cisco Licensing is very, very good about re-homing these licenses when required. Just send an email to:

[email protected]

Something like this;

We recently acquired a license tied to PAK xxxxxxxxxxx for our CUC 7.x H/A cluster. We have a license tied to MAC CCCDDDAAABBB and now require the license to be re-homed to our Subscriber MAC AAABBBCCCDDD accordingly. We mistakenly tied the license to our Publisher MAC CCCDDDAAABBB.

Here is the pertinent info;

PAK - 1122A1D1201

End User Name - Seth Rosenthal

Legal Address

City,Country

Purchase Order Number - 111111

Sales Order Number - 2222333

Bill to ID - xxxxx

Ship to ID - xxxxxxxxx

Thanks in advance for all your help with this! Can you please email the new license to:

[email protected]


Licensing Requirement in a Cisco Unity Connection Failover Environment

http://www.cisco.com/en/US/products/ps6509/products_tech_note09186a0080abd852.shtml


Hope this helps!
Rob


Please support CSC Helps Haiti

https://supportforums.cisco.com/docs/DOC-8727
