11-24-2010 06:49 AM - edited 03-16-2019 02:05 AM
Hello,
I have a Cisco Unity Connection cluster of two servers, one publisher and one subscriber. They are running software version 7.1.3ES43.33034-43. Yesterday at about 8:43 I lost connection to the subscriber server. I could still ping it but could not reach it via the GUI or SSH. I spoke to TAC yesterday and they told me to simply reboot the server. I was a little concerned about this because I don't want to cause some kind of split-brain effect. I work in a hospital and it is imperative that the voicemail system stays up. My only other option would be to do this at 4:00 am on a Sunday. Any suggestions?
Thanks
11-24-2010 06:57 AM
I wouldn't worry too much about causing a split-brain condition. In fact, a brief period of split-brain is normal during recovery. In your scenario, I'd go with TAC and reboot the server. Technically, if you have your ports configured correctly for failover, you should be able to reboot the subscriber at any time. Here are some additional tidbits on split-brain and what it is:
When the servers in a Cisco Unity Connection cluster have Primary status at the same time (for example, when the servers have lost their connection with each other), both servers handle incoming calls (answer phone calls and take messages), send message notifications, send MWI requests, and accept changes to the administrative interfaces (such as Connection Administration). However, the servers do not replicate the database and message store to each other and do not receive replicated data from each other.
When the connection between the servers is restored, the status of the servers temporarily changes to Split Brain Recovery while the data is replicated between the servers and MWI settings are coordinated. When the recovery process is complete, the publisher server has Primary status and the other server has Secondary status.
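To make the three phases in that description concrete, here is a minimal sketch in Python. It is only an illustration of the documentation text above, not anything Unity Connection actually runs; the `Status` enum and the `statuses` function are made-up names.

```python
from enum import Enum


class Status(Enum):
    PRIMARY = "Primary"
    SECONDARY = "Secondary"
    SPLIT_BRAIN_RECOVERY = "Split Brain Recovery"


def statuses(link_up: bool, recovery_done: bool):
    """Return (publisher, subscriber) status for the phases described above.

    Hypothetical model: link_up says whether the servers can reach each
    other, recovery_done whether replication and MWI sync have finished.
    """
    if not link_up:
        # Connection lost: both servers act as Primary (split brain).
        return Status.PRIMARY, Status.PRIMARY
    if not recovery_done:
        # Connection restored: both temporarily show Split Brain Recovery.
        return Status.SPLIT_BRAIN_RECOVERY, Status.SPLIT_BRAIN_RECOVERY
    # Recovery complete: publisher is Primary, the other server Secondary.
    return Status.PRIMARY, Status.SECONDARY


if __name__ == "__main__":
    for link_up, done in [(False, False), (True, False), (True, True)]:
        pub, sub = statuses(link_up, done)
        print(link_up, done, pub.value, sub.value)
```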
Hailey
Please rate helpful posts!
11-24-2010 07:02 AM
Thank you for your fast response.
The reason I bring this up is that I upgraded to this code about a month ago. I first upgraded the publisher and then the subscriber. When the subscriber came back up, it caused a split-brain effect for at least 15-20 minutes. During that time, when I tried to retrieve my voice messages, Unity Connection told me voice messages were unavailable.
Do you recommend I do this during a downtime then?
11-24-2010 07:06 AM
Again, split-brain during recovery is normal, and recovery after an upgrade takes much longer than a simple reboot. It takes approximately 15-20 minutes for Tomcat and the other web services to start before a server is recognized, so this could be the lag you saw. Doing it during a downtime is never a bad idea, but if you are experiencing a bottleneck with VM then I'd do it sooner rather than later. That is totally your call. If you are looking for what I believe to be the most stable version of 7.1 code, it is 7.1.3.32900-4 or 7.1(3b)SU2. Tried and tested.
Why don't you just do the reboot this afternoon? I'm sure you'll have some downtime as folks depart for the holidays.
Hailey
Please rate helpful posts!
06-03-2014 04:32 AM
Hi David and all,
We have a Unity Connection 9.x pair. After an IP address change on the subscriber we are getting the error message:
‘Communication is not functioning correctly between the servers in the Cisco Unity Connection cluster’
Each server shows its own status as publisher but has lost communication with the other server (this is what cluster management shows when checked from either server).
show tech network hosts – displays correct info.
show perf query class "Number of Replicates Created and State of Replication"
==>query class :
- Perf class (Number of Replicates Created and State of Replication) has instances and values:
ReplicateCount -> Number of Replicates Created = 603
ReplicateCount -> Replicate_State = 2
On both servers the replication state is showing as 2, so it's good.
Various reboots have been done, services are up and working, and there are no firewalls between the servers.
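If you want to check that counter on several servers without reading it by eye, a small Python sketch like the one below can pull the value out of saved CLI output. The function names are made up for illustration, and the "2 means good" rule is simply the reading used in this post.

```python
import re

# Sample text copied from the output quoted above.
SAMPLE = """\
ReplicateCount -> Number of Replicates Created = 603
ReplicateCount -> Replicate_State = 2
"""


def replicate_state(perf_output: str):
    """Return the Replicate_State value as an int, or None if it is absent."""
    match = re.search(r"Replicate_State\s*=\s*(\d+)", perf_output)
    return int(match.group(1)) if match else None


def replication_looks_good(perf_output: str) -> bool:
    # Assumption taken from this thread: a state of 2 is the healthy value.
    return replicate_state(perf_output) == 2


if __name__ == "__main__":
    print(replicate_state(SAMPLE), replication_looks_good(SAMPLE))
```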
Any advice appreciated.
TIA…Jeff
06-03-2014 08:27 AM
Can you post the output of the "show cuc cluster status" from the CLI of each server?
06-05-2014 01:27 AM
David, thanks for assistance.
from the Pub we get:
ACE_File_Lock::ACE_File_Lock: Permission denied /dev/shm/CCM_GENstatusLock_0
Server Name  Member ID  Server State  Internal State           Reason
-----------  ---------  ------------  -----------------------  -------
dvbcucaw01   0          Primary       Pri Active Disconnected  Normal
dvbcucbw01   1          Disconnected  Unknown                  Unknown

SERVER             ID   STATE   STATUS      QUEUE     CONNECTION CHANGED
-----------------------------------------------------------------------
g_ciscounity_pub   100  Active  Local       0
g_ciscounity_sub1  101  Active  Connecting  40188370  Jun 5 09:06:29
From the sub we get:
Server Name  Member ID  Server State  Internal State                Reason
-----------  ---------  ------------  ----------------------------  -------
dvbcucaw01   0          Disconnected  Unknown                       Unknown
dvbcucbw01   1          Primary       Sec Act Primary Disconnected  Normal
Database replication is not active
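For anyone comparing the two views side by side, here is a rough Python sketch that parses the member table from each server's "show cuc cluster status" output and flags the case where each server thinks it is Primary. It assumes the Server State column is a single word, as in the output above; the `Member` type and function names are made up for illustration.

```python
from typing import List, NamedTuple


class Member(NamedTuple):
    name: str
    member_id: int
    server_state: str
    internal_state: str
    reason: str


def parse_cluster_status(text: str) -> List[Member]:
    """Parse the member rows that follow the 'Server Name' header line."""
    members = []
    in_table = False
    for line in text.splitlines():
        tokens = line.split()
        if line.startswith("Server Name"):
            in_table = True
            continue
        # Skip everything before the header, blank lines, and the dashed rule.
        if not in_table or not tokens or set(line.strip()) <= {"-", " "}:
            continue
        # A row needs name, numeric ID, state, internal state, and reason.
        if len(tokens) < 5 or not tokens[1].isdigit():
            break
        members.append(Member(tokens[0], int(tokens[1]), tokens[2],
                              " ".join(tokens[3:-1]), tokens[-1]))
    return members


def both_claim_primary(pub_view: List[Member], sub_view: List[Member]) -> bool:
    """True if the two views name different servers as Primary."""
    pub_primary = {m.name for m in pub_view if m.server_state == "Primary"}
    sub_primary = {m.name for m in sub_view if m.server_state == "Primary"}
    return bool(pub_primary) and bool(sub_primary) and pub_primary != sub_primary


PUB_OUTPUT = """\
Server Name Member ID Server State Internal State Reason
----------- --------- ------------ ----------------------- -------
dvbcucaw01 0 Primary Pri Active Disconnected Normal
dvbcucbw01 1 Disconnected Unknown Unknown
SERVER ID STATE STATUS QUEUE CONNECTION CHANGED
"""

SUB_OUTPUT = """\
Server Name Member ID Server State Internal State Reason
----------- --------- ------------ ---------------------------- -------
dvbcucaw01 0 Disconnected Unknown Unknown
dvbcucbw01 1 Primary Sec Act Primary Disconnected Normal
"""

if __name__ == "__main__":
    pub = parse_cluster_status(PUB_OUTPUT)
    sub = parse_cluster_status(SUB_OUTPUT)
    print(both_claim_primary(pub, sub))
```

With the output pasted above, each server lists itself as Primary and the peer as Disconnected, which is the split-brain picture.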
06-16-2014 12:50 PM
Cisco TAC with root level access edited one of the hosts files to correct the IP address.
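In case it helps someone spot this sooner, below is a rough Python sketch for comparing hosts-file-style text (an IP address followed by one or more names) against the addresses you expect after an IP change. The function name is made up, the addresses in the example are placeholders from the documentation range rather than our real ones, and it would be run on a workstation against copied text, not on the appliance.

```python
def stale_host_entries(hosts_text: str, expected: dict) -> dict:
    """Map hostname -> (found_ip, expected_ip) for entries that disagree."""
    stale = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            want = expected.get(name)
            if want is not None and want != ip:
                stale[name] = (ip, want)
    return stale


# Placeholder data: the subscriber entry still carries its old address.
HOSTS_TEXT = """\
127.0.0.1 localhost
192.0.2.10 dvbcucaw01
192.0.2.11 dvbcucbw01  # old subscriber address
"""
EXPECTED = {"dvbcucaw01": "192.0.2.10", "dvbcucbw01": "192.0.2.21"}

if __name__ == "__main__":
    print(stale_host_entries(HOSTS_TEXT, EXPECTED))
```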
06-03-2014 12:45 PM
You should probably open a new thread for this issue.
06-03-2014 12:55 PM
There is actually a duplicate post in another topic...so just keep that one thread going. Same question / response going there.
11-24-2010 07:18 AM
Hi latintrpt,
I will add my +5 point vote for this good info from Hailey
Just to let you know, we have had to reboot our Sub during office hours
on two occasions. This was done without any noticeable effect for our users.
Just make sure the Pub is set as primary and that the Sub has been set
to "Stop taking calls" (give some time for calls to clear) before rebooting.
On a related note, we had a time when our Sub was out of commission completely
for a few days due to a Tomcat service bug. Again, our users saw no issues during this time.
The CUC Cluster design has been well thought out by Cisco and is indeed very resilient!
Cheers!
Rob
11-24-2010 08:20 AM
Thanks for the information guys.
Rob, I am unable to set the Sub to "Stop Taking Calls" as it says "Not Available". Server status for the subscriber is "Not Reachable".
The Publisher does say "Primary"
11-24-2010 08:32 AM
If the Publisher is primary and you are not experiencing call issues, you simply need to reboot the Subscriber - either now or later.
Another way to force the Sub to not take calls is to remove its ports from the CUCM hunt list/line group configurations.
Hailey
06-16-2014 01:44 PM
+5 Rob and David! You guys always have good info.
Latin,
When you verified the Sub's "Port Manager" status, did you log in through the Pub or the Sub? If you logged in through the Pub, try logging in through the Sub so you can verify whether it shows "Taking Calls" or also "Not Available".
Regards,
Yosh