
CWMS Disaster Recovery -> Restore of single Node?

dirk.fidorski
Level 1

Hi all,

 

I am looking into Backup/Restore topics, which are mostly clear, but I am missing the following piece of the puzzle:

Let's assume I run a 2k User HA Deployment on 2.0 MR4, consisting of 11 individual VMs. My primary IRP fails unrecoverably.

 

How do I restore this one single VM?

 

As far as I understand, I can redeploy with my 2.0 OVA, but I do not see an option to get this node updated to MR4 and rejoined to the cluster. Or is there some magic?

 

(I know I could rebuild the "whole" Cluster using the NFS-Backup or VDP Full VM Backups, but this seems to me a bit like an "overkill" if actually only 1 single VM out of 11 has failed)

 

TIA

Dirk

1 Accepted Solution


dpetrovi
Cisco Employee

Hello Dirk,

 

The CWMS product doesn't have an option to recover a single VM after a major failure. However, here are some scenarios and what is possible in each (I will stick to a 2000-user system with IRP and HA).

Please take a look at the Troubleshooting document, which covers several scenarios:

Virtual Machine Fails and Cannot Be Recovered
Problem    One of your virtual machines fails and you are unable to fix it even with the assistance of the Cisco TAC.

Possible Cause    There are several possible causes including the following: you have a corrupt database, you have a faulty configuration, unsupported maintenance activity, power failures, hardware failures, and more.


Solution    If a virtual machine on your high-availability configuration fails, remove the high-availability virtual machines from your system. Redeploy all of your high-availability virtual machines and then reconfigure the system for high availability. Refer to "Configuring a High Availability System" in the Cisco WebEx Meetings Server Administration Guide for more information.

Similarly if an Internet Reverse Proxy virtual machine fails, you must remove that virtual machine from your system. Then redeploy and reconfigure your Internet Reverse Proxy virtual machine. Refer to "Adding Public Access" in the Cisco WebEx Meetings Server Administration Guide for more information.

For any other virtual machine, you must rebuild your system using the Disaster Recovery feature. Refer to "Using the Disaster Recovery Feature" in the Cisco WebEx Meetings Server Administration Guide for more information.
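
To make the three cases easier to see at a glance, here is a small Python sketch. It only restates the troubleshooting text quoted above as a lookup table and is not part of any Cisco tool; the names `VmRole`, `RECOVERY_PATH`, `recovery_path`, and `needs_full_rebuild` are made up for this illustration.

```python
from enum import Enum


class VmRole(Enum):
    """Hypothetical grouping of CWMS VMs by how a failure is handled."""
    HA = "high-availability VM"
    IRP = "Internet Reverse Proxy VM"
    OTHER = "any other VM"


# Recovery path per role, paraphrased from the quoted troubleshooting text.
RECOVERY_PATH = {
    VmRole.HA: (
        "Remove the high-availability VMs, redeploy all of them, "
        "then reconfigure the system for high availability."
    ),
    VmRole.IRP: (
        "Remove the Internet Reverse Proxy VM, then redeploy and "
        "reconfigure it (see 'Adding Public Access')."
    ),
    VmRole.OTHER: "Rebuild the system using the Disaster Recovery feature.",
}


def recovery_path(role: VmRole) -> str:
    """Return the documented recovery path for a failed VM of this role."""
    return RECOVERY_PATH[role]


def needs_full_rebuild(role: VmRole) -> bool:
    """Only a failure outside the HA and IRP VMs forces a full rebuild."""
    return role is VmRole.OTHER


if __name__ == "__main__":
    for role in VmRole:
        print(f"{role.value}: {recovery_path(role)}")
```

In other words, a failed HA or IRP VM can be handled by detaching and redeploying that part of the system, while any other failed VM means a rebuild through Disaster Recovery.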

 

I hope this helps.

 

-Dejan


5 Replies

Thanks a lot Dejan,

 

...well yes, I understand the impact is smaller if the failed node is "luckily" one of the HA VMs or an IRP, as this would allow us to detach the HA VMs or IRPs and attach the rebuilt VMs. Still, for any primary Admin/Media/Web VM it means a full redeployment.

 

So, since restoring an NFS backup requires, besides the full redeployment, some manual steps such as reconfiguring the CUCM teleconferencing settings, SSO, SNMP, certificates, etc., the only smooth way to recover "quickly" would be a full VMware DP VM backup (or a fully redundant DR system).

 

Is there anything planned to change this behavior, such as being able to redeploy or replace a single node and have it updated and cleanly provisioned back into the cluster? (I would guess the Admin VMs should be able to "provision" other nodes, so unless both Admin VMs fail ...)

 

Thanks

Dirk

Hi Dirk,

 

So far, I am not aware of any plans to change this, as it would require some major design changes. However, in version 2.5 we are introducing a dual-datacenter option for high availability, so if something in one DC fails, you will at least be able to keep running on the other DC until the first one is rebuilt. You still won't be able to recover just a single VM, though.

 

-Dejan

Hello Dejan,

I have two questions:

1. How is the IRP database backed up to the NFS server?

2. If only the IRP server is inaccessible (the CWMS Admin server is up), how can the IRP server be recreated? (There is no HA; the system consists of one CWMS Admin VM and one IRP server for 50 users.) Is it possible to use the NFS backup?

Thanks a lot,

 

Hello Emine,

1. The IRP server doesn't contain any database. The database is located on the Admin VM. The NFS backup is used only if you need to rebuild the Admin VM, so that you can restore all of the configuration and the database.

2. If you experience an issue with IRP server that can't be resolved, what you would need to do is:

a. Remove Public Access - which would detach your faulty IRP server from CWMS solution

b. Delete faulty IRP VM

c. Use the base version OVA and manually deploy IRP VM (e.g. if your system is on 2.5 MR6, use 2.5.1.29 OVA to deploy IRP VM) 

d. Add Public Access - which will attach your new 2.5.1.29 IRP VM to CWMS solution and automatically update it to 2.5 MR6 to match the version of the entire solution.
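
As a rough sketch only, steps a to d can be written down as an ordered checklist. The Python below just formats the steps for a given pair of versions; it does not talk to CWMS or vCenter, and the function name `irp_rebuild_steps` and its parameters are hypothetical. The caller must supply the base OVA version, since the only mapping given here is the 2.5 MR6 / 2.5.1.29 example.

```python
def irp_rebuild_steps(base_ova_version: str, system_release: str) -> list[str]:
    """Return the manual IRP rebuild steps in the order they must be done.

    base_ova_version: base OVA build used to deploy the new IRP VM.
    system_release:   release the rest of the CWMS system is running.
    """
    return [
        "Remove Public Access (detaches the faulty IRP server from CWMS).",
        "Delete the faulty IRP VM.",
        f"Manually deploy a new IRP VM from the {base_ova_version} base OVA.",
        "Add Public Access (attaches the new IRP VM, which is then "
        f"automatically updated to {system_release}).",
    ]


if __name__ == "__main__":
    # Version pair taken from the example in step c.
    for number, step in enumerate(irp_rebuild_steps("2.5.1.29", "2.5 MR6"), 1):
        print(f"{number}. {step}")
```

The order matters: Public Access has to be removed before the old VM is deleted, and added again only after the new VM has been deployed from the base OVA.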

Keep in mind that this process will help only if the original IRP VM is failing because of a problem with the IRP itself. If a networking or firewall configuration issue is causing connectivity problems with the internal VMs or the DNS server, then rebuilding the IRP won't help.

Do make sure you identify what caused the IRP VM to become unresponsive.

I hope this helps.

-Dejan