COBRAS restore to Connection 7.02 cluster

lindborg Sat, 02/28/2009 - 08:35

So the users are already in the cluster then? If so, COBRAS will just update their data where they stand in the existing 7.0 database (it won't delete and re-add them - it just updates them with information from the backup, so it should be reasonably safe).

If that's not what you mean, let me know.

As ever, always try an import of a single user to start with if you're unsure - the import allows you to select a single object (a user in this case) and restore it by itself, which is a good test for this type of scenario.

Also note that I posted updated COBRAS utilities for 7.0 yesterday - make sure to use the latest, quite a few fixes went in to the CUC 7.0 import in particular (mostly related to multiple language support but also other items).

Not exactly. I have a customer with two separate Unity 4.2 servers, with 1100 subscribers on one and 3400 on the other. We are using COBRAS to combine the two Unity servers onto an active/active Connection 7 cluster. I built a new pair of Connection 7.0 servers and did a base configuration. Last week I ran COBRAS against the first 4.2 server (1100 subs) and then the import to Connection, which worked perfectly. It ran for a week, and then I did a COBRAS export of the other Unity 4.2 server. I then ran the COBRAS import with this backup; however, I only imported 1100 of the 3400 subs because they are going into two different message stores. It got all the way to the message restore (step 11) with only a few minor errors and restored about 400 of the mailboxes. Then the CPU on the Connection box spiked to 70% and COBRAS output this error:

2/27/2009 11:56:03 PM: (error) failure sending email message via SMTP for message with subject=Message from an unidentified caller (8602) in SendMessageToServer_Unity.

2/27/2009 11:56:03 PM: (error) failure returned from IMAP library=ChilkatLog:

SendEmail:

DllDate: Jan 29 2008

Username:

Component: ActiveX

Connection closed by server.

Socket is no longer readable

Failed to get SMTP command response...

SMTP_Connect:

Connecting to SMTP server

smtp_host:

smtp_port: 25

smtp_user: NULL

trying-auth-method: NONE

InitialResponse: 220 HM-LNX-UCX01 UnityMailer (ver 1.0); Fri Feb 27 23:59:26 EST 2009

sendingHello: EHLO CO-2K-UNITY01.MHIS.VOICE

helloResponse: 250-HM-LNX-UCX01:8025 Hello

250-SIZE 10000000

250-PIPELINING

250-AUTH LOGIN

250 HELP

Also in the syslog at the same time was "%CUC_CSMGR-UCEVNT-6-EvtMsgAllportsbusy: All Answer Ports Busy detected. Cluster ID: Node ID:HM-LNX-UCX01". This started at the same time, but the server kept taking calls OK. I ended up rebooting the server and exiting out of the import. After two hours the CPU would not come below 40%, so we ended up restoring from a DRS backup taken earlier that night, before we started. This got it back to normal.

The mail store was not full, only about 5GB at this point. The two services consuming the most CPU were unityoninit and unityoninit#11. I was running the new COBRAS update posted yesterday.

So one of my thoughts was that it was too much for the server to be part of a cluster and doing replication while having COBRAS running against it, which brought me to my question: is it supported to run COBRAS against an active/active Connection cluster?

lindborg Sun, 03/01/2009 - 09:17

COBRAS is not doing anything special - it should have no issue importing users into an active/active cluster and based on your description that doesn't come into play here - if the users and other objects are being created and replicating properly, we can take that completely off the table as a reason for the SMTP issues.

So far 100% of the escalations about message import failing for Connection 7 have turned out to be virus scanning packages and/or other security applications blocking port 25, or a firewall issue. However, if it were a firewall blocking port 25, COBRAS would not even have started the import, as it establishes a connection to the SMTP service on the Connection server up front.

The most common thing is a virus package on the Windows box you're running the import from - more often than not it actually allows a message or two to get through and then starts blocking everything else - I have about a dozen logs so far that show just this - Symantec does this for instance. Then every message after that point fails.

In the log output, the "Connection closed by server. Socket is no longer readable" tells the story - you either don't have your SMTP connection open for unauthenticated access or some external process is killing port 25 - not likely a COBRAS issue.
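For anyone reading the Chilkat log earlier in this thread, the EHLO reply lists what the SMTP service advertises. The following is a minimal Python sketch, not part of COBRAS, that pulls the extension keywords out of reply lines like those in the helloResponse section; the function name parse_ehlo_extensions is made up for illustration. In this log the server advertises AUTH LOGIN while the client tried auth method NONE. Advertising AUTH does not by itself prove that unauthenticated sending is refused, so treat it only as a hint worth checking.

```python
def parse_ehlo_extensions(reply_lines):
    """Return {KEYWORD: parameters} from the lines of an SMTP EHLO reply.

    Hypothetical helper for illustration. The first line of the reply is
    the server greeting, so only the following lines are treated as
    extension keywords.
    """
    extensions = {}
    for line in reply_lines[1:]:
        # Each reply line starts with the 3-digit code and a '-' or ' '.
        if not line.startswith("250") or len(line) < 5:
            continue
        keyword, _, params = line[4:].strip().partition(" ")
        if keyword:
            extensions[keyword.upper()] = params
    return extensions


# Reply lines copied from the log in this thread.
reply = [
    "250-HM-LNX-UCX01:8025 Hello",
    "250-SIZE 10000000",
    "250-PIPELINING",
    "250-AUTH LOGIN",
    "250 HELP",
]
extensions = parse_ehlo_extensions(reply)
print("AUTH mechanisms advertised:", extensions.get("AUTH", "(none)"))
```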

The message import is done completely separately and is not related to the rest of the import. So you can test by selecting a single user that has messages and restoring just the messages for that user - this is much faster for testing than importing the user again. Both will work, but it's a lot simpler to just do messages. Make sure to restore more than one message (at least 3) to ensure you don't get a false positive from a virus package letting the first couple get off the box.
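A quick way to check whether something on the Windows box is cutting off port 25 after the first connection or two is to open several SMTP sessions in a row from that box. The sketch below is an illustration using Python's standard smtplib, not anything COBRAS ships; the function name probe_smtp and its smtp_factory parameter are hypothetical. It only connects and sends EHLO and does not send a message, so it complements the single-user message restore rather than replacing it.

```python
import smtplib


def probe_smtp(host, port=25, attempts=3, timeout=10.0,
               smtp_factory=smtplib.SMTP):
    """Open `attempts` separate SMTP sessions and report which survive EHLO.

    Hypothetical helper for illustration. Returns one boolean per attempt:
    True if the session connected and EHLO returned a 2xx code.
    `smtp_factory` exists only so the function can be exercised without a
    real server.
    """
    results = []
    for _ in range(attempts):
        ok = False
        try:
            with smtp_factory(host, port, timeout=timeout) as conn:
                code, _reply = conn.ehlo()
                ok = 200 <= code < 300
        except (smtplib.SMTPException, OSError):
            # Refused, timed out, or closed by the server or by something
            # in between; ok keeps whatever value it had at that point.
            pass
        results.append(ok)
    return results
```

Calling probe_smtp with the Connection server's host name and attempts=5 from the import machine returns a list of five booleans. A result where the first one or two are True and the rest are False would be consistent with a security package letting the first sessions through and then blocking, though an EHLO-only probe may not trigger a scanner that reacts to message content.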

You can turn on all MTA traces on the Connection side and run a test then gather the logs - if the MTA is blocking access or terminating the connection for some reason, you may see something in the logs. Thus far I have not seen this happen in the field but you never know.

Thanks Jeff. The only thing that seems odd is that so many messages got through before it failed, probably about 3000 or so. It is also odd that we could not get the CPU back to a normal level until we did a restore on the box, and that we got the error about all the ports being busy when they were not. All the users that were part of this most recent COBRAS import are not on the Connection server anymore, so restoring just one mailbox isn't an option. Since the first import of 1100 users/mailboxes worked a week ago, it is a production server now, so I want to minimize the risk to the customer. I have a spare server, so I think I'm going to get a 30-day license for it, large enough to do the import outside of production. I can use DRS to make it a mirror of the production server's current state. I will turn on the MTA traces on the non-production box. Thanks again.

lindborg Sun, 03/01/2009 - 10:06

Well, if the users aren't there then something more interesting is going on - I didn't get that from your first message. It sounds like maybe the DB is filling up one of its spaces or something (Informix doesn't so much grow dynamically on its own) - certainly something is going on on the server side.

Doing a test restore sounds good, but I doubt very much this is a COBRAS issue, and you won't likely find much unless the test box completely mirrors your production system and you run into the same space issue or something.

Definitely try restoring _one_ user all the way through first without messages. Make sure it gets in. If not, stop and escalate to TAC - there's something going on with the server dbspaces or the like burning you. I'm betting it works fine. It sure sounds like a space issue that then corrupts the DB - something for the data folks to eyeball, I'd think.

lindborg Sun, 03/01/2009 - 10:24

Also, if you have the full import log created by COBRAS from that run, I'd appreciate it if you could zip it up and send it my way (lindborg at cisco dot com). If you happened to gather logs off the Connection server before you restored, that'd be ideal - I want to forward them to the messaging and DB folks to look at if you have them.
