LMS 2.6 to LMS 3.1 migration issues: Episode 2

yjdabear · ‎11-29-2008

1. Error "Unable to schedule backup in /path/to/dir" in LMS 3.1, although an immediate backup works fine. Is this because cron access is disabled for all users? Is there any workaround for the lack of cron?

2. After importing LMS 2.6 backup in LMS 3.1, a number of phantom devices (some from the past) showed up in DCR and RME. Are they never deleted from the DB (crud from LMS 2.2 and before)?

3. After a hostname change on the LMS 3.1 server, syslog collector status shows both the old and new hostnames in FQDN form, status N/A across the board. Adding new hostname (sans the dot com part) gets it collecting, but the old hostname.fqdn.com refuses to be removed.

After restarting the syslogcollector and sysloganalyzer processes, the original old/new hostnames.fqdn.com entries return, and both show N/A across the board again. The "new-hostname.fqdn.com" entry can be unsubscribed, the "old-hostname.fqdn.com" cannot.

/opt/CSCOpx/MDC/tomcat/webapps/rme/WEB-INF/classes/com/cisco/nm/rmeng/csc/data/Subscribers.dat looks ok, so where is old-hostname stored?

\254\355^Esr^Qjava.util.HashMap^E^G\332\301\303^V`\321^C^BF

loadFactorI thresholdxp?@^Lw^H^P^Apsr/com.cisco.nm.rmeng.fcss.common.FcssSub

scription,Z\344^E%\251\202>^B^KI batchSizeI^PdowntimeDurationI^HintervalJ

^OlastUpdatedTimeI^DportI

protocolIdI^SsyslogReceivingPortL collectort^RLjava/lang/String;L^Ldowntim

eFileq~^CL^Bidq~^CL^Fserverq~^Cxpd^AQ\200'^P^A^]\353uT\211^M^E^M^Ept^SDowntimeSy

slogs.logpt^Unew-hostname.fqdn.comx
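The dump above is a Java-serialized HashMap, so ASCII hostnames sit in it as plain bytes. One way to hunt for where old-hostname is stored is to scan every file under the data directories for that byte string. This is only a rough sketch: the function names are mine, and which directory to scan is an assumption.

```python
import re
from pathlib import Path
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII at least min_len long, like strings(1)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def files_mentioning(root: str, needle: str) -> List[Path]:
    """List files under root whose raw bytes contain needle (ASCII only)."""
    wanted = needle.encode("ascii")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            if wanted in path.read_bytes():
                hits.append(path)
        except OSError:
            # Unreadable file (permissions, vanished mid-scan): skip it.
            continue
    return hits

# Hypothetical usage, with the scan root being a guess:
# for hit in files_mentioning("/opt/CSCOpx/MDC/tomcat/webapps/rme", "old-hostname"):
#     print(hit)
```

Running files_mentioning over the whole rme webapp tree rather than just csc/data would also catch files in sibling directories such as sa/data.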

Somehow I apparently have debug on, but I'm not seeing any clue what's wrong with syslogcollector.

Can I solve the problem by deleting the two *3333 directories in /opt/CSCOpx/MDC/tomcat/webapps/rme/WEB-INF/classes/com/cisco/nm/rmeng/csc/data/?

Collector.properties

filters.dat

Collector.propertiesbkp

new-hostname.fqdn.com_3333

Subscribers.dat

old-hostname.fqdn.com_3333

4. I'm 90% sure whatever causes restorebackup.pl to leave a copy of IPMDB.db in / also causes /var and /var/adm to become owned by casuser:casusers, which would cause problems for other apps. Is this a known issue?

Joe Clarke · ‎11-30-2008

1. The scheduled backup requires cron. You will need to allow casuser to run cron jobs by removing that username from cron.deny (or adding it to cron.allow).
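The usual cron.allow/cron.deny precedence can be sketched as below, to check whether casuser would be permitted. The file locations in the usage comment are the Solaris ones and are an assumption, as is the behaviour when neither file exists (it differs between systems, so it is a parameter here).

```python
from typing import Iterable, Optional, Set


def _names(lines: Iterable[str]) -> Set[str]:
    """Strip whitespace and drop blank lines from an allow/deny file."""
    return {line.strip() for line in lines if line.strip()}


def read_lines(path: str) -> Optional[list]:
    """Return the file's lines, or None when the file does not exist."""
    try:
        with open(path) as fh:
            return fh.readlines()
    except FileNotFoundError:
        return None


def cron_allowed(user: str,
                 allow_lines: Optional[Iterable[str]] = None,
                 deny_lines: Optional[Iterable[str]] = None,
                 default_when_missing: bool = False) -> bool:
    """Apply the common rule: cron.allow wins; otherwise cron.deny decides."""
    if allow_lines is not None:
        return user in _names(allow_lines)
    if deny_lines is not None:
        return user not in _names(deny_lines)
    # Neither file exists: system dependent, so the caller chooses.
    return default_when_missing

# Hypothetical usage on Solaris (paths are an assumption):
# ok = cron_allowed("casuser",
#                   read_lines("/etc/cron.d/cron.allow"),
#                   read_lines("/etc/cron.d/cron.deny"))
```

If cron.allow exists, cron.deny is not consulted at all, which is why adding casuser to cron.allow is enough on its own.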

2. LMS 2.2? When a device was deleted from 2.2 (or 2.6) it was gone. If stale device data is showing up after an upgrade to 3.1, then perhaps the backup is older than you thought, or the previous installation was integrated with ACS, and ACS integration (or ACS itself) has since changed.

3. Shut down SyslogCollector and SyslogAnalyzer. Then empty the contents of NMSROOT/MDC/tomcat/webapps/rme/WEB-INF/classes/com/cisco/nm/rmeng/sa/data/collectors.dat. Then restart these two daemons. Only the local Collector should show back up.

4. I don't see how either of these two things is possible. As I said previously, I would need to correlate time stamps on the file to log entries to determine where in the code this is happening. But my reading of the code has turned up nothing that would account for this. Have you applied any restore-related patches?

yjdabear · ‎12-01-2008

1. That's what I was afraid of. So to confirm, scheduled backup (GUI) cannot be made to work with a third-party commercial cron replacement? Is there any plan to add a built-in scheduler to LMS/Common Services?

2. ACS integration has never been attempted. Some phantom devices were routers/switches decommissioned long ago. Some were UNIX/Windows servers on the same subnet (though Campus Manager Device Discovery is off). Others cannot be explained, such as "Badone" and "255.255.255.255". These same "ghosts" appeared on every LMS 3.1 server after restorebackup. A subset of these had appeared before, after the LMS 2.2->2.6 migration, which led me to think at least those were crud remaining undeleted in the [CS/RME] DB archive version after version.

3. In a pinch, I renamed the new-hostname.fqdn.com_3333 and old-hostname.fqdn.com_3333 in /opt/CSCOpx/MDC/tomcat/webapps/rme/WEB-INF/classes/com/cisco/nm/rmeng/csc/data/. That got new-hostname to stick and collect, but old-hostname only went away after applying your solution. Is this something that potentially should have been taken care of by the hostnamechange.pl script?

4. I'll need to write a script to watch out for IPMDB.db and the ownership change on /var and /var/adm during a restorebackup. FWIW, even / became owned by casuser:casusers on the test LMS box, but that did not happen on the production LMS servers.
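A watcher along these lines might look like the sketch below. It polls the owner of each path and reports when a path appears, disappears, or changes uid/gid. The two-second interval and the path list in the usage comment are assumptions.

```python
import os
import time
from typing import Dict, Iterable, List, Optional, Tuple

Owner = Optional[Tuple[int, int]]  # (uid, gid), or None when the path is missing


def snapshot(paths: Iterable[str]) -> Dict[str, Owner]:
    """Record the (uid, gid) of every path, or None if it does not exist."""
    state = {}
    for path in paths:
        try:
            st = os.stat(path)
            state[path] = (st.st_uid, st.st_gid)
        except FileNotFoundError:
            state[path] = None
    return state


def diff(before: Dict[str, Owner], after: Dict[str, Owner]) -> List[tuple]:
    """Return (path, old, new) for every path whose owner or existence changed."""
    return [(p, before.get(p), new) for p, new in after.items()
            if before.get(p) != new]


def watch(paths: List[str], interval: float = 2.0,
          rounds: Optional[int] = None) -> None:
    """Poll forever (or for a fixed number of rounds) and print each change."""
    previous = snapshot(paths)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(paths)
        for path, old, new in diff(previous, current):
            print(f"{time.ctime()} {path}: {old} -> {new}", flush=True)
        previous = current
        done += 1

# Hypothetical usage, started before restorebackup.pl is run:
# watch(["/", "/var", "/var/adm", "/IPMDB.db"])
```

Polling only gives timestamps accurate to the interval, so the output still has to be matched against restorebackup.log by hand or by script.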

Joe Clarke · ‎12-01-2008

1. No. However, you can manually configure /opt/CSCOpx/bin/backup.pl to be run from your other scheduler.
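A wrapper that an external scheduler could call might look like the sketch below. The positional arguments (backup directory, optional log file, optional number of generations) are my recollection of the documented CLI and should be treated as an assumption; check the Common Services guide for your release. Invoking the script through the bundled /opt/CSCOpx/bin/perl is also an assumption, and the example paths are hypothetical.

```python
import subprocess
from typing import List, Optional


def backup_command(backup_dir: str,
                   log_file: Optional[str] = None,
                   generations: Optional[int] = None,
                   nmsroot: str = "/opt/CSCOpx") -> List[str]:
    """Build the argv for backup.pl (argument order is an assumption)."""
    if generations is not None and log_file is None:
        # Arguments are positional, so a log file must precede the count.
        raise ValueError("generations requires log_file")
    argv = [f"{nmsroot}/bin/perl", f"{nmsroot}/bin/backup.pl", backup_dir]
    if log_file is not None:
        argv.append(log_file)
    if generations is not None:
        argv.append(str(generations))
    return argv


def run_backup(backup_dir: str, **kwargs) -> int:
    """Run the backup and return its exit status for the scheduler to inspect."""
    return subprocess.run(backup_command(backup_dir, **kwargs)).returncode

# Hypothetical usage from the scheduler's job definition:
# status = run_backup("/path/to/dir", log_file="/var/tmp/lms_backup.log", generations=3)
```

Returning the exit status lets the external scheduler flag a failed run instead of relying on the LMS GUI.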

2. Strange. I have never seen this behavior when moving from 2.6 to 3.0.

3. Probably. However, having both registered should not cause any real problem.

yjdabear · ‎12-01-2008

For 4., I just caught the /var ownership change and /IPMDB.db creation during restorebackup.pl, with +/- 2 secs accuracy. I'll see if I can upload the details here, or open a TAC case.

yjdabear · ‎12-02-2008

/var (and presumably /var/adm) became owned by non-zero UID 30816 at:

Mon Dec 1 15:37:42 EST 2008 var-30816 ***

Corresponding restorebackup.log

...

[Mon Dec 1 15:32:37 2008] rmeng database restored successfully

[Mon Dec 1 15:32:37 2008] Copying the files from the backup

[Mon Dec 1 15:32:37 2008] Copy command is: rm -fr /opt/vgi/ehealth/backups/tempBackupData/rmeng/CSCOp

x/setup/rme.info; cp -pr /opt/vgi/ehealth/backups/tempBackupData/rmeng/CSCOpx/* /opt/CSCOpx;cp -rp /o

pt/vgi/ehealth/backups/tempBackupData/rmeng/var /opt/vgi/ehealth/backups/tempBackupData/rmeng/product

/

[Mon Dec 1 15:37:42 2008] Successfully copied files

[Mon Dec 1 15:37:42 2008] Successfully restored the database

50% of RME Restore completed

Importing CCR Data now...

[Mon Dec 1 15:37:42 2008] Importing CCR Data now...

[Mon Dec 1 15:37:42 2008]

[Mon Dec 1 15:37:48 2008]

REMOVEMDC[CCREntry[RME_SYSLOGANALYZER,,,,,,,,,FALSE]]: CCRResponse[0;]

ADDMDC[CCREntry[RME_SYSLOGANALYZER,,,,EMPTYSTRING,,,,,FALSE]]: CCRResponse[0;MDC added: CCREntry[RME_SYSLOGANALYZER,,,,EMPTYSTRING,,,,,FALSE].]

REMOVEMDC[CCREntry[RME_CCJS,,,,,,,,,FALSE]]: CCRResponse[0;]

Strangely, migration log didn't start logging until the following time:

###############################################

Started at: ----------> 2008/12/01 15:46:20

01/Dec/2008 15:46:22:865 1 [main] INFO com.cisco.nm.rmeng.migration.MigrMain ? - Constructor: /opt/CSCOpx

01/Dec/2008 15:46:22:880 16 [main] INFO com.cisco.nm.rmeng.migration.MigrMain ? - Initializing the setup for migration framework..

01/Dec/2008 15:46:22:882 18 [main] INFO com.cisco.nm.rmeng.migration.MigrMain ? - Check if the Database engines are up...

01/Dec/2008 15:46:22:981 117 [main] DEBUG com.cisco.nm.rmeng.migration.MigrMain ? - Command is sh /opt/CSCOpx/bin/pdshow RMEDbEngine

/IPMDB.db appeared in / at

Mon Dec 1 16:13:20 EST 2008 IPMDB.db ***

Corresponding restorebackup.log

...

Unloading "DBA"."DbVersion"

Unloading "DBA"."DbVersionHistory"

Creating indexes

[Mon Dec 1 16:13:19 2008] Suceed the rebuild ....

[Mon Dec 1 16:13:20 2008] Executing the install command /opt/CSCOpx/bin/perl /opt/CSCOpx/objects/db/conf/configureDb.pl action=install dsn=ipmdb

[Mon Dec 1 16:13:21 2008] Successfully installed IPMDB.db...

The closest entries in migration.log:

...

01/Dec/2008 16:07:48:836 1285972 [main] DEBUG com.cisco.nm.rmeng.migration.MigrMain ? - Calling postMigration() of class: com.cisco.nm.rmeng.migration.CRIDataMigrator@148bd3

01/Dec/2008 16:07:48:836 1285972 [main] DEBUG com.cisco.nm.rmeng.migration.MigrMain ? - Calling postMigration() of class: com.cisco.nm.rmeng.migration.RMEDataHandler@e80842

Ended at: ----------> 2008/12/01 16:11:59

###############################################

Started at: ----------> 2008/12/01 16:25:21

Please refer to restorebackup.log,ipmclient.log and ipmprocess.log for more detailed log on Migration

0 [main] INFO base - com.cisco.nm.ipmng.migration.MigrMain,,84,Constructor: /opt/CSCOpx

47 [main] INFO base - com.cisco.nm.ipmng.migration.MigrMain,initialize,529,Initializing the setup for migration framework..
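Matching an observed event time against restorebackup.log, as done by hand above, can be sketched as follows. It parses the bracketed timestamps and returns the entries within a few seconds of the event. The regular expression assumes the "[Mon Dec 1 15:37:42 2008]" layout shown in the excerpts and an English locale for the day and month names.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# "[Mon Dec 1 15:37:42 2008] message" (the day may be padded with extra spaces)
ENTRY = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]\s*(.*)$")


def parse_entries(lines: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (timestamp, message) for every line that starts with a stamp."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            stamp = " ".join(match.group(1).split())  # collapse padding
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            entries.append((when, match.group(2)))
    return entries


def near(entries: List[Tuple[datetime, str]], event: datetime,
         window_s: int = 2) -> List[Tuple[datetime, str]]:
    """Keep the entries logged within window_s seconds of the event."""
    window = timedelta(seconds=window_s)
    return [(when, msg) for when, msg in entries if abs(when - event) <= window]

# Hypothetical usage (the log path is an assumption):
# with open("/var/adm/CSCOpx/log/restorebackup.log") as fh:
#     for when, msg in near(parse_entries(fh), datetime(2008, 12, 1, 15, 37, 42)):
#         print(when, msg)
```

Lines without a leading timestamp, such as the CCR responses, are skipped, so they need to be read in context once the matching stamped entries are found.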

Joe Clarke · ‎12-02-2008

Please attach the complete logs.

Joe Clarke · ‎12-02-2008

Also, what is in /opt/vgi/ehealth/backups/tempBackupData/rmeng/product? This directory seems to be the one causing the problems, but it's not listed in the manifest as something we back up.

yjdabear · ‎12-02-2008

ls -alR

Tue Dec 2 13:36:39 EST 2008

/opt/vgi/ehealth/backups/tempBackupData/rmeng/product:

total 2

drwxr-xr-x 3 casuser casusers 96 Dec 2 13:32 .

drwxr-x--- 23 casuser casusers 1024 Dec 2 13:35 ..

drwxr-xr-x 2 casuser casusers 96 Dec 2 13:35 CSCO

/opt/vgi/ehealth/backups/tempBackupData/rmeng/product/CSCO:

total 0

drwxr-xr-x 2 casuser casusers 96 Dec 2 13:35 .

drwxr-xr-x 3 casuser casusers 96 Dec 2 13:32 ..

I'll open a TAC case and include the full logs there. Should I upload the full LMS 2.6 backup being restored too?

Joe Clarke · ‎12-02-2008

Yes.

yjdabear · ‎12-02-2008

TAC case 610204279 opened.

yjdabear · ‎12-01-2008

A new issue: When trying to export the UT endhosts data in CSV from the GUI, a "Native2ascii not found" error is returned.

On the CLI, the following error is encountered:

/opt/CSCOpx/campus/bin/ut -cli -query all -export /path/to/ut.txt -u admin -p admin-passwd

orbProperties={org.omg.CORBA.ORBInitialPort=42342, org.omg.CORBA.ORBClass=org.jacorb.orb.ORB, org.omg.CORBA.ORBInitialHost=new-hostname, org.omg.CORBA.ORBSingletonClass=org.jacorb.orb.ORBSingleton, jacorb.implname=CSEDSPersistentIOR, org.omg.PortableInterceptor.ORBInitializerClass.bidir_init=org.jacorb.orb.giop.BiDirConnectionInitializer}

_Orb=org.jacorb.orb.ORB@4fce71

log4j:ERROR No appenders could be found for category (com.cisco.nm.ani.clients.utng.application.UTDataManager).

log4j:ERROR Please initialize the log4j system properly.

I had to ctrl-c out of it. The output file contains nothing but the above text.

yjdabear · ‎12-01-2008

Nothing unusual in Cmapps.log though.

Martin Ermel · ‎12-01-2008

Hello yjdabear!

I've got a customer with the same error when using 'ut -cli'.

If I remember correctly, the issue is tied to the option '-query all' and first appeared after updating LMS 3.0.1 to LMS 3.1 (LMS 3.0 was a fresh install, no migration). Could you try to generate the report for only one device instead of all, just to clarify whether the option '-query all' is involved?

yjdabear · ‎12-01-2008

Martin,

I take that back. The CLI export apparently works. I just needed to be patient and let it finish. It still spits out the "native2ascii" error though, which remains a showstopper only on the browser GUI:

/opt/CSCOpx/campus/bin/ut -cli -query all -export /path/to/ut.txt -layout all -u admin -p admin-password

orbProperties={org.omg.CORBA.ORBInitialPort=42342, org.omg.CORBA.ORBClass=org.jacorb.orb.ORB, org.omg.CORBA.ORBInitialHost=pssva030, org.omg.CORBA.ORBSingletonClass=org.jacorb.orb.ORBSingleton, jacorb.implname=CSEDSPersistentIOR, org.omg.PortableInterceptor.ORBInitializerClass.bidir_init=org.jacorb.orb.giop.BiDirConnectionInitializer}

_Orb=org.jacorb.orb.ORB@13ad085

log4j:ERROR No appenders could be found for category (com.cisco.nm.ani.clients.utng.application.UTDataManager).

log4j:ERROR Please initialize the log4j system properly.

ERROR UTCLI: Error loading preferences. Cannot run program "native2ascii": error=2, No such file or directory
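The "error=2, No such file or directory" part means the process could not find an executable named native2ascii on its search path. A quick way to check what the shell environment sees is sketched below. native2ascii normally ships in a JDK's bin directory; the extra directory in the usage comment is purely hypothetical, and the environment of the LMS daemons may differ from the login shell's.

```python
import os
import shutil
from typing import Iterable, Optional


def find_tool(name: str, search_path: Optional[str] = None,
              extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Return the full path of an executable, or None if it is not found.

    search_path defaults to the current PATH; extra_dirs are appended to it.
    """
    base = os.environ.get("PATH", "") if search_path is None else search_path
    parts = [p for p in [base, *extra_dirs] if p]
    return shutil.which(name, path=os.pathsep.join(parts))

# Hypothetical usage (the JDK location is a guess, not an LMS default):
# print(find_tool("native2ascii", extra_dirs=["/usr/java/bin"]))
```

If the tool is found only through extra_dirs, adding that directory to the PATH of the account that runs ut (and of the LMS daemons, for the GUI case) would be the thing to try.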