/opt/CSCOpx/bin/dbstop.pl

Feb 26th, 2007

Is the following error of any concern, for LMS 2.2?

Can't open perl script "/opt/CSCOpx/bin/dbstop.pl": No such file or directory

I can't find this file anywhere.

Joe Clarke Mon, 02/26/2007 - 11:23

Yes, this is a big problem. Without this script, stopping dmgtd can lead to database corruption. The fact that it is missing indicates someone accidentally deleted it, or you have a bad installation.

yjdabear Mon, 02/26/2007 - 11:33

The box kernel-panicked while the LMS 2.2 DST patch was being installed. After reboot, the web gui is prompting to install JRE 1.4.2_12. Can I simply copy dbstop.pl from another box? Should I reinitdb even if I don't see any sign of db corruption yet?

Joe Clarke Mon, 02/26/2007 - 11:37

Copying the file may work, but there may be other file system problems. I wouldn't reinitialize anything at this point, but if you start to see other strange problems, especially with missing files, you might consider restoring application components from a file system backup, or doing a reinstallation.

yjdabear Mon, 02/26/2007 - 11:58

I did a test run of dmgtd stop/start after fetching dbstop.pl from the test box. Upon dmgtd stop, the following db processes never went down:

root 17069 1 0 14:52:46 ? 0:00 /opt/CSCOpx/objects/dmgt/dmgtd.sol

casuser 14747 1 0 14:23:31 ? 1:12 /product/CSCO/CSCOpx/objects/db/bin/dbsrv7 -x tcpip{HOST=localhost;DOBROADCAST=

casuser 14730 1 0 14:23:15 ? 0:05 /product/CSCO/CSCOpx/objects/db/bin/dbsrv7 -x tcpip{HOST=localhost;DOBROADCAST=

casuser 17070 17069 1 14:52:46 ? 0:00 /opt/CSCOpx/objects/ess/bin/rvrd -store /opt/CSCOpx/objects/ess/conf/rvrd.conf

Without running "dbstop.pl all", daemons.log keeps spitting out "A database server with that name has already started" upon "dmgtd start". Analyze ANI Server fails as well.

Will a reinstall of the patches below address the db issue?

cw-common-services-2.2-sp3-sol-K9.tar.gz

cwcs2.2-sol-CSCsc30604-K9.tar.gz

cwcs2.2-sol-CSCse127781-k9.tar

cwcs2.2-sol-CSCsg58592-K9.tar.gz

cwdst2_2_3_sol_k9.zip

Joe Clarke Mon, 02/26/2007 - 12:01

Right. The databases are stopped using that script to prevent a dirty shutdown leading to corruption. If you kill off those processes, you risk damaging the data. So, you WILL need that script. Either you copy it from another good system, restore it from a file system backup, or do a reinstallation.

Obviously, I'm being very paranoid in this case. A reinstallation may not be needed. It may be just this file that is missing. However, my experience tells me to be safe rather than sorry.

yjdabear Mon, 02/26/2007 - 12:04

Just to clarify, I did the "dmgtd stop" after copying dbstop.pl over. It seems "dmgtd stop" did not call upon "dbstop.pl stop" as if it's not aware of the latter's presence, although it no longer complained of missing "dbstop.pl" in daemons.log. Is there something I can do to make this awareness happen?

Here's the output of dbstop.pl all

/product/CSCO/CSCOpx/bin/dbstop.pl all

Stopping database engine aniEng

Stopping database engine rmeEng

There are still active connections. Stop anyway? (Y/N) Y

Stopping database engine cmfEng

Stopping database engine SqlCoreDBServer

Could not find DSN=mcp.

Joe Clarke Mon, 02/26/2007 - 12:49

Prior to running dbstop.pl all, what is the output from the following command:

/usr/bin/ps -ef | /usr/bin/fgrep /product/CSCO/CSCOpx | /usr/bin/fgrep -v fgrep | /usr/bin/fgrep dbsrv | /usr/bin/awk '{ print $2; }'

It might also be useful to get the output of:

sh -x /etc/init.d/dmgtd stop
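
For reference, the first pipeline above can be wrapped in a small function so it is easy to try against saved `ps -ef` output. This is only a sketch: the name `dbsrv_pids` is hypothetical, it reads the process listing from standard input instead of calling ps itself, and it uses `grep -F` in place of fgrep.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of dbsrv processes whose command line
# contains the given install root. Reads `ps -ef` style output on stdin.
dbsrv_pids() {
    root=$1
    grep -F "$root" | grep -v grep | grep -F dbsrv | awk '{ print $2 }'
}

# Example on a live system (install root taken from this thread):
#   /usr/bin/ps -ef | dbsrv_pids /product/CSCO/CSCOpx
```

If the PIDs printed before and after "dmgtd stop" are the same, the database engines were not shut down.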

yjdabear Mon, 02/26/2007 - 13:49

In a pinch I went ahead with /opt/CSCOpx/campus/bin/reinitdb.pl -restore. That seems to keep ANI Server up, whereas before Analyze ANI Server was erroring out/testing ok/erroring out again.

The first command returns three PIDs, the same ones before or after "dmgtd stop".

21967

22046

22032

Attached is the screen capture of sh -x dmgtd stop, after a server reboot.

Joe Clarke Mon, 02/26/2007 - 13:59

Interesting. You may have found a bug, or something more is messed up on your system. Since you have installed into a non-default location, NMSROOT should point to that location. Instead, it's being found to be /opt/CSCOpx. This isn't necessarily bad (unless the /opt/CSCOpx symlink is missing), but since the processes are preceded with the real root, dmgtd cannot find them, and thus it doesn't know to stop them.

I assume this wasn't happening before the panic? Do you have a good file system backup of this server?
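
To make the mismatch concrete: the dbsrv7 processes appear in ps under /product/CSCO/CSCOpx, so a filter built from NMSROOT=/opt/CSCOpx never matches them. A minimal sketch for comparing a configured root with its physical location follows. The helper names `real_root` and `root_is_physical` are hypothetical, and the code relies only on the standard `cd` and `pwd -P`.

```shell
#!/bin/sh
# Hypothetical helper: print the physical directory behind a path,
# following any symlinks (for example /opt/CSCOpx -> /product/CSCO/CSCOpx).
real_root() {
    ( cd "$1" 2>/dev/null && pwd -P )
}

# Hypothetical helper: succeed only if the configured root and its
# physical location are the same string, i.e. no symlink is involved.
root_is_physical() {
    [ "$(real_root "$1")" = "$1" ]
}

# Example (path taken from this thread):
#   root_is_physical /opt/CSCOpx || echo "also filter on $(real_root /opt/CSCOpx)"
```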

yjdabear Tue, 02/27/2007 - 06:45

The symlink /opt/CSCOpx pointing to /product/CSCO/CSCOpx appears fine.

lrwxrwxrwx 1 casuser casusers 20 Jul 9 2004 CSCOpx -> /product/CSCO/CSCOpx

I'm sure the server can be restored if needed. What file systems would need to be restored?

/product/CSCO/CSCOpx (/opt/CSCOpx) and /var/Logs/CSCO/adm/CSCOpx (/var/adm/CSCOpx) are on SAN, while /etc/init.d/dmgtd is on local disk. /var/adm/CSCOpx (local) was a symlink to /var/Logs/CSCO/adm/CSCOpx (SAN), but now LMS is writing to it, which probably started when the 108993-61 OS patch called for booting into single-user mode. However, I'm still trying to wrap my mind around how dmgtd could've started in single-user mode, finding no /var/Logs/CSCO/adm/CSCOpx, therefore overwriting the symlink /var/adm/CSCOpx with a real directory.

I suspect that sequence may also have something to do with the following that just trickled in: Apparently there were some errors, when installing cw-common-services-2.2-sp3-sol-K9, after 108993-61 got installed:

Hmmm, now I wonder if the whole db mess is because the db is confused because some of the files it expects are in /var/Logs/CSCO/adm/CSCOpx/files which it can no longer find, while the most recent ones are in /var/adm/CSCOpx/files.
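
A quick way to confirm whether a path such as /var/adm/CSCOpx is still a symlink or has been replaced by a real directory is sketched below. The function name `path_kind` is hypothetical. It assumes a POSIX shell; older Bourne shells may only accept `-h` for the symlink test.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path is a symlink, a real
# directory, missing, or something else. The -L test must come first,
# because -d follows symlinks and is also true for a link to a directory.
path_kind() {
    if [ -L "$1" ]; then
        echo symlink
    elif [ -d "$1" ]; then
        echo directory
    elif [ ! -e "$1" ]; then
        echo missing
    else
        echo other
    fi
}

# Example (paths taken from this thread):
#   path_kind /var/adm/CSCOpx
#   path_kind /var/adm/CSCOpx/files
```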

Joe Clarke Tue, 02/27/2007 - 09:10

I was more interested in differences between /opt/CSCOpx/objects/dmgt/dmgtd.conf from the backup and from the active system. Also, what are the differences in the /var/sadm/pkg/CSCOmd/pkginfo in the backup and on the active system?

The fact that SP3 failed to apply in the CSCOmd package could be very bad. What does pkgchk CSCOmd report? The database problem right now doesn't have to do with logs, but with the fact that the dbsrv7 processes are not being stopped when dmgtd is brought down. Since they continue to run, dmgtd cannot spawn new engines.

yjdabear Tue, 02/27/2007 - 10:15

diff pkginfo-backup pkginfo

< LD_LIBRARY_PATH=/product/CSCO/CSCOpx/objects/db/lib:/product/CSCO/CSCOpx/lib:/opt/CSCOpx/MDC/lib:/opt/CSCOpx/campus/lib:/opt/CSCOpx/MDC/lib:/opt/CSCOpx/MDC/lib

---

prod box (not behaving)

> LD_LIBRARY_PATH=/product/CSCO/CSCOpx/objects/db/lib:/product/CSCO/CSCOpx/lib:/opt/CSCOpx/MDC/lib:/opt/CSCOpx/campus/lib:/opt/CSCOpx/MDC/lib

diff dmgtd.conf-backup dmgtd.conf

23c23

< CampusOGSServer y CmfDbMonitor,ESS /product/CSCO/CSCOpx/bin/cwjava -cp:a /product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/classes:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ctm.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-server1.0.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-sharedasa1.0.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-sqlasa1.0.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-util1.0.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-client1.0.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-cmasa1.0.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-cmclient1.0.jar:/product/CSCO/CSCOpx/MDC/tomcat/lib/apps/xerces.jar:/product/CSCO/CSCOpx/MDC/tomcat/lib/apps/log4j.jar:/product/CSCO/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/jconnect5.2.jar com.cisco.nm.xms.ogs.server.OGSServer

---

> CampusOGSServer y CmfDbMonitor,ESS /opt/CSCOpx/bin/cwjava -cp:a /opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/classes:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ctm.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-server1.0.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-sharedasa1.0.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-sqlasa1.0.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-util1.0.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-client1.0.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-cmasa1.0.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/ogs-cmclient1.0.jar:/opt/CSCOpx/MDC/tomcat/lib/apps/xerces.jar:/opt/CSCOpx/MDC/tomcat/lib/apps/log4j.jar:/opt/CSCOpx/MDC/tomcat/webapps/campus/WEB-INF/lib/jconnect5.2.jar com.cisco.nm.xms.ogs.server.OGSServer

34d33

< ANIDbEngine y - /product/CSCO/CSCOpx/objects/db/bin/dbsrv7 -x tcpip{HOST=localhost;DOBROADCAST=NO;ServerPort=43443} -q -s local0 -m -ti 0 -gm 100 -gc 5 -c 8M -ht -gss 9900 -n aniEng /opt/CSCOpx/databases/ani/ani.db -n aniDb

42a42

> ANIDbEngine y - /product/CSCO/CSCOpx/objects/db/bin/dbsrv7 -x tcpip{HOST=localhost;DOBROADCAST=NO;ServerPort=43443} -q -s local0 -m -ti 0 -gm 100 -gc 5 -c 8M -ht -gss 9900 -n aniEng /opt/CSCOpx/databases/ani/ani.db -n aniDb

lms-backup# pkgchk CSCOmd

ERROR: /tmp/.SQLAnywhere

permissions <0770> expected <0777> actual

ERROR: /var/adm/CSCOpx/log

permissions <0750> expected <0755> actual

group name expected <(null)> actual

ERROR: /var/adm/CSCOpx

file type expected actual

lms# pkgchk CSCOmd

ERROR: /product/CSCO/CSCOpx/etc/install.cshrc

file size <268> expected <232> actual

file cksum <22005> expected <18913> actual

ERROR: /product/CSCO/CSCOpx/etc/install.profile

file size <209> expected <182> actual

file cksum <17013> expected <14694> actual

ERROR: /tmp/.SQLAnywhere

permissions <0770> expected <0777> actual

ERROR: /var/adm/CSCOpx

group name expected actual

owner name expected actual

What I mean is: since it was mentioned before (by Nadim?) that it's not a good idea to delete files directly from /var/adm/CSCOpx/files, the fact that /var/Logs/CSCO/adm/CSCOpx/files is currently not symbolically linked with /var/adm/CSCOpx/files might be an issue, since the db can't find these files.

Joe Clarke Tue, 02/27/2007 - 10:46

You should fix the pkgchk errors on /product/CSCO/CSCOpx/etc/install.profile and install.cshrc (this will most likely mean restoring good versions of these files). As for the symbolic link, this will cause problems, but not THIS problem. Consequently, we do not support symbolic links in this fashion. It will generally work, but patch installations may overwrite the symbolic link with an actual directory.

It does seem like you are hitting a bug in dmgtd. This bug has been fixed in LMS 2.6. For now, you could change your dmgtd script on lines 202, 221, and 227 so that the fgrep for NMSROOT is actually:

/usr/bin/egrep '('\$NMSROOT'|'/product/CSCO/CSCOpx')'

Or just be sure to run dbstop.pl all as well as a kill -TERM on all casuser processes after shutting down dmgtd.
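
The intent of the egrep change is to match process lines under either the configured root or the physical one. A rough sketch of that filter outside the dmgtd script, where the quoting is simpler, is shown here. The name `lms_pids` is hypothetical, both roots are passed explicitly, and the sketch assumes neither path contains regular expression metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: print PIDs of processes whose command line contains
# either of two install roots. Reads `ps -ef` style output on stdin.
# Assumes the roots contain no regex metacharacters.
lms_pids() {
    grep -E "($1|$2)" | awk '{ print $2 }'
}

# Example on a live system (roots taken from this thread):
#   /usr/bin/ps -ef | lms_pids /opt/CSCOpx /product/CSCO/CSCOpx
```

This only lists PIDs. Stopping the databases should still go through dbstop.pl, as described earlier in the thread.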

Martin Ermel Tue, 02/27/2007 - 11:08

Hello Joe,

Could you please tell me which application DSN=mcp belongs to?

I don't think I've ever seen this DSN on an LMS installation.

From a previous post in this thread:

####

Here's the output of dbstop.pl all

/product/CSCO/CSCOpx/bin/dbstop.pl all

Stopping database engine aniEng

Stopping database engine rmeEng

There are still active connections. Stop anyway? (Y/N) Y

Stopping database engine cmfEng

Stopping database engine SqlCoreDBServer

Could not find DSN=mcp.

#####

And do you know if a symlink on /var/adm/CSCOpx will be supported in the future (version or date)?

I also know of some installations with this problem.

regards,

Martin

Joe Clarke Tue, 02/27/2007 - 11:36

mcp is part of Core, and not used by LMS. It is part of VMS. If you don't have VMS, then you don't need to worry about this.

Symbolic linking is not on our roadmap. We do, however, support moving a lot of the information (like configs and SWIM images) to arbitrary locations on the file system.

yjdabear Tue, 02/27/2007 - 11:21

Hah, I don't know how I missed the diff between /product/CSCO/CSCOpx/etc/install.profile and install.cshrc with the other box when I first looked at them.

We're speculating that the LMS patches were installed either in single-user mode or there were more kernel panics than the one during the DST patch, so /var/sadm (on SAN) wasn't mounted when CS22-SP3 went in, which caused the pkgadd to blow away the /var/adm/CSCOpx symlink.

A question, must the LMS 2.2 DST patch be reinstalled if any other patch is applied after it?

Thanks for all your help.

Joe Clarke Tue, 02/27/2007 - 11:33

No. The DST patch is unique, and subsequent patches will not overwrite it (or they will explicitly say re-application is required).

yjdabear Tue, 02/27/2007 - 12:31

Do you think any of these patches need to be re-applied?

cw-common-services-2.2-sp3-sol-K9.tar.gz

cwcs2.2-sol-CSCsc30604-K9.tar.gz

cwcs2.2-sol-CSCse127781-k9.tar

cwcs2.2-sol-CSCsg58592-K9.tar.gz

The dbsrv processes are still not shutting down with dmgtd stop after I modified the three aforementioned files:

/product/CSCO/CSCOpx/etc/install.profile

/product/CSCO/CSCOpx/etc/install.cshrc

/etc/init.d/dmgtd

tail daemons.log

...

Cache size adjusted to 54736K

Performance warning: no unique index or primary key for table "CampusUserGroupAssociationTable"

Stopping database engine aniEng

There are still active connections. Stop anyway? (Y/N) 2007-02-27 15:29:39 RV: TIB/Rendezvous Error Not Handled by Process:

{ADV_CLASS="WARN" ADV_SOURCE="SYSTEM" ADV_NAME="RVD.DISCONNECTED"}

TIB/Rendezvous daemon

Copyright 1994-2002 by TIBCO Software Inc.

All rights reserved.

Version 7.0.21

Stopping database engine cmfEng

There are still active connections. Stop anyway? (Y/N) Stopping database engine rmeEng

There are still active connections. Stop anyway? (Y/N) Stopping database engine SqlCoreDBServer

Received signal (15)

1098757The SQL execution was successful during 1 try[Tue Feb 27 15:30:50 2007] [diskWatcher] Exiting on signal :15

WARNING: dmgtd exiting now: (thread:1 sig:15)

INFO >>>>>>>>>>>>>>>>>>>>>>>>>>>>> REACTOR thread (tid=1) exited; rc=1.

Joe Clarke Tue, 02/27/2007 - 12:49

The DST patch may need to be reapplied. It doesn't seem to have completed successfully.
