04-04-2007 01:05 AM
A fresh Windows install of LMS 2.6 at a customer site has a problem archiving the configurations of our IOS devices. Archiving does seem to work fine for the few CatOS boxes we still have. The IOS boxes are registered as partially successful, with the following error message:
CM0057 PRIMARY RUNNING Config fetch SUCCESS, archival failed for gv-kgr-02-sw01 Cause: CM0002: Could not archive config Cause: Device may not be reachable, may be in suspended state or credentials may be incorrect. Action: Verify that device is managed, credentials are correct and file system has correct permissions. Increase timeout value, if required. Action: Verify that archive exists for device.
This looks a lot like quite a few problems posted in this forum, but removing and re-adding the devices, as mentioned as a possible workaround, does not solve the problem. jclark reported that work was being done on a fix. Is there any ETA on that?
Additionally, we are seeing some stability issues, with LMS services being shut down for no apparent reason after running for a few hours to a couple of days. Services are stopped with "Administrator has stopped this server". So far I haven't had much luck deciphering the log messages to find out why those services stopped.
Any help or suggestions are appreciated.
04-04-2007 08:00 AM
This config archive error may not require any fixes. It may be a misconfiguration on the server. You should enable ArchiveMgmt Service debugging under RME > Admin > System Preferences > Loglevel Settings, then reproduce the problem. The dcmaservice.log should have errors as to why the archive is failing.
Which daemons are in the shutdown state? What other errors are you seeing?
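Once debugging is enabled and the problem has been reproduced, the log can get large. The following is a minimal sketch for pulling the error-like lines out of a log such as dcmaservice.log. The marker strings (ERROR, FATAL, Exception) and the file location are assumptions, not taken from the product documentation; adjust them to what the log actually contains.

```python
import re
from pathlib import Path

# Assumption: error lines contain one of these markers. The real
# dcmaservice.log format may use different keywords.
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception")


def find_error_lines(text):
    """Return (line_number, stripped_line) pairs for error-like lines."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_log(path):
    """Read a log file leniently (bad bytes are replaced) and scan it."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_error_lines(text)


if __name__ == "__main__":
    # Hypothetical location: point this at the copy of the log you saved.
    log_path = Path("dcmaservice.log")
    if log_path.exists():
        for number, line in scan_log(log_path):
            print(f"{number}: {line}")
```

Printing the line numbers makes it easier to go back to the full log and read the context around each hit.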
04-04-2007 10:53 PM
See the attached dcmaclient.log file. I started debugging, ran an immediate Sync Config job (job #1045), which failed with the same error as in the original post, stopped debugging, and copied the output to a file for attachment.
Haven't written the stopped services down, will do so next time. It was a whole bunch.
Other issues we noted:
- Some of the default scheduled jobs fail because they have no owner.
- Trying to set the collection job schedule (RME > Admin > CFG Mgmt > Collection settings) results in a CM0076 error, but the CTMJrmServer & jrm ARE running. See attachment collection_job.log for the debugging output on that one.
04-04-2007 11:04 PM
I had mentioned the dcmaservice.log, not the dcmaclient.log. They are substantially different. However, it looks like you have bigger problems. Jrm appears to be down. I assume these problems are new? What changed on the server recently (e.g. was the hostname changed?)? A full pdshow output may help isolate the failing processes.
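A full pdshow listing covers dozens of daemons, so it can help to filter it. Below is a rough sketch that parses saved pdshow output and lists the processes whose state is not considered healthy. The "Process=" / "State =" block layout and the set of healthy states are assumptions based on typical output; some daemons may legitimately report other states, so extend HEALTHY_STATES as needed.

```python
# Assumption: only this state counts as healthy. Some daemons may
# normally report other states; add those here if so.
HEALTHY_STATES = {"Running normally"}


def parse_pdshow(text):
    """Map each process name to its reported state.

    Assumes blocks of "Key = value" lines in which a "Process" line
    is followed by a "State" line for the same process.
    """
    states = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition("=")
        if not sep:
            continue  # blank line or a line without "Key = value"
        key, value = key.strip(), value.strip()
        if key == "Process":
            current = value
        elif key == "State" and current is not None:
            states[current] = value
            current = None
    return states


def unhealthy(states, healthy=HEALTHY_STATES):
    """Return only the processes whose state is not in the healthy set."""
    return {name: state for name, state in states.items() if state not in healthy}


if __name__ == "__main__":
    # Made-up sample in the assumed layout, for illustration only.
    sample = """
    Process= jrm
    State  = Running normally

    Process= CmfDbMonitor
    State  = Administrator has shut down this server
    """
    for name, state in unhealthy(parse_pdshow(sample)).items():
        print(f"{name}: {state}")
```

Saving the pdshow output to a file at the time of the failure and running it through a filter like this gives a short list to attach to the thread.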
04-05-2007 12:16 AM
Ouch, my bad. Sorted on date, never read past dmca, and went from there. Looking at the dmcaserver.log I do see some errors regarding 'user null or empty'. I have no idea which user is meant. Device credentials check out in CDA, as well as in the ACS logging (ACS is just used for device authentication at this point, no integration with CW).
Jrm is up as far as I can tell ("Running normally"), both through the Common Services GUI and through pdshow. There have been no recent changes to the server. The server itself is a fresh install, just like CW itself.
The problem is 'new' in the sense that I didn't see this before, but I don't think I tried setting that before. LMS is new to us, production server is still an ancient CW2000 with the old style webGUI.
04-05-2007 06:52 AM
Your problem is your System Identity User is missing. Make sure the username configured under Common Services > Server > Security > System Identity Setup is a valid local user with full CiscoWorks roles (e.g. admin).
After correcting the System Identity User, restart dmgtd.
04-11-2007 10:04 PM
Here are the daemons that went down in the last incident, with timestamps. No engineers were actually working with the system when this happened; we only found out later when certain parts of the GUI turned out to be unresponsive. All daemons showed 'Administrator has shut down this server'.
FHServer 10:30:11
FHDbMonitor 10:31:11
CampusOGSServer 10:32:51-53
ChangeAudit
CmfDbMonitor
CMFOGSServer
ConfigMgmtServer
CTMJrmServer
DCRServer
DFMCTMStartup
EDS-GCF
Interactor
InvDBMonitor
InventoryCollector
jrm
NCTemplateMgr
NetShowMgr
NOSServer
PTMServer
RMEOGSServer
SyslogAnalyzer
TISServer
EssentialsDM
04-12-2007 04:48 AM
The root of the problem appears to be centered around the CMF database. What is the state of CmfDbEngine? Is this Windows or Solaris?
04-12-2007 05:35 AM
CmfDbEngine was very likely up and running, the list was of services stopped on or around that time.
It's a Win2003 SP1 install.
04-12-2007 05:38 AM
So is this no longer a problem?
04-12-2007 07:19 AM
It's a problem in the sense that LMS seems to die for reasons unknown to us. The workaround is restarting the whole shebang.
04-12-2007 08:02 AM
I assume you've fixed the problem with the System Identity User. If not, go ahead and do that now, then restart LMS. When the problem occurs again, it would be a good idea to get the CmfDbMonitor.log to see why the connection to the database died.