LMS2.6 problem archiving configuration

Answered Question
Apr 4th, 2007

A fresh Windows install of LMS2.6 at a customer site has a problem archiving the configurations of our IOS devices. Archiving does seem to work fine for the few CatOS boxes we still have. The IOS boxes are reported as partially successful, with the following error message:

CM0057 PRIMARY RUNNING Config fetch SUCCESS, archival failed for gv-kgr-02-sw01 Cause: CM0002: Could not archive config Cause: Device may not be reachable, may be in suspended state or credentials may be incorrect. Action: Verify that device is managed, credentials are correct and file system has correct permissions. Increase timeout value, if required. Action: Verify that archive exists for device.

This looks a lot like quite a few problems posted in this forum, but removing and re-adding the devices, mentioned as a possible workaround, does not solve the problem. jclark reported that work was being done on a fix; is there any ETA on that?

Additionally, we are seeing some stability issues: LMS services shut down for no apparent reason after running for anywhere from a few hours to a couple of days. The services are stopped with "Administrator has stopped this server". So far I haven't had much luck deciphering the log messages to find out why those services stopped.

Any help or suggestions are appreciated.

Joe Clarke Wed, 04/04/2007 - 08:00

This config archive error may not require any fixes. It may be a misconfiguration on the server. You should enable ArchiveMgmt Service debugging under RME > Admin > System Preferences > Loglevel Settings, then reproduce the problem. The dcmaservice.log should have errors as to why the archive is failing.
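Once debugging is enabled and the job has been rerun, the log can get large. As a rough aid, here is a minimal Python sketch that pulls out lines that look like errors. The keyword list is an assumption, not the documented dcmaservice.log format, so adjust it to whatever the log actually contains.

```python
import re

# Assumed error markers; the real dcmaservice.log wording may differ.
ERROR_PATTERN = re.compile(r"error|exception|fail", re.IGNORECASE)


def find_error_lines(lines, pattern=ERROR_PATTERN):
    """Return (line_number, text) pairs for every line matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path):
    """Open a log file leniently (unknown encoding) and collect suspect lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

Calling `scan_log` with the path to your copy of dcmaservice.log returns the suspect lines with their line numbers, which makes it easier to quote only the relevant part in a post.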

Which daemons are in the shutdown state? What other errors are you seeing?

ijdod Wed, 04/04/2007 - 22:53

See the attached dcmaclient.log file. I started debugging, ran an immediate Synch Conf job (job #1045), the job failed (same error as in the original post), I stopped debugging, and copied the output to a file for attachment.

Haven't written the stopped services down, will do so next time. It was a whole bunch.

Other issues we noted:

- Some of the default scheduled jobs fail because they have no owner.

- Trying to set the collection job schedule (RME > Admin > CFG Mgmt > Collection settings) results in a CM0076 error, but the CTMJrmServer & jrm ARE running. See attachment collection_job.log for the debugging output on that one.

Joe Clarke Wed, 04/04/2007 - 23:04

I had mentioned the dcmaservice.log, not the dcmaclient.log. They are substantially different. However, it looks like you have bigger problems. Jrm appears to be down. I assume these problems are new? What changed on the server recently (e.g. was the hostname changed?)? A full pdshow output may help isolate the failing processes.
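To make a full pdshow listing easier to read, something like the following Python sketch can reduce it to the processes that are not healthy. It assumes the output consists of `Process= <name>` and `State = <text>` lines, one block per process; treat that layout, and the sample names in the usage note, as assumptions rather than a specification of pdshow.

```python
def parse_pdshow(text):
    """Map process name -> state, assuming 'Process= name' / 'State = value' lines."""
    states = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition("=")
        if not sep:
            continue  # not a 'key = value' line
        key = key.strip().lower()
        value = value.strip()
        if key == "process":
            current = value
        elif key == "state" and current is not None:
            states[current] = value
    return states


def not_running(states, healthy="Running normally"):
    """Return only the processes whose state differs from the healthy string."""
    return {name: state for name, state in states.items() if state != healthy}
```

Save the pdshow output to a file, read it into a string, and pass it through `parse_pdshow` and then `not_running`. Some daemons may legitimately report a different healthy state, so the `healthy` argument is left adjustable.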

ijdod Thu, 04/05/2007 - 00:16

Ouch, my bad. Sorted on date, never read past dcma, and went from there. Looking at the dcmaservice.log I do see some errors regarding 'user null or empty'. I have no idea which user is meant. Device credentials check out in CDA, as well as in the ACS logging (ACS is only used for device authentication at this point, with no CW integration).

Jrm is up as far as I can tell ("Running normally"), both through the Common Services GUI and through pdshow. There have been no recent changes to the server; the server itself is a fresh install, just like CW itself.

The problem is 'new' in the sense that I haven't seen it before, but I don't think I had tried setting the schedule before either. LMS is new to us; the production server is still an ancient CW2000 with the old-style web GUI.

Correct Answer
Joe Clarke Thu, 04/05/2007 - 06:52

Your problem is your System Identity User is missing. Make sure the username configured under Common Services > Server > Security > System Identity Setup is a valid local user with full CiscoWorks roles (e.g. admin).

After correcting the System Identity User, restart dmgtd.

ijdod Wed, 04/11/2007 - 22:04

Downed daemons from the last incident, with timestamps. No engineers were actually working with the system when this happened; we only found out later when certain parts of the GUI turned out to be unresponsive. All daemons showed 'Administrator has shut down this server'.

FHServer 10:30:11
FHDbMonitor 10:31:11
CampusOGSServer 10:32:51-53
ChangeAudit
CmfDbMonitor
CMFOGSServer
ConfigMgmtServer
CTMJrmServer
DCRServer
DFMCTMStartup
EDS-GCF
Interactor
InvDBMonitor
InventoryCollector
jrm
NCTemplateMgr
NetShowMgr
NOSServer
PTMServer
RMEOGSServer
SyslogAnalyzer
TISServer
EssentialsDM

Joe Clarke Thu, 04/12/2007 - 04:48

The root of the problem appears to be centered around the CMF database. What is the state of CmfDbEngine? Is this Windows or Solaris?

ijdod Thu, 04/12/2007 - 05:35

CmfDbEngine was very likely up and running; the list covered the services that stopped on or around that time.

It's a Win2003 SP1 install.

ijdod Thu, 04/12/2007 - 07:19

It's a problem in the sense that LMS seems to die for no reason known to us. The workaround is to restart the whole shebang.

Joe Clarke Thu, 04/12/2007 - 08:02

I assume you've fixed the problem with the System Identity User. If not, go ahead and do that now, then restart LMS. When the problem occurs again, it would be a good idea to get the CmfDbMonitor.log to see why the connection to the database died.
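When the services go down again, it helps to look only at the log lines around the first shutdown time (10:30:11 in the last incident). The Python sketch below filters lines by time of day. It assumes each relevant line carries an `HH:MM:SS` stamp somewhere in it, which may not match the real CmfDbMonitor.log layout, and it ignores dates, so it does not handle a window that crosses midnight.

```python
import re
from datetime import datetime, timedelta

# Assumed stamp shape: the first HH:MM:SS found on the line.
TIME_PATTERN = re.compile(r"\b\d{2}:\d{2}:\d{2}\b")


def lines_near(lines, incident, before_minutes=10, after_minutes=2):
    """Return lines whose first HH:MM:SS stamp falls in the window around incident."""
    center = datetime.strptime(incident, "%H:%M:%S")
    start = center - timedelta(minutes=before_minutes)
    end = center + timedelta(minutes=after_minutes)
    selected = []
    for line in lines:
        match = TIME_PATTERN.search(line)
        if not match:
            continue  # no timestamp on this line
        try:
            stamp = datetime.strptime(match.group(0), "%H:%M:%S")
        except ValueError:
            continue  # digits that are not a valid time of day
        if start <= stamp <= end:
            selected.append(line.rstrip("\n"))
    return selected
```

For example, `lines_near(open(path), "10:30:11")` keeps the ten minutes leading up to the first stopped daemon plus two minutes after it, where `path` points at your copy of the log.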
