RME 4.x: Config Mgmt - Archive Mgmt trouble

Unanswered Question
Apr 26th, 2007
User Badges:
  • Gold, 750 points or more

From the very beginning after upgrading to LMS 2.6, my RME->Archive Mgmt->Configuration Archival Summary shows a total of only ~25% of the total Normal Devices of RME inventory. This is affirmed by a Device Status Report, showing Inventory Collection Status (100% success) and Config Collection Status side by side (~25% success, with the rest without status info).


So far I've attempted to remedy this situation by scheduling jobs at the following locations:


RME->Archive Mgmt->Sync Archive, with "Poll device before configuration" and "Fetch startup config" checked


and


RME->Admin->Config Mgmt->Archive Mgmt->Collection Settings-> both Periodic Polling and Periodic Collection


(Are these two means accomplishing the same thing)?


The ad-hoc Sync Archive job has failed after nearly a day on all the devices, with the same error:


*** Device Details for rtr65131 ***

Protocol ==> Unknown / Not Applicable

Unable to get results of job execution for device. Retry the job after increasing the job result wait time using the option:Resource Manager Essentials -> Admin -> Config Mgmt -> Archive Mgmt ->Fetch Settings

CLI Output

No CLI Output for this job

********************


The Periodic Polling and Periodic Collection jobs are both still running after a full day, with every single device "pending", so I suspect they too will all fail like the Sync Archive job. These jobs can't be stopped either.


Any advice on where to go from here?

  • 1
  • 2
  • 3
  • 4
  • 5
Overall Rating: 4 (2 ratings)
Loading.
Joe Clarke Thu, 04/26/2007 - 10:13
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

First, stop dmgtd to kill the jobs. Then restart dmgtd, and enable ArchiveMgmt/Archive Mgmt Service debugging along with ConfigJob debugging under RME > Admin > System Preferences > Loglevel Settings. Then run an ad hoc sync job for ONE failed device (rtr65131 would be good). If that fails, that gives a small sample to analyze. The interesting logs will be dcmaservice.log and the job directory pertaining to this job ID.


If the job succeeds, keep the same debugging on, and do a big ad hoc job for all failed devices. The same logs will be of interest.

yjdabear Thu, 04/26/2007 - 10:36
User Badges:
  • Gold, 750 points or more

Forgot to say I scheduled a smaller (7 devices) Sync Archive job (#1219), then I remembered to turn on debug for Archive Mgmt. All seven are "pending" after nearly two hours. "tail -f dcmaservice.log" seems to show something looping for quite a while:

...

[ Thu Apr 26 14:25:05 EDT 2007 ],DEBUG,[Thread-48],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,getStatusOfRequest,457,Total devices due for reqId = 1177574115776 is 410

[ Thu Apr 26 14:25:08 EDT 2007 ],DEBUG,[Thread-50],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,getStatusOfRequest,457,Total devices due for reqId = 1177553728060 is 410

[ Thu Apr 26 14:25:09 EDT 2007 ],DEBUG,[Thread-49],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,getStatusOfRequest,457,Total devices due for reqId = 1177606714417 is 7

[ Thu Apr 26 14:25:10 EDT 2007 ],DEBUG,[Thread-48],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,getStatusOfRequest,457,Total devices due for reqId = 1177574115776 is 410

[ Thu Apr 26 14:25:13 EDT 2007 ],DEBUG,[Thread-50],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,getStatusOfRequest,457,Total devices due for reqId = 1177553728060 is 410


Here's "/var/adm/CSCOpx/files/rme/jobs/ArchiveMgmt/1219/log" in its current entirety:

[ Thu Apr 26 12:58:30 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.config.ccjs.ex

ecutor.CfgJobExecutor,getJobExecutionImpl,160, Executor implementation class com

.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecutor

[ Thu Apr 26 12:58:30 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver

.DcmaJobExecutor,setJobInfo,167,DcmaJobExecutor: Initializing 1219

[ Thu Apr 26 12:58:32 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.config.ccjs.ex

ecutor.CfgJobExecutor,,119,Job listener for daemon manager messages starte

d

[ Thu Apr 26 12:58:32 EDT 2007 ],INFO ,[1219],com.cisco.nm.rmeng.config.ccjs.ex

ecutor.dmgtJobRunner,run,31, DMGT Job Listener running..

[ Thu Apr 26 12:58:32 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver

.DcmaJobExecutor,initJobPolicies,583,Notification policy is ON, recipients : [email protected]

[ Thu Apr 26 12:58:32 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver

.DcmaJobExecutor,initJobPolicies,603,Execution policy is : Parallel Execution

[ Thu Apr 26 12:58:32 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver

.DcmaJobExecutor,initJobPolicies,626,Num Threads = 1, Task = Sync Archive, Num D

evices = 7

[ Thu Apr 26 12:58:32 EDT 2007 ],INFO ,[Thread-4],com.cisco.nm.rmeng.config.ccj

s.executor.CfgJobExecutionESSListener,run,55,ESS Message listener started

[ Thu Apr 26 12:58:33 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver

.DcmaJobExecutor,run,212,Job initialization complete, starting execution

[ Thu Apr 26 12:58:33 EDT 2007 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver

.DcmaJobExecThread,,113,Constructing ExecutorThread DcmaJobExecThread 0

[ Thu Apr 26 12:58:34 EDT 2007 ],INFO ,[Thread-5],com.cisco.nm.rmeng.dcma.jobdr

iver.DcmaJobExecThread,handleMultiDeviceExecution,143,JobExecutorThread - MultiD

eviceExec DcmaJobExecThread 0 : Running

[ Thu Apr 26 12:58:34 EDT 2007 ],INFO ,[Thread-5],com.cisco.nm.rmeng.dcma.jobdr

iver.DcmaJobExecThread,syncArchive,653,Syncing Archive, # of devices = 7

[ Thu Apr 26 12:58:34 EDT 2007 ],INFO ,[Thread-5],com.cisco.nm.rmeng.dcma.jobdr

iver.DcmaJobExecThread,handleMultiDeviceExecution,149,Completed executeJob(), up

dating Results

[ Thu Apr 26 12:58:34 EDT 2007 ],INFO ,[Thread-5],com.cisco.nm.rmeng.dcma.jobdr

iver.DcmaJobExecThread,getNumCyclesToPoll,1018,getNumCyclesToPoll Function Start

ed.

[ Thu Apr 26 12:58:34 EDT 2007 ],INFO ,[Thread-5],com.cisco.nm.rmeng.dcma.jobdr

iver.DcmaJobExecThread,updateMultiDeviceExecResults,781,Awaiting Job results: re

q Id = 1177606714417 Poll time = 339 min(s)

Joe Clarke Thu, 04/26/2007 - 10:39
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

This is what I'd expect to see at this point. ConfigJob debugging is off, and the job has already been running for a while. None of this is helpful. If you have not restarted anything, and this job is still running, performing a kill -QUIT on the PID of ConfigMgmtServer, then getting the resulting thread dump from daemons.log might be useful.

yjdabear Thu, 04/26/2007 - 11:14
User Badges:
  • Gold, 750 points or more

This is bizarre. After dmgtd stop/start, I picked two from the top of the list (one CatOS, one IOS) and one from the bottom (CatOS) that were among the failed, which was 100%. This job finished successfully in a blink of the eye.


The same set of seven (six CatOS, one IOS) that failed earlier succeeded this time as well, in two mins.


The numbers at Configuration Archival Summary are incrementing accordingly too.


Trying the entire RME inventory next.

yjdabear Thu, 04/26/2007 - 13:26
User Badges:
  • Gold, 750 points or more

The job finished fine against the entire RME inventory (with some "Partially Successful" attributed to the inability to retrieve VLAN configs with TFTP/SCP). Would you care to share what you think was causing the previous failure(s)?


Will the Periodic Polling and Periodic Collection equate to what the Sync Archive job does?

Joe Clarke Thu, 04/26/2007 - 13:46
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

Without seeing more details from the logs, I cannot say what was happening before.


Periodic polling plus config fetch is exactly the same as running an ad hoc sync archive job against all devices.

vhamwvbarnaj Fri, 09/07/2007 - 11:21
User Badges:

HI jclarke,


I am recieving the same error message for any device that I try to collect or archive a config on. I have performed the steps you have above without any help.


I did turn on the debug on and listed below is a sample of the results. I did not inlcude all the lines because they were all the same.


[ Fri Sep 07 13:13:52 EDT 2007 ],DEBUG,[Thread-3],com.cisco.nm.rmeng.util.DCRWrapperAPIs,getResultFromQuery,2838,Query: select DCR_ID from DM_Dev_State where NetworkElementID = '83'

[ Fri Sep 07 13:13:52 EDT 2007 ],DEBUG,[Thread-3],com.cisco.nm.rmeng.util.DCRWrapperAPIs,getDCRId,1082,dev id: 83 dcr id : 83

[ Fri Sep 07 13:13:52 EDT 2007 ],DEBUG,[Thread-3],com.cisco.nm.rmeng.util.DCRWrapperAPIs,getAPIExtraInfo,2753,InheritableThreadLocal value not set: Setting it

[ Fri Sep 07 13:13:52 EDT 2007 ],DEBUG,[Thread-3],com.cisco.nm.rmeng.util.DCRWrapperAPIs,initAPIXtraInfo,2514, Source context set to: vhamwvnms3/10.40.106.57__DCRWrapperAPIs__4.0

[ Fri Sep 07 13:13:52 EDT 2007 ],DEBUG,[Thread-3],com.cisco.nm.rmeng.util.DCRWrapperAPIs,initAPIXtraInfo,2537, Security context set to: admin

[ Fri Sep 07 13:13:52 EDT 2007 ],DEBUG,[Thread-3],com.cisco.nm.rmeng.util.DCRWrapperAPIs,getDisplayNameForDev,849,dev id: 83Disp name : mwv5001a106sw01

[ Fri Sep 07 13:13:52 EDT 2007 ],DEBUG,[Thread-3],com.cisco.nm.rmeng.dcma.configmanager.CfgThreadManager,triggerConfigFetch,40,Config fetch request received

[ Fri Sep 07 13:13:57 EDT 2007 ],DEBUG,[Thread-111],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,getStatusOfRequest,913,Total devices due for reqId = 1189185232687 is 1

[ Fri Sep 07 14:10:03 EDT 2007 ],DEBUG,[Thread-5],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,getStatusOfRequest,913,Total devices due for reqId = 1189185232687 is 1


Thanks for the help.

Jason


Actions

This Discussion