LMS 2.6: Config Management Periodic Collection Failing

Answered Question
Feb 9th, 2010

RME 4.0.5 as part of LMS 2.6 on Windows 2003 SP2.

Our standard monthly config management collection has mysteriously started failing for all devices.

The results file shows the following for every single device:

Execution Status  : Job Execution Failed for device

Execution Message : Unable to get results of job execution for device. Retry the job after increasing the job result wait time using the option:Resource Manager Essentials -> Admin -> Config Mgmt -> Archive Mgmt ->Fetch Settings

Whilst the previous value of 60 seconds has been working without any issues for the previous two years, I upped the wait time to 120 seconds, and all this did was double the required poll time.

The actual job log file shows:

[ Mon Feb 08  10:45:04 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,handleMultiDeviceExecution,143,JobExecutorThread - MultiDeviceExec DcmaJobExecThread 0 : Running

[ Mon Feb 08  10:45:04 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,syncArchive,653,Syncing Archive, # of devices = 752

[ Mon Feb 08  10:45:05 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,handleMultiDeviceExecution,149,Completed executeJob(), updating Results

[ Mon Feb 08  10:45:05 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,getNumCyclesToPoll,1018,getNumCyclesToPoll Function Started.

[ Mon Feb 08  10:45:05 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,updateMultiDeviceExecResults,781,Awaiting Job results: req Id = 1265586304881 Poll time = 1321 min(s)

[ Tue Feb 09  08:46:22 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,setDevResultToFailure,897,Could not get results for 752 device(s)

[ Tue Feb 09  08:46:25 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,handleMultiDeviceExecution,166,Thread DcmaJobExecThread 0: Stopping

[ Tue Feb 09  08:46:26 EST 2010 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecutor,run,221,Finished Job Execution

[ Tue Feb 09  08:46:26 EST 2010 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecutor,endJobExecution,460,Writing result Files
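To sanity-check the timing in the log above, here is a rough Python sketch that pulls the two relevant entries ("Awaiting Job results" and "Could not get results") and compares the reported poll time with the time that actually elapsed. The function names (`parse_stamp`, `wait_summary`) are my own, the log format is assumed from the lines pasted here rather than from any RME documentation, and the time zone field is ignored since both entries are EST.

```python
import re
from datetime import datetime

# Month lookup avoids depending on the system locale.
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches e.g. "[ Mon Feb 08  10:45:05 EST 2010 ]" at the start of a line.
# The leading "[" is optional; the time zone name is matched but discarded.
STAMP = re.compile(
    r"\[?\s*\w{3} (\w{3}) (\d{1,2})\s+(\d{2}):(\d{2}):(\d{2}) \w+ (\d{4})\s*\]")


def parse_stamp(line):
    """Return the entry's timestamp as a naive datetime, or None."""
    m = STAMP.match(line)
    if m is None:
        return None
    mon, day, hh, mm, ss, year = m.groups()
    return datetime(int(year), MONTHS[mon], int(day),
                    int(hh), int(mm), int(ss))


def wait_summary(lines):
    """Summarise how long the job waited and how many devices failed."""
    start = end = poll_min = failed = None
    for line in lines:
        if "Awaiting Job results" in line:
            start = parse_stamp(line)
            m = re.search(r"Poll time = (\d+) min", line)
            poll_min = int(m.group(1)) if m else None
        elif "Could not get results for" in line:
            end = parse_stamp(line)
            m = re.search(r"for (\d+) device", line)
            failed = int(m.group(1)) if m else None
    waited = (end - start).total_seconds() / 60 if start and end else None
    return {"poll_min": poll_min, "waited_min": waited,
            "failed_devices": failed}


# The two entries of interest, copied from the job log above.
sample = [
    "[ Mon Feb 08  10:45:05 EST 2010 ],INFO ,[Thread-6],"
    "com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,"
    "updateMultiDeviceExecResults,781,Awaiting Job results: "
    "req Id = 1265586304881 Poll time = 1321 min(s)",
    "[ Tue Feb 09  08:46:22 EST 2010 ],INFO ,[Thread-6],"
    "com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,"
    "setDevResultToFailure,897,Could not get results for 752 device(s)",
]

summary = wait_summary(sample)
print(summary)
```

For this log the elapsed wait comes out at just over 1321 minutes, which matches the reported poll time, and all 752 devices are marked failed. In other words, the job sat through the entire poll window without receiving a single result.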

Historically, when this worked fine, the log file would show the devices as they completed and update the number of remaining devices. The job is still configured for parallel execution; I’m debating setting it to sequential just to see if it makes any difference.

From what I can see, our syslog or periodic polling triggered collections are working fine.

Any suggestions out there from someone who may have seen something similar?

Correct Answer by Joe Clarke about 6 years 11 months ago

One of the ConfigMgmtServer threads may be getting deadlocked. Try restarting ConfigMgmtServer, and see if subsequent jobs run correctly:

pdterm ConfigMgmtServer

pdexec ConfigMgmtServer

Joe Clarke Tue, 02/09/2010 - 20:58
