02-09-2010 07:39 PM
RME 4.0.5 as part of LMS 2.6 on Windows 2003 SP2.
Our standard monthly config management collection has mysteriously started failing for all devices.
The results file shows the following for every single device:
Execution Status : Job Execution Failed for device
Execution Message : Unable to get results of job execution for device. Retry the job after increasing the job result wait time using the option:Resource Manager Essentials -> Admin -> Config Mgmt -> Archive Mgmt ->Fetch Settings
Whilst the previous value of 60 seconds had been working without any issues for the previous two years, I upped the wait time to 120 seconds, and all this did was double the required poll time.
The actual job log file shows:
[ Mon Feb 08 10:45:04 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,handleMultiDeviceExecution,143,JobExecutorThread - MultiDeviceExec DcmaJobExecThread 0 : Running
[ Mon Feb 08 10:45:04 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,syncArchive,653,Syncing Archive, # of devices = 752
[ Mon Feb 08 10:45:05 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,handleMultiDeviceExecution,149,Completed executeJob(), updating Results
[ Mon Feb 08 10:45:05 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,getNumCyclesToPoll,1018,getNumCyclesToPoll Function Started.
[ Mon Feb 08 10:45:05 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,updateMultiDeviceExecResults,781,Awaiting Job results: req Id = 1265586304881 Poll time = 1321 min(s)
[ Tue Feb 09 08:46:22 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,setDevResultToFailure,897,Could not get results for 752 device(s)
[ Tue Feb 09 08:46:25 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,handleMultiDeviceExecution,166,Thread DcmaJobExecThread 0: Stopping
[ Tue Feb 09 08:46:26 EST 2010 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecutor,run,221,Finished Job Execution
[ Tue Feb 09 08:46:26 EST 2010 ],INFO ,[main],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecutor,endJobExecution,460,Writing result Files
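As a quick sanity check on the log above, the gap between the "Awaiting Job results" line and the "Could not get results" line can be compared with the reported poll time. The following is only a minimal sketch in Python: the two log lines are copied from the post, while the helper names (`parse_stamp`, `waited_minutes`) and the timestamp regex are illustrative assumptions, not part of RME.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the job log in the post.
LOG = """\
[ Mon Feb 08 10:45:05 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,updateMultiDeviceExecResults,781,Awaiting Job results: req Id = 1265586304881 Poll time = 1321 min(s)
[ Tue Feb 09 08:46:22 EST 2010 ],INFO ,[Thread-6],com.cisco.nm.rmeng.dcma.jobdriver.DcmaJobExecThread,setDevResultToFailure,897,Could not get results for 752 device(s)
"""

# Assumed layout of the leading timestamp: "[ Mon Feb 08 10:45:05 EST 2010 ]".
# The time zone token is skipped because both lines are in the same zone.
STAMP = re.compile(r"^\[?\s*(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (\d{4}) \]")


def parse_stamp(line):
    """Return the line's timestamp as a naive datetime, or None if absent."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%a %b %d %H:%M:%S %Y")


def waited_minutes(log_text):
    """Return (reported poll minutes, minutes actually waited), or None."""
    start = end = poll = None
    for line in log_text.splitlines():
        if "Awaiting Job results" in line:
            start = parse_stamp(line)
            poll = int(re.search(r"Poll time = (\d+) min", line).group(1))
        elif "Could not get results" in line:
            end = parse_stamp(line)
    if start is None or end is None:
        return None
    return poll, (end - start).total_seconds() / 60


poll, waited = waited_minutes(LOG)
print(f"reported poll time: {poll} min, actually waited: {waited:.1f} min")
```

If the two figures agree, the job thread simply sat out the whole poll window without receiving a single result, which fits the observation that raising the wait time only lengthened the poll.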
Historically, when this worked fine, the log file would show the devices as they completed and update the number of remaining devices. The job is still configured for parallel execution; I'm debating setting it to sequential just to see if it makes any difference.
From what I can see, our syslog-triggered and periodic-polling-triggered collections are working fine.
Any suggestions out there from someone who may have seen something similar?
02-09-2010 08:58 PM
One of the ConfigMgmtServer threads may be getting deadlocked. Try restarting ConfigMgmtServer and see if subsequent jobs run correctly:
pdterm ConfigMgmtServer
pdexec ConfigMgmtServer
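If the restart needs to be scripted, the two commands above could be wrapped roughly as follows. This is a hedged sketch only: it assumes `pdterm` and `pdexec` are on the PATH of the LMS server, and the function name `restart_daemon`, the injectable `run` parameter, and the pause between stop and start are all illustrative assumptions rather than anything the tools require.

```python
import subprocess
import time


def restart_daemon(name="ConfigMgmtServer", run=subprocess.run, pause_s=5):
    """Stop, then start, a daemon using the pdterm/pdexec commands above.

    `run` is injectable so the sequence can be dry-run without the tools.
    `pause_s` is an assumed grace period between stop and start.
    """
    for tool in ("pdterm", "pdexec"):
        # check=True raises CalledProcessError if the command exits non-zero.
        run([tool, name], check=True)
        if tool == "pdterm":
            time.sleep(pause_s)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    restart_daemon(run=lambda cmd, check: print(" ".join(cmd)), pause_s=0)
```

The dry run only prints the two commands in order; dropping the `run` argument makes the function execute them for real.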
02-11-2010 04:44 PM
Thanks, Joe. That is exactly what fixed it.