11-06-2008 12:37 AM
Hello!
The Device Discovery Process doesn't end properly anymore. When I try to stop the process manually (Device Discovery Summary or CLI), the following error message is displayed:
"Exception in thread "main" com.cisco.nm.csdiscovery.CSDiscoveryException: Excpetion while unpublishing discovery urn. CTMRegistryClient::deleteURNEntry() : "CSDiscovery" not present
at com.cisco.nm.csdiscovery.util.DiscoveryUtil.unPublishURN(DiscoveryUtil.java:1230)
at com.cisco.nm.csdiscovery.CSDiscoveryManager.stopDiscovery(CSDiscoveryManager.java:111)
at com.cisco.nm.csdiscovery.DiscoveryCLI.main(DiscoveryCLI.java:346)
I can only stop the process with the Job Browser but since the process was canceled, new devices are not discovered.
We are using LMS 3.1.0, CS 3.2.0 on Solaris.
11-06-2008 07:38 AM
Please post the output of the pdshow command.
11-07-2008 02:33 AM
11-07-2008 09:44 AM
Run the command:
pdterm CSDiscovery
That should stop the current Discovery, and allow you to start another one from the GUI.
11-10-2008 03:55 AM
The command stopped the process. But when I start Discovery again from the GUI, the process hangs after discovering a quarter of our devices, without any error message in the logs. After pushing the stop button, the message "Unable to stop the running the discovery instance" is displayed. Details of CSDiscovery.log are in the attached file. Only pdterm stops the process.
11-10-2008 10:05 AM
It looks like Discovery may be crashing at some point. If you enable Discovery Framework debugging under Common Services > Device and Credentials > Device Discovery > Discovery Logging Configuration, then re-run Discovery and reproduce the problem, the ngdiscovery.log should have some additional errors.
11-11-2008 10:40 AM
Good hint. Previously I had only enabled debugging under Server -> Admin, and I was wondering why there was so little output.
Ok, I did as you said and I also made a device update. After that the Discovery Process didn't hang up anymore but it took 4 1/2 hours to discover 538 reachable and 162 unreachable devices and more than 530 devices were updated in the DCR. I've started the process twice and got the same behaviour.
I'm still puzzled, because before the problem appeared the process took less than 1 hour and, if I remember correctly, it didn't update each device in the DCR.
Well, I'll have a closer look at the logs.
11-11-2008 10:49 AM
What are your configured SNMP timeouts and retries? It is a common mistake for customers to set these to high values without realizing how the code works. The time taken grows exponentially when a failing device is encountered. Take, for example, a config of 2 retries with a 10 second timeout. When an unreachable device is encountered, the first attempt will wait 10 seconds before timing out. The second attempt will wait 20 seconds. The third attempt will wait 40 seconds. So, for one unreachable device, Discovery has waited 70 seconds. If you multiply this by 162, you get an extra 3.15 hours spent doing nothing.
Because of that, I recommend configuring no more than 1 retry with a 6 second timeout. For 162 unreachable devices, this would still add about 48 extra minutes (18 seconds per device). This also demonstrates how important it is to fix these unreachable devices (or filter them out).
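The arithmetic above can be checked with a few lines of Python. This is only a sketch of the doubling behavior as described in this post (the timeout doubles on each retry); the function names are made up for illustration and are not part of LMS or any Cisco tool.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Seconds spent on one unreachable device.

    Assumes, as described above, that the first attempt waits timeout_s
    and each retry waits twice as long as the previous attempt.
    """
    # retries + 1 attempts in total: the initial try plus each retry
    return sum(timeout_s * 2 ** attempt for attempt in range(retries + 1))


def total_overhead(timeout_s: float, retries: int, unreachable: int) -> float:
    """Seconds spent waiting on all unreachable devices, one after another."""
    return worst_case_wait(timeout_s, retries) * unreachable


if __name__ == "__main__":
    # The two configurations discussed above, with 162 unreachable devices
    for timeout_s, retries in [(10, 2), (6, 1)]:
        per_device = worst_case_wait(timeout_s, retries)
        total = total_overhead(timeout_s, retries, 162)
        print(f"timeout={timeout_s}s retries={retries}: "
              f"{per_device:.0f} s per device, {total / 60:.1f} min total")
```

With 2 retries and a 10 second timeout this gives 70 seconds per device; with 1 retry and a 6 second timeout it gives 18 seconds per device. The totals assume the unreachable devices are waited on serially, which is a simplification.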
11-11-2008 12:00 PM
Also consider the debugging itself.
I know that enabling debugging can extend Discovery from hours to eternity. I do not know whether this happens when you enable debugging for one particular module, or only when you enable all (or at least many) modules.
If the SNMP settings are as jclarke mentioned, then just disable debugging and try again.
11-12-2008 11:46 AM
Hello! Thanks for your comments, jclarke and mermel.
SNMP timeouts and retries had default values. More than 100 unreachable devices are small non-Cisco switches with CDP capability. After filtering them out, the process took 3 instead of 4 1/2 hours.
I disabled debugging and started the process again and, here we go again, the process hung and I was unable to stop it from the GUI.
And again: with enabled debugging the process finished after several hours, after disabling debugging the process hung up. Funny ...
As the process always hangs after the same number of discovered devices, I'll try to filter out several devices. Maybe one is misconfigured.
11-12-2008 11:53 AM
When the process hangs, it is possible to get a full thread dump, which should reveal why it is blocked. This procedure is not straightforward, so if you can't track down a bad device, you should open a TAC service request and have them walk you through the steps.
01-15-2009 02:27 AM
We experienced the same problem but fixed it by moving the seed devices from the global options to the CDP module in the discovery settings. The CDP module was empty (no seed devices), and the checkbox for using the DCR as the seed list doesn't seem to work.
01-23-2009 05:58 AM
After some tests we found that with Discovery Framework debugging enabled, Device Discovery runs successfully (545 elements found). But with debugging disabled (as it should be), it stops at 23 devices and "hangs": it is still running and cannot be stopped.
01-23-2009 12:18 PM
Have you opened a TAC service request yet? As I said, the full thread dump would be extremely useful in narrowing down why Discovery is locking up.
01-24-2009 12:58 PM
Indeed, I haven't done it yet. In November I tuned the duration of the discovery process by excluding several IP ranges without Cisco devices. Then I was too busy to open a service request and I postponed it. Finally I forgot, and debugging is still enabled.
Next week I'll be at the Cisco Networkers. After that I'll try to work it out with the TAC.