Cannot stop Device Discovery Process, CS 3.2

s.leyer
Level 1

Hello!

The Device Discovery Process doesn't end properly anymore. When I try to stop the process manually (Device Discovery Summary or CLI), the following error message is displayed:

"Exception in thread "main" com.cisco.nm.csdiscovery.CSDiscoveryException: Excpetion while unpublishing discovery urn. CTMRegistryClient::deleteURNEntry() : "CSDiscovery" not present

at com.cisco.nm.csdiscovery.util.DiscoveryUtil.unPublishURN(DiscoveryUtil.java:1230)

at com.cisco.nm.csdiscovery.CSDiscoveryManager.stopDiscovery(CSDiscoveryManager.java:111)

at com.cisco.nm.csdiscovery.DiscoveryCLI.main(DiscoveryCLI.java:346)

I can only stop the process with the Job Browser, but since the process was canceled, new devices are not discovered.

We are using LMS 3.1.0, CS 3.2.0 on Solaris.


Joe Clarke
Cisco Employee

Please post the output of the pdshow command.

Here it is.

Run the command:

pdterm CSDiscovery

That should stop the current Discovery, and allow you to start another one from the GUI.

The command stopped the process. But when I start the Discovery again from the GUI, the process hangs after discovering a quarter of our devices, without any error message in the logs. After pushing the stop button, the message "Unable to stop the running the discovery instance" is displayed. Details of CSDiscovery.log are in the attached file. Only pdterm stops the process.

It looks like Discovery may be crashing at some point. If you enable Discovery Framework debugging under Common Services > Device and Credentials > Device Discovery > Discovery Logging Configuration, then re-run Discovery and reproduce the problem, the ngdiscovery.log should have some additional errors.

Good hint. Previously I had only enabled debugging under Server -> Admin, and I was wondering why there was so little output.

Ok, I did as you said and I also made a device update. After that the Discovery Process didn't hang up anymore but it took 4 1/2 hours to discover 538 reachable and 162 unreachable devices and more than 530 devices were updated in the DCR. I've started the process twice and got the same behaviour.

I'm still wondering because before the problem appeared, the process took less than 1 hour and, if I remember well, it didn't update each device in the DCR.

Well, I'll have a closer look at the logs.

What are your configured SNMP timeouts and retries? It is a common mistake for customers to set these to high values without realizing how the code works. The time taken grows exponentially when a failing device is encountered. Take, for example, a configuration of 2 retries with a 10-second timeout. When an unreachable device is encountered, the first attempt will wait 10 seconds before timing out. The second attempt will wait 20 seconds. The third attempt will wait 40 seconds. So, for one unreachable device, Discovery has waited 70 seconds. If you multiply this by 162, you get an extra 3.15 hours spent doing nothing.

Because of that, I recommend configuring no more than 1 retry with a 6-second timeout. For 162 unreachable devices, this would still add about 48 extra minutes. This also demonstrates how important it is to fix these unreachable devices (or filter them out).
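To make the arithmetic above easy to repeat with other settings, here is a minimal sketch. It assumes the behaviour described in this reply (each retry waits twice as long as the previous attempt); the function names are made up for illustration and are not part of any CiscoWorks tool.

```python
def wait_per_unreachable_device(timeout_s, retries):
    """Seconds spent on one unreachable device.

    Assumption (taken from the description above): the first attempt waits
    timeout_s, and every retry waits twice as long as the attempt before it.
    """
    return sum(timeout_s * 2 ** attempt for attempt in range(retries + 1))


def extra_discovery_time(timeout_s, retries, unreachable_devices):
    """Total seconds Discovery spends waiting on unreachable devices."""
    return wait_per_unreachable_device(timeout_s, retries) * unreachable_devices


if __name__ == "__main__":
    # The two configurations discussed in this thread, with 162 unreachable devices.
    for timeout_s, retries in [(10, 2), (6, 1)]:
        total_s = extra_discovery_time(timeout_s, retries, 162)
        print(f"timeout={timeout_s}s retries={retries}: "
              f"{total_s} s ({total_s / 60:.1f} min)")
```

This only models the waiting time on devices that never answer; it says nothing about the time spent on reachable devices or about devices being probed in parallel.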

Also consider the debugging itself.

I know that with debugging enabled you can extend discovery from hours to eternity. I do not know whether this happens when you enable debugging for one particular discovery module or only when you enable it for all (or at least many) modules.

If the SNMP settings are as jclarke mentioned, then just disable debugging and give it another try.

Hello! Thanks for your comments, jclarke and mermel.

SNMP timeouts and retries had default values. More than 100 unreachable devices are small non-Cisco switches with CDP capability. After filtering them out, the process took 3 instead of 4 1/2 hours.

I disabled debugging and started the process again and, here we go again, the process hung and I was unable to stop it from the GUI.

And again: with debugging enabled the process finished after several hours; after disabling debugging the process hung. Funny ...

As the process always hangs after the same number of discovered devices, I'll try to filter out several devices. Maybe one is misconfigured.

When the process hangs, it is possible to get a full thread dump, which should reveal why it is blocked. This procedure is not straightforward, so if you can't track down a bad device, you should open a TAC service request and have them walk you through the steps.

We experienced the same problem but fixed it by moving the seed devices from the global options to the CDP module in the discovery settings. The CDP module was empty (no seed devices), but the checkbox for using the DCR as seed list doesn't seem to work.

After some tests we found that with Discovery Framework debugging enabled, the Device Discovery runs successfully (545 elements found). But when it is disabled (as it should be), it stops at 23 devices and "hangs": it keeps running and cannot be stopped.

Have you opened a TAC service request yet? As I said, the full thread dump would be extremely useful in narrowing down why Discovery is locking up.

Indeed, I haven't done it yet. In November I tuned the duration of the discovery process by excluding several IP ranges without Cisco devices. Then I was too busy to open a service request and I postponed it. Finally I forgot, and debugging is still enabled.

Next week I'll be at the Cisco Networkers. After that I'll try to work it out with the TAC.
