System Inventory collection and Inventory Collection jobs failed

yjdabear
VIP Alumni

I just noticed that Inventory Changes is 0, which never happens. It looks like the two inventory collection jobs have not been working. What's wrong with them?

System Inventory Collection

/var/adm/CSCOpx/files/rme/jobs/ICServer/1391/

Inventory Collection

/var/adm/CSCOpx/files/rme/jobs/ICServer/1594/

Joe Clarke
Cisco Employee

Looks like some of your RME daemons may have crashed. The daemons to check are RMECSTMServer and ICServer.

Seems fine in pdshow. Should I restart ICServer?

Process= RMECSTMServer

State = Running normally

Pid = 20654

RC = 0

Signo = 0

Start = 01/25/08 15:26:26

Stop = Not applicable

Core = Not applicable

Info = RMECSTMServer started.

Process= ICServer

State = Administrator has shut down this server

Pid = 0

RC = 1

Signo = 0

Start = 01/25/08 15:26:30

Stop = 01/26/08 01:23:52

Core = Not applicable

Info = ICServer started.
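
For anyone who wants to check this automatically, here is a minimal sketch that parses pdshow-style text like the output above and lists every process whose State is not "Running normally". It assumes the output has been captured to a string first; the function names `parse_pdshow` and `not_running` are made up for illustration and are not part of CiscoWorks.

```python
# Sketch only: parses text shaped like the pdshow output quoted in this thread.
# parse_pdshow and not_running are hypothetical helper names.

SAMPLE = """\
Process= RMECSTMServer
State = Running normally
Pid = 20654
RC = 0
Signo = 0
Start = 01/25/08 15:26:26
Stop = Not applicable
Core = Not applicable
Info = RMECSTMServer started.
Process= ICServer
State = Administrator has shut down this server
Pid = 0
RC = 1
Signo = 0
Start = 01/25/08 15:26:30
Stop = 01/26/08 01:23:52
Core = Not applicable
Info = ICServer started.
"""


def parse_pdshow(text):
    """Turn 'Key = Value' lines into one dict per 'Process=' block."""
    processes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not Key = Value
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "Process":
            current = {"Process": value}
            processes.append(current)
        elif current is not None:
            current[key] = value
    return processes


def not_running(processes):
    """Return the names of processes whose State is not 'Running normally'."""
    return [p["Process"] for p in processes if p.get("State") != "Running normally"]


if __name__ == "__main__":
    for name in not_running(parse_pdshow(SAMPLE)):
        print(name, "is not running normally")
```

Run against the sample above, this flags only ICServer, which matches what can be read off the output by eye.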

Yes, but you should check the daemons.log (ICServer.log on Windows) for any indication of why it crashed in the first place. Note: a pdexec might not fix this. You may have to restart dmgtd.

Seems a number of tables were locked.

While these errors would prevent inventory from being successfully collected, they would not crash ICServer. Additionally, if you have a process which is stuck, and holding the locks on these tables, you will definitely need to restart dmgtd to recover.

There is a 'java.lang.OutOfMemoryError' at the very end of the log, on line 7987, which I think forced ICServer to exit:

[ Sat Jan 26 01:23:49 EST 2008 ],FATAL,[Thread-18],com.cisco.nm.rmeng.inventory.ics.server.InvDataProcessor,481,Fatal Error has Occured, exiting ICServer java.lang.OutOfMemoryError

But why did it occur? Could it be caused by the process that is locking the tables?
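
To find entries like this one without scrolling to line 7987, a small script can filter the log for FATAL lines. The sketch below assumes every entry follows the same comma-separated layout as the line quoted above; the field names (`ts`, `level`, `thread`, `cls`, `line`, `msg`) and the function names are my own labels, not anything documented for this log.

```python
import re

# Layout inferred from the single log line quoted in this thread:
# [ timestamp ],LEVEL,[thread],class,number,message
# Field names are illustrative assumptions.
LOG_LINE = re.compile(
    r"^\[\s*(?P<ts>[^\]]+?)\s*\],"
    r"(?P<level>[A-Z]+),"
    r"\[(?P<thread>[^\]]+)\],"
    r"(?P<cls>[^,]+),"
    r"(?P<line>\d+),"
    r"(?P<msg>.*)$"
)


def parse_line(text):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LOG_LINE.match(text.strip())
    return match.groupdict() if match else None


def fatal_entries(lines):
    """Collect parsed entries whose level is FATAL."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "FATAL"]


SAMPLE_LINE = (
    "[ Sat Jan 26 01:23:49 EST 2008 ],FATAL,[Thread-18],"
    "com.cisco.nm.rmeng.inventory.ics.server.InvDataProcessor,481,"
    "Fatal Error has Occured, exiting ICServer java.lang.OutOfMemoryError"
)

if __name__ == "__main__":
    for entry in fatal_entries([SAMPLE_LINE]):
        print(entry["ts"], entry["thread"], entry["msg"])
```

Grouping the results by the `thread` field makes it easier to see whether the thread that hit the OutOfMemoryError also logged any of the table lock errors.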

Yeah, one of the threads hit that error, then it exited. I doubt the locks caused this. If you look, the thread that encountered the OOME did not encounter the lock problem. But there does appear to be an issue with the 192.168.8.44 device. It takes 355 seconds to process this device, and there could be a problem in the CISCO-STACK-MIB implementation. It would be beneficial to look at a sniffer trace of the inventory collection for this device to rule out any bugs on the device side.

Yes, the thread that encountered the OOME did not encounter the lock problem, but if I interpret the log correctly, it had essentially finished processing and was only logging the final information about its runtime.

Perhaps it is a more widespread problem... :-(

You say that 355 seconds is a long time for processing a device, but there are several devices that take longer (up to 876 seconds). As far as I can see, all of them (except the one with the OOME) finished processing, and a few also show the lock problem. Could it be that for some of these devices the memory is not freed properly?

It would be interesting to know whether they are all of the same device type...

yjdabear, perhaps this list is of some use to you.

It contains the IPs with a processing time > 300s:

172.19.10.1

172.19.10.74

172.19.20.102

172.19.20.111

172.19.20.212

172.19.20.232

172.19.25.1 (842s)

172.19.26.2

172.19.29.1

172.19.3.1

172.19.32.1

172.19.42.3

192.168.11.28

192.168.110.71

192.168.116.28

192.168.254.29

192.168.254.30

192.168.26.20

192.168.26.36

192.168.26.44

192.168.28.4 (DP time: 863s, Total time: 876s)

192.168.29.12

192.168.29.36

192.168.3.36

192.168.32.44

192.168.37.36 (DP time: 793s, Total time: 854s)

192.168.4.28

192.168.5.12

192.168.52.36

192.168.53.76

192.168.8.36

192.168.8.44
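
A list like the one above can be produced with a simple threshold filter once the per-device times have been pulled out of the log. This is only a rough sketch: the four slow timings are the ones quoted in this thread, the 192.0.2.10 entry is a made-up fast device added for contrast, and how the times are extracted from the log is left out because the exact log layout for them is not shown here.

```python
THRESHOLD_S = 300  # same cut-off as the list above

# Times in seconds as quoted in this thread.
# 192.0.2.10 is a hypothetical fast device, included only for contrast.
TIMINGS = {
    "172.19.25.1": 842,
    "192.168.28.4": 876,
    "192.168.37.36": 854,
    "192.168.8.44": 355,
    "192.0.2.10": 42,
}


def slow_devices(timings, threshold=THRESHOLD_S):
    """Return (ip, seconds) pairs above the threshold, slowest first."""
    slow = [(ip, secs) for ip, secs in timings.items() if secs > threshold]
    return sorted(slow, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for ip, secs in slow_devices(TIMINGS):
        print(f"{ip}\t{secs}s")
```

Sorting by time puts the worst offenders first, which is a reasonable order in which to pick devices for a sniffer trace.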

The reason this device was interesting to me is that it also had an SNMP access error in it. However, given network latency, size of device, etc., 355 seconds may not be that long. That's why I suggested a sniffer trace to rule out a problem with device instrumentation.

All that said, it could be that there is a memory leak that is encountered by this thread. This would not be the first time that we've seen an ICServer leak. Profiling ICServer is not an easy task, though, so it would be good to rule out obvious problems first.
