02-08-2008 12:38 PM
Just noticed Inventory Changes being 0, which never happens. Looks like the two inventory collections have been working. What's wrong with them?
System Inventory Collection
/var/adm/CSCOpx/files/rme/jobs/ICServer/1391/
Inventory Collection
/var/adm/CSCOpx/files/rme/jobs/ICServer/1594/
02-08-2008 10:29 PM
Looks like some of your RME daemons may have crashed. The daemons to check are RMECSTMServer and ICServer.
02-12-2008 06:04 AM
Seems fine in pdshow. Should I restart ICServer?
Process= RMECSTMServer
State = Running normally
Pid = 20654
RC = 0
Signo = 0
Start = 01/25/08 15:26:26
Stop = Not applicable
Core = Not applicable
Info = RMECSTMServer started.
Process= ICServer
State = Administrator has shut down this server
Pid = 0
RC = 1
Signo = 0
Start = 01/25/08 15:26:30
Stop = 01/26/08 01:23:52
Core = Not applicable
Info = ICServer started.
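For anyone who wants to check this kind of output automatically, here is a minimal sketch that splits pdshow-style text into one record per process and lists those not reporting "Running normally". The function names and the trimmed sample are illustrative only. Treating every other state as suspicious is an assumption, since some daemons may legitimately report a different state.

```python
def parse_pdshow(text):
    """Split pdshow-style output into one dict per process block."""
    processes = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # blank line or a line without "key = value"
        key, value = key.strip(), value.strip()
        if key == "Process":
            # A "Process=" line opens a new block.
            current = {}
            processes.append(current)
        if current is not None:
            current[key] = value
    return processes


def not_running(processes):
    """Names of processes whose State is not 'Running normally' (assumed healthy state)."""
    return [p["Process"] for p in processes
            if p.get("State") != "Running normally"]


# Trimmed copy of the output posted above.
SAMPLE = """\
Process= RMECSTMServer
State = Running normally
Pid = 20654
Process= ICServer
State = Administrator has shut down this server
Pid = 0
RC = 1
"""

print(not_running(parse_pdshow(SAMPLE)))
```

On the sample above, only ICServer is reported.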
02-12-2008 08:21 AM
Yes, but you should check the daemons.log (ICServer.log on Windows) for any indication of why it crashed in the first place. Note: a pdexec might not fix this. You may have to restart dmgtd.
02-12-2008 07:39 AM
02-12-2008 08:23 AM
While these errors would prevent inventory from being successfully collected, they would not crash ICServer. Additionally, if you have a process which is stuck, and holding the locks on these tables, you will definitely need to restart dmgtd to recover.
02-12-2008 09:13 AM
There is a 'java.lang.OutOfMemoryError' at the very end, on line 7987, which I think forced ICServer to exit:
[ Sat Jan 26 01:23:49 EST 2008 ],FATAL,[Thread-18],com.cisco.nm.rmeng.inventory.ics.server.InvDataProcessor,481,Fatal Error has Occured, exiting ICServer java.lang.OutOfMemoryError
But why did it occur? Could it be caused by the process that locks the tables?
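To locate entries like this without scrolling through thousands of lines, a small scan can help. The sketch below assumes every entry follows the same comma-separated layout as the single line quoted above (timestamp, level, thread, class, source line, message); that layout is inferred from this one line and is not a documented format. The name find_fatal is hypothetical.

```python
import re

# Layout inferred from the one entry quoted in this thread (an assumption).
ENTRY = re.compile(
    r"^\[\s*(?P<timestamp>[^\]]+?)\s*\],"
    r"(?P<level>[A-Z]+),"
    r"\[(?P<thread>[^\]]+)\],"
    r"(?P<source>[^,]+),"
    r"(?P<source_line>\d+),"
    r"(?P<message>.*)$"
)


def find_fatal(lines):
    """Return (line_number, fields) for FATAL entries or OutOfMemoryError messages."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # stack traces and other lines that do not fit the layout
        entry = match.groupdict()
        if entry["level"] == "FATAL" or "OutOfMemoryError" in entry["message"]:
            hits.append((number, entry))
    return hits


SAMPLE = [
    "some unrelated line",
    "[ Sat Jan 26 01:23:49 EST 2008 ],FATAL,[Thread-18],"
    "com.cisco.nm.rmeng.inventory.ics.server.InvDataProcessor,481,"
    "Fatal Error has Occured, exiting ICServer java.lang.OutOfMemoryError",
]

for number, entry in find_fatal(SAMPLE):
    print(number, entry["timestamp"], entry["thread"], entry["message"])
```

An open file object can be passed in place of the sample list, since the function only iterates over lines.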
02-12-2008 10:07 AM
Yeah, one of the threads hit that error, then it exited. I doubt the locks caused this. If you look, the thread that encountered the OOME did not encounter the lock problem. But there does appear to be an issue with the 192.168.8.44 device. It takes 355 seconds to process this device, and there could be a problem in the CISCO-STACK-MIB implementation. It would be beneficial to look at a sniffer trace of the inventory collection for this device to rule out any bugs on the device side.
02-13-2008 03:36 AM
Yes, the thread that encountered the OOME did not encounter the lock problem, but if I interpret the log correctly, it had almost finished processing and was just logging the last information about its runtime.
Perhaps it is a more widespread problem... :-(
If you say that 355 seconds is a long time for processing a device, there are several devices for which it takes longer (up to 876 seconds). But as far as I can see, they all (except the one with the OOME) finished processing (a few also show the lock problem). Could it be that for some of these devices the memory does not get properly freed?
It could be of interest whether they are all of the same device type...
yjdabear, perhaps this list is somewhat useful for you....
It contains the IPs with a processing time > 300s:
172.19.10.1
172.19.10.74
172.19.20.102
172.19.20.111
172.19.20.212
172.19.20.232
172.19.25.1 (842s)
172.19.26.2
172.19.29.1
172.19.3.1
172.19.32.1
172.19.42.3
192.168.11.28
192.168.110.71
192.168.116.28
192.168.254.29
192.168.254.30
192.168.26.20
192.168.26.36
192.168.26.44
192.168.28.4 (DP time:863s, Total time: 876s)
192.168.29.12
192.168.29.36
192.168.3.36
192.168.32.44
192.168.37.36 (DP time: 793s, Total time: 854s)
192.168.4.28
192.168.5.12
192.168.52.36
192.168.53.76
192.168.8.36
192.168.8.44
02-13-2008 09:53 AM
The reason this device was interesting to me is that it also had an SNMP access error in it. However, given network latency, size of device, etc. 355 seconds may not be that long. That's why I suggested a sniffer trace to rule out a problem with device instrumentation.
All that said, it could be that there is a memory leak that is encountered by this thread. This would not be the first time that we've seen an ICServer leak. Profiling ICServer is not an easy task, though, so it would be good to rule out obvious problems first.
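One cheap check that could come before any profiling is to see whether the process's memory footprint trends upward between collections. The sketch below fits a least-squares line through (seconds, resident KB) samples and reports the slope in KB per hour. How the samples are gathered is left open, the numbers shown are invented for illustration, and resident size is only a rough proxy for Java heap usage, so a flat trend would not rule out a leak.

```python
def growth_per_hour(samples):
    """Least-squares slope of (seconds, resident_kb) samples, in KB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Spread of the time values; zero means all samples share one timestamp.
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 3600


# Invented numbers, for illustration only: one sample every 30 minutes.
samples = [(0, 100000), (1800, 101000), (3600, 102000)]
print(round(growth_per_hour(samples)), "KB per hour")
```

A slope that stays clearly positive over several collection runs would be a hint worth following up; a single run says little.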