After a reboot of the LMS 2.6 server, the CM home page started showing the following message: "Cannot connect to ANI Server since it is down". I followed the procedure in the thread "Ciscoworks ANI server cannot connect, Joe Clark please help", but after I restarted the Daemon Manager, all services came up except for the ANI database engine. After I started this service manually, it came up, but the ANI.log file reappeared and the message on the CM home page became "Please wait... ANIServer is still initializing."
The result for the pdreg command is below:
C:\>pdreg -l ANIDbEngine
Process = ANIDbEngine
Path = C:\PROGRA~1\CSCOpx\objects\db\win32\dbsrv9.exe
Startup = Started automatically at boot.
Dependencies = Not applicable
The result for the pdshow command is attached.
I hope Joe Clark or anyone else can help me.
Thanks in advance.
You cannot start the ANIDbEngine service manually. Reboot the server, and if the database does not come back up, then post the output of the pdshow command.
Hi, thank you for the reply.
I rebooted the server and the database came up, but the CM home page still shows the same message: "Please wait... ANIServer is still initializing". I waited for an hour and saw no change. The ani.log file is there, but I cannot read it. The pdshow output is attached.
It is getting worse. When I try to launch RME home, a message says "License Server/Daemon Manager is down. Please check license.log for more information."
I think something is preventing the services from coming up.
There are no process problems reflected in the pdshow. All of the required daemons are up. Post the ani.log and the NMSROOT/MDC/tomcat/logs/stdout.log and stderr.log.
Have you tried restarting the Daemon Manager? This happened to me before, and after restarting the daemon everything seemed to be fine.
The disk space problem is solved, but now, besides the ANI server problem, the RME engine does not start. The Windows Event Viewer shows the following message: "Cannot open transaction log file -- No such file or directory". Is there any way to recover these services?
The log files for the RME are attached.
Your transaction logs are corrupt. See this document on how you can recover them:
The information will be found under "Recovering from a Server Crash."
Yes, the server was renamed according to the procedure in http://www.cisco.com/en/US/docs/net_mgmt/ciscoworks_common_services_software/3.0.5/user/guide/diagnos.html#wp1078582. But it worked fine for some time before this problem.
The jrm.log is attached.
Here it is:
C:\>pdreg -l CTMJrmServer
Process = CTMJrmServer
Path = C:\PROGRA~1\CSCOpx\bin\cwjava.exe
Flags = -cw C:\PROGRA~1\CSCOpx -cw:jre lib/jre -cp:p MDC/tomcat/w
Startup = Started automatically at boot.
Dependencies = RMEDbMonitor jrm TomcatMonitor RMECSTMServer
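When comparing registrations like the one above, it can help to pull the fields apart programmatically. The following is only a rough sketch in Python, assuming the "Key = Value" layout shown in this thread; parse_pdreg is a hypothetical helper name, not part of CiscoWorks.

```python
def parse_pdreg(text):
    """Parse 'Key = Value' lines from pasted `pdreg -l` output into a dict."""
    record = {}
    for line in text.splitlines():
        # Lines without a separator (e.g. the echoed command) are skipped.
        if "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        record[key.strip()] = value.strip()
    # Dependencies are space-separated process names, or "Not applicable".
    deps = record.get("Dependencies", "")
    record["Dependencies"] = [] if deps in ("", "Not applicable") else deps.split()
    return record


# Sample text copied from the output posted above.
SAMPLE = r"""C:\>pdreg -l CTMJrmServer
Process = CTMJrmServer
Path = C:\PROGRA~1\CSCOpx\bin\cwjava.exe
Flags = -cw C:\PROGRA~1\CSCOpx -cw:jre lib/jre -cp:p MDC/tomcat/w
Startup = Started automatically at boot.
Dependencies = RMEDbMonitor jrm TomcatMonitor RMECSTMServer
"""

info = parse_pdreg(SAMPLE)
print(info["Process"], "depends on:", ", ".join(info["Dependencies"]))
```

Each name in the resulting dependency list can then be checked against the pdshow output to confirm it is running before CTMJrmServer is expected to start.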
It is correct:
Is it possible to restore a backup only for the JRM, without losing DCR or RME information?
It looks like there might be a timing issue. Shut down Daemon Manager, then empty out the contents of the jrm.log. Restart Daemon Manager, then when pdshow returns valid data, post that output along with the new jrm.log.
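The stop, empty, and restart sequence described here could be scripted roughly as follows. This is a sketch under stated assumptions, not an official procedure: it assumes the Daemon Manager runs as the Windows service "crmdmgtd", and the log path in the example call is a guess that must be adjusted to the actual install. The run parameter exists only so the sequence can be tried without touching a real server.

```python
import subprocess
from pathlib import Path

# Assumption: "crmdmgtd" is the Windows service name of the Daemon Manager.
SERVICE = "crmdmgtd"


def restart_with_empty_log(log_path, run=subprocess.run):
    """Stop Daemon Manager, empty the given log, then start Daemon Manager."""
    run(["net", "stop", SERVICE], check=True)
    # Truncate the file in place so the next start writes a fresh log.
    Path(log_path).write_text("")
    run(["net", "start", SERVICE], check=True)


# Example call (hypothetical path; adjust to the real NMSROOT):
# restart_with_empty_log(r"C:\PROGRA~1\CSCOpx\log\jrm.log")
```

After the restart, pdshow still has to be run by hand once the processes have had time to initialize, as the post says.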
Please find attached the jrm.log, the old jrm.log (as jrm.old.log) and the pdshow output.
After restarting the Daemon Manager, the sm_server.exe process started consuming a great amount of CPU time. This has happened before, and another restart of Daemon Manager solved it. Do you think this is somehow related to the jrm problem?
There is no jrm problem according to this pdshow and this log. In fact, I see no process problem at all. As for sm_server, it is expected for these processes to take a lot of CPU time on restart. These are the DFMServer processes that have to rebuild the internal topology of the network. They will also take considerable CPU time if you are sending lots of traps to LMS, or if they are currently doing polling of the devices.
DFM has a very particular profile when it comes to scalability. You need to make sure you are not managing too many interfaces/ports and that you are not sending too many traps to DFM.
For the record:
I restored a backup on a new LMS installation and the problem persisted. I had no more time, so I reinitialized the database and set everything up from scratch manually. Now it is working fine.
Regarding DFM scalability, I have only 1600 devices. Do you think this number could be too many?
Thank you for the help.
This command asks for a DFM user, and none of the users I created when installing LMS worked. Is there any way I can retrieve or reset these credentials?
C:\>"c:\Program Files\CSCOpx\objects\smarts\bin\sm_tpmgr" -s DFM --sizes
Server DFM User: admin
admin's Password: XXXXXXXXXXXX
[23-Jul-2010 9:44:29 AM+836ms] t@11124
ASL-E-ERROR_INIT_BACKEND-While initializing server connection to 'DFM'
SM-EREFUSED-No process is connected to the specified location
Although I have installed a fresh LMS installation, after rediscovering all devices, the new installation is showing the same error on the RME home page: "JRM Service could be down. Check whether JRM services are running.".
From the pdshow results, I can see JRM has been "Waiting to initialize" since last Friday. The sm_server process is no longer running and CPU activity is low right now. The pdshow results and jrm.log are attached.
Do you think DFM is causing this problem? My main concerns are RME and CM, so I could reduce DFM functionality, or even disable it, if necessary.
Jrm is taking way too long to start. It is only allowed 30 seconds, but it looks like it's taking over four minutes. Assuming the server is idle, you can try doing:
To see if it starts. If it does, it could be that the server is too busy during Daemon Manager start time, and the CPU load is preventing jrm from starting correctly. What are the specs of this server?
Before your last message, I had disabled all DFM polling and SNMP trap receiving. After that, the RME home page started working normally and I was able to manage jobs once again. You were right, the server couldn't handle all the traps and the polling, but even after disabling those, the CM home page still shows the JRM down message. I tried stopping and restarting the JRM, as you suggested, but the problem persisted.
The system is Windows 2003 on VMWare ESXi. The hardware is a dual Xeon with 4GB of RAM dedicated to the LMS server. I don't have the clock speed information now, but I can provide it tomorrow.
P.S.: Is there any clean way to prevent DFM from processing SNMP traps? For test purposes, I changed the SNMP port, but that makes Windows generate lots of ICMP port unreachable packets and I don't intend to leave it this way.
Unfortunately, VMWare is not supported in LMS 2.6. That could be contributing to your performance issues. If you need to use VMWare, you will have to upgrade to LMS 3.x. The only way to stop DFM from processing the traps is to stop sending them to DFM. You can change your device configs to only send traps which DFM understands. See http://www.cisco.com/en/US/partner/docs/net_mgmt/ciscoworks_device_fault_manager/2.0_IDU_2.0.6/user/guide/TrapFwd.html for the list.