Cannot Connect to ANIserver

Unanswered Question
Jul 5th, 2010

Hi.


After a reboot of the LMS 2.6 server, the CM home page started showing the following message: "Cannot connect to ANI Server since it is down". I followed the procedure in the thread "Ciscoworks ANI server cannot connect, Joe Clark please help", but after I restarted the Daemon Manager, all services came up except for the ANI database engine. After I started this service manually, it came up, but the ANI.log file reappeared and the message on the CM home page became "Please wait... ANIServer is still initializing."


The result for the pdreg command is below:

C:\>pdreg -l ANIDbEngine
        Process      = ANIDbEngine
        Path         = C:\PROGRA~1\CSCOpx\objects\db\win32\dbsrv9.exe
        Flags        =
        Startup      = Started automatically at boot.
        Dependencies = Not applicable


The result for the pdshow command is attached.


I hope Joe Clarke or anyone else can help me too.


Thanks in advance.

Joe Clarke Mon, 07/05/2010 - 16:29
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

You cannot start the ANIDbEngine service manually.  Reboot the server, and if the database does not come back up, then post the output of the pdshow command.

rodrigopitta Tue, 07/06/2010 - 06:57

Hi, thank you for the reply.


I rebooted the server and the database came up, but the CM Home page still shows the same message: "Please wait... ANIServer is still initializing". I waited for an hour with no change. The ani.log file is there, but I cannot read it. The pdshow output is attached.

rodrigopitta Tue, 07/06/2010 - 13:22

It is getting worse. When I try to launch RME home, a message says "License Server/Daemon Manager is down. Please check license.log for more information."


I think something is preventing the services from coming up.

Joe Clarke Tue, 07/06/2010 - 18:35

There are no process problems reflected in the pdshow.  All of the required daemons are up.  Post the ani.log and the NMSROOT/MDC/tomcat/logs/stdout.log and stderr.log.

mylacapanzana Tue, 07/06/2010 - 22:54

Have you tried restarting the daemon manager? This happened to me before, and after restarting the daemon everything seemed to be fine.

rodrigopitta Wed, 07/07/2010 - 12:47

Yes. I've restarted Daemon Manager and even rebooted the server. No changes.

rodrigopitta Wed, 07/07/2010 - 12:54

I'm running out of disk space, maybe that is the cause of the problems. Please find attached the stdout and the log for the databases.
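
A quick way to confirm whether free space is a factor is to check the volume that holds the LMS install directory. The following is only a minimal Python sketch and is not from this thread: the function name and the 1 GiB threshold are illustrative assumptions, and the install path in the comment is the one shown in the pdreg output earlier.

```python
import shutil

GIB = 1024 ** 3


def free_space_ok(path, min_free_bytes=GIB):
    """Return (free_bytes, ok) for the volume that contains `path`.

    `min_free_bytes` is an arbitrary illustrative threshold, not a
    documented LMS requirement.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free, usage.free >= min_free_bytes


# Example call on the server (path taken from the pdreg output):
#   free, ok = free_space_ok(r"C:\PROGRA~1\CSCOpx")
#   print(f"{free / GIB:.1f} GiB free, enough: {ok}")
```

The database engines and log files live under the same install root, so one check on that volume is probably enough for a first look.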

rodrigopitta Thu, 07/08/2010 - 13:55

The disk space problem is solved, but now, besides the ANI server problem, RME Engine does not start. The Windows Event Viewer shows the following message: "Cannot open transaction log file -- No such file or directory". Is there any way to recover these services?


The log files for RME are attached.


Thanks.

rodrigopitta Fri, 07/09/2010 - 07:23

Thank you. Now RME engine comes up, but the JRM service still does not. CM Home says "Please wait.... ANIServer is still initializing " and RME home says "JRM Service could be down. Check whether JRM services are running".

The pdshow is attached.

Joe Clarke Sat, 07/10/2010 - 14:25

Post the jrm.log.  Was the hostname on this server changed recently?

Joe Clarke Mon, 07/12/2010 - 08:16

Post the output of the hostname command and of "pdreg -l CTMJrmServer".

rodrigopitta Mon, 07/12/2010 - 08:41

Here it is:



C:\>pdreg -l CTMJrmServer
        Process      = CTMJrmServer
        Path         = C:\PROGRA~1\CSCOpx\bin\cwjava.exe
        Flags        = -cw C:\PROGRA~1\CSCOpx -cw:jre lib/jre -cp:p MDC/tomcat/webapps/rme/WEB-INF/classes,MDC/tomcat/webapps/rme/WEB-INF/lib/ctm.jar,MDC/tomcat/webapps/rme/WEB-INF/lib/log4j.jar com.cisco.nm.rmeng.jrmwrapper.server.CTMJobManagerServer cwsctx03rjorp
        Startup      = Started automatically at boot.
        Dependencies = RMEDbMonitor jrm TomcatMonitor RMECSTMServer

Joe Clarke Mon, 07/12/2010 - 13:44

What about the output of the HOSTNAME command from the DOS shell?

rodrigopitta Tue, 07/13/2010 - 05:52

It is correct:



C:\>hostname

cwsctx03rjorp
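
The name printed by hostname is the same as the last argument in the Flags value of the pdreg output above, which appears to be what this check is meant to confirm. A minimal Python sketch of that comparison follows. It is an illustration only: the function names are made up, and the assumption that the trailing Flags argument is the registered host name is inferred from this thread rather than from documentation. Console output is often wrapped at 80 columns when pasted, so the parser rejoins continuation lines without adding a space.

```python
import re

# Field names as they appear in the pdreg -l output posted in this thread.
FIELD_RE = re.compile(r"^\s*(Process|Path|Flags|Startup|Dependencies)\s*=\s*(.*)$")

# pdreg output as a console wrapped at 80 columns would paste it.
SAMPLE = r"""
C:\>pdreg -l CTMJrmServer
        Process      = CTMJrmServer
        Path         = C:\PROGRA~1\CSCOpx\bin\cwjava.exe
        Flags        = -cw C:\PROGRA~1\CSCOpx -cw:jre lib/jre -cp:p MDC/tomcat/w
ebapps/rme/WEB-INF/classes,MDC/tomcat/webapps/rme/WEB-INF/lib/ctm.jar,MDC/tomcat
/webapps/rme/WEB-INF/lib/log4j.jar com.cisco.nm.rmeng.jrmwrapper.server.CTMJobMa
nagerServer cwsctx03rjorp
        Startup      = Started automatically at boot.
        Dependencies = RMEDbMonitor jrm TomcatMonitor RMECSTMServer
"""


def parse_pdreg(text):
    """Parse pdreg -l output into a dict, rejoining console-wrapped values."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # blank lines carry no information
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            # Continuation of a wrapped value: the break can fall mid-word,
            # so join without a separator. (A wrap that lands exactly on a
            # space would lose that space; this sketch ignores that case.)
            fields[current] += line.strip()
        # Lines before the first field (such as the prompt) are ignored.
    return fields


def registered_host(pdreg_text):
    """Return the last argument of Flags, assumed to be the host name."""
    tokens = parse_pdreg(pdreg_text).get("Flags", "").split()
    return tokens[-1] if tokens else None


def hostname_matches(pdreg_text, hostname):
    """Compare case-insensitively, since Windows host names ignore case."""
    host = registered_host(pdreg_text)
    return host is not None and host.lower() == hostname.strip().lower()
```

On the server itself, the second argument could come from the hostname command or from socket.gethostname(); a mismatch would suggest the host was renamed after LMS was installed.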


Is it possible to restore a backup only for the JRM, without losing DCR or RME information?

Joe Clarke Tue, 07/13/2010 - 14:27

It looks like there might be a timing issue.  Shut down Daemon Manager, then empty out the contents of the jrm.log.  Restart Daemon Manager, then when pdshow returns valid data, post that output along with the new jrm.log.
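
The sequence described here could be scripted roughly as follows. This is a sketch under assumptions that are not stated in the thread: crmdmgtd is used as the Windows service name of Daemon Manager (worth verifying in the Services console), the log location in the comment is a guess at NMSROOT\log\jrm.log, and the function names are hypothetical. Keeping a copy of the old log is optional but makes it easy to post both versions.

```python
import shutil
import subprocess
from pathlib import Path


def empty_log(log_path, keep_copy=True):
    """Truncate a log file, optionally keeping a copy as <name>.old<suffix>.

    Returns the path of the copy, or None if no copy was made.
    """
    log_path = Path(log_path)
    backup = None
    if keep_copy and log_path.exists():
        # jrm.log -> jrm.old.log
        backup = log_path.with_name(log_path.stem + ".old" + log_path.suffix)
        shutil.copy2(log_path, backup)
    log_path.write_text("")  # truncates, or creates the file if it is missing
    return backup


def restart_daemon_manager(log_path, service="crmdmgtd"):
    """Stop Daemon Manager, empty the log, then start Daemon Manager again.

    `service` is an assumed service name; run from an elevated prompt.
    """
    subprocess.run(["net", "stop", service], check=True)
    empty_log(log_path)
    subprocess.run(["net", "start", service], check=True)


# Example call on the server (log path is an assumption):
#   restart_daemon_manager(r"C:\PROGRA~1\CSCOpx\log\jrm.log")
```

The sketch does not wait for pdshow to return valid data; that step still has to be checked by hand after the restart.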

rodrigopitta Wed, 07/14/2010 - 06:36

Please find attached the jrm.log, the old jrm.log (as jrm.old.log) and the pdshow output.

After restarting the Daemon Manager, the sm_server.exe process started consuming a large amount of CPU time. This has happened before, and another restart of Daemon Manager solved it. Do you think this is somehow related to the jrm problem?

Joe Clarke Thu, 07/15/2010 - 19:47

There is no jrm problem according to this pdshow and this log.  In fact, I see no process problem at all.  As for sm_server, it is expected for these processes to take a lot of CPU time on restart.  These are the DFMServer processes that have to rebuild the internal topology of the network.  They will also take considerable CPU time if you are sending lots of traps to LMS, or if they are currently doing polling of the devices.


DFM has a very particular profile when it comes to scalability.  You need to make sure you are not managing too many interfaces/ports and that you are not sending too many traps to DFM.

rodrigopitta Thu, 07/22/2010 - 06:18

For the record:

I restored a backup on a new LMS installation and the problem persisted. I had no more time, so I reinitialized the database and set everything up from scratch manually. Now it is working fine.

Regarding the DFM scalability, I have only 1600 devices. Do you think this number could be too many?


Thank you for the help.

Joe Clarke Thu, 07/22/2010 - 22:14

Possibly.  Post the output of the following command:


NMSROOT/objects/smarts/bin/sm_tpmgr -s DFM --sizes

rodrigopitta Fri, 07/23/2010 - 06:03

This command asks for a DFM user, and none of the users I created when installing LMS worked.  Is there any way I can retrieve or reset these credentials?



C:\>"c:\Program Files\CSCOpx\objects\smarts\bin\sm_tpmgr" -s DFM --sizes
Server DFM User: admin
admi's Password: XXXXXXXXXXXX
[23-Jul-2010 9:44:29 AM+836ms] [email protected]
ASL-E-ERROR_INIT_BACKEND-While initializing server connection to 'DFM'
SM-EREFUSED-No process is connected to the specified location

Joe Clarke Fri, 07/23/2010 - 10:22

The password is probably just admin.  However, it can be found in the NMSROOT/objects/smarts/conf/serverConnect.conf file.

rodrigopitta Fri, 07/23/2010 - 12:45

That is what serverConnect.conf says. But the output for the command is still the same.

Joe Clarke Fri, 07/23/2010 - 14:17

Post the output of the pdshow command.

rodrigopitta Mon, 07/26/2010 - 06:23

Hi.


Although I now have a fresh LMS installation, after rediscovering all devices it shows the same error on the RME home page: "JRM Service could be down. Check whether JRM services are running."


From the pdshow results, I can see that JRM has been "Waiting to initialize" since last Friday. The sm_server process is no longer running and CPU activity is low right now. The pdshow results and jrm.log are attached.


Do you think DFM is causing this problem? My main concerns are RME and CM, so I could reduce DFM functionality, or even disable it, if necessary.

Joe Clarke Mon, 07/26/2010 - 09:58

Jrm is taking far too long to start.  It is only allowed 30 seconds, but it looks like it's taking over four minutes.  Assuming the server is idle, you can try the following to see if it starts:


pdterm jrm

pdexec jrm


If it does, it could be that the server is too busy during Daemon Manager start time, and the CPU load is preventing jrm from starting correctly.  What are the specs of this server?

rodrigopitta Mon, 07/26/2010 - 13:08

Before your last message, I had disabled all DFM polling and SNMP trap receiving. After that, the RME home page started working normally and I was able to manage jobs once again. You were right, the server couldn't handle all the traps and the polling, but even after disabling those, the CM home page still shows the JRM down message. I tried stopping and restarting JRM, as you suggested, but the problem persisted.

The system is Windows 2003 on VMWare ESXi. The hardware is a dual Xeon with 4 GB of RAM dedicated to the LMS server. I don't have the clock speed information now, but I can provide it tomorrow.

P.S.: Is there any clean way to prevent DFM from processing SNMP traps? For test purposes, I changed the SNMP port, but that makes Windows generate lots of ICMP port unreachable packets, and I don't intend to leave it this way.

Joe Clarke Mon, 07/26/2010 - 22:07

Unfortunately, VMWare is not supported in LMS 2.6.  That could be contributing to your performance issues.  If you need to use VMWare, you will have to upgrade to LMS 3.x.  The only way to stop DFM from processing the traps is to stop sending them to DFM.  You can change your device configs to only send traps which DFM understands.  See http://www.cisco.com/en/US/partner/docs/net_mgmt/ciscoworks_device_fault_manager/2.0_IDU_2.0.6/user/guide/TrapFwd.html for the list.
