RME 4.3 (LMS 3.2) archive jobs hanging

Answered Question
Jul 16th, 2009

Hi,

I have an issue where the archive poll job usually hangs (it still shows as running). This also stops all other archive jobs from running until LMS is restarted. The only stack traces are xdi-related. Are all the known xdi issues fixed in RME 4.3?


Thanks



[ Wed Jul 15 21:09:08 BST 2009 ],ERROR,[Thread-339],com.cisco.nm.xms.xdi.transport.cmdsvc.LogAdapter,error,19,Unexpected Ssh2Exception stacktrace:

[ Wed Jul 15 21:09:08 BST 2009 ],DEBUG,[Thread-339],com.cisco.nm.xms.xdi.transport.cmdsvc.LogAdapter,printStackTrace,51,stacktracecom.cisco.nm.lib.cmdsvc.ssh2.Ssh2Exception: Disconnected from remote host

at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.readBytes(StreamPair.java:332)

at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.readPacket(StreamPair.java:183)

at com.cisco.nm.lib.cmdsvc.ssh2.Ssh2Engine.run(Ssh2Engine.java:234)


[ Wed Jul 15 21:09:08 BST 2009 ],DEBUG,[Thread-45],com.cisco.nm.xms.xdi.transport.cmdsvc.LogAdapter,printStackTrace,51,stacktracejava.net.SocketException: Broken pipe

at java.net.SocketOutputStream.socketWrite0(Native Method)

at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)

at java.net.SocketOutputStream.write(SocketOutputStream.java:136)

at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)

at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)

at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.flush(StreamPair.java:341)

at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.write(StreamPair.java:164)

at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.write(StreamPair.java:128)

at com.cisco.nm.lib.cmdsvc.ssh2.Ssh2Engine.write(Ssh2Engine.java:119)

at com.cisco.nm.lib.cmdsvc.ssh2.Ssh2Engine.disconnect(Ssh2Engine.java:375)

at com.cisco.nm.lib.cmdsvc.SSH2Session.disconnect(SSH2Session.java:180)

at com.cisco.nm.lib.cmdsvc.SSH2Session.close(SSH2Session.java:169)

at com.cisco.nm.lib.cmdsvc.OpConnect.revert(OpConnect.java:74)

at com.cisco.nm.lib.cmdsvc.SessionContext.revert(SessionContext.java:587)

at com.cisco.nm.lib.cmdsvc.SessionContext.invoke(SessionContext.java:216)

at com.cisco.nm.lib.cmdsvc.Engine.process(Engine.java:57)

at com.cisco.nm.lib.cmdsvc.LocalProxy.process(LocalProxy.java:22)

at com.cisco.nm.lib.cmdsvc.CmdSvc.close(CmdSvc.java:591)

at com.cisco.nm.xms.xdi.pkgs.LibDcma.persistor.CliOperator.cleanupOperator(CliOperator.java:1219)

at com.cisco.nm.xms.xdi.pkgs.SharedDcmaPIX.transport.PIXCliOperator.cleanupOperator(PIXCliOperator.java:844)

at com.cisco.nm.xms.xdi.pkgs.SharedDcmaPIX.transport.PIXConfigOperator.cleanupOperator(PIXConfigOperator.java:252)

at com.cisco.nm.xms.xdi.pkgs.LibDcma.persistor.OperatorCacheManager.clearCache(OperatorCacheManager.java:95)

at com.cisco.nm.xms.xdi.pkgs.SharedDcmaPIX.transport.PIXConfigOperator.operationDone(PIXConfigOperator.java:259)

at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.updateArchiveForDevice(ConfigManager.java:840)

at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.performCollection(ConfigManager.java:1646)

at com.cisco.nm.rmeng.dcma.configmanager.CfgUpdateThread.run(CfgUpdateThread.java:27)



Joe Clarke Thu, 07/16/2009 - 06:19

All of the known lock-up bugs have been fixed in RME 4.3. In order to troubleshoot this, you will need to get a full Java thread dump from the ConfigMgmtServer process. If this is on Windows, the procedure can be somewhat involved, and you should contact TAC to have them walk you through it.

jdwattsrb Thu, 07/16/2009 - 06:46

Ok Thanks. It's Solaris, but I will open a TAC case. Will post back anything informative.

Joe Clarke Thu, 07/16/2009 - 06:58

Solaris is much easier. You can send a SIGQUIT to the ConfigMgmtServer PID. The thread dump will be written to daemons.log.
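A rough sketch of that step, for anyone following along. The helper name request_thread_dump is made up for illustration and is not part of LMS. The pdshow path and the daemons.log location in the comments are assumptions about a default Solaris install, so verify them on your own server before relying on them.

```shell
#!/bin/sh
# Hypothetical helper (not part of LMS): ask a running JVM for a thread dump.
request_thread_dump() {
    pid="$1"
    # Refuse anything that is empty or not made up entirely of digits.
    case "$pid" in
        ''|*[!0-9]*) echo "usage: request_thread_dump PID" >&2; return 2 ;;
    esac
    # kill -0 sends no signal; it only checks that the process can be signalled.
    kill -0 "$pid" 2>/dev/null || { echo "no such process: $pid" >&2; return 1; }
    # SIGQUIT asks the JVM to print every thread's stack; the JVM keeps running.
    kill -QUIT "$pid"
}

# Usage, with assumed default paths (check them on your install):
#   /opt/CSCOpx/bin/pdshow ConfigMgmtServer     # note the reported PID
#   request_thread_dump 12345                   # 12345 is a placeholder PID
#   tail -200 /var/adm/CSCOpx/log/daemons.log   # the dump should appear here
```

Taking the dump while the job is actually hung is the point: the stacks show which thread holds the lock the others are waiting on.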

Joe Clarke Fri, 07/17/2009 - 09:12

Looks like you're hitting a Solaris bug. To work around this, edit /opt/CSCOpx/lib/jre/lib/security/java.security, and change the line:


security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/sunpkcs11-solaris.cfg


to:


security.provider.1=sun.security.provider.Sun


Then restart dmgtd.
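If you would rather script the edit than make it by hand, something like the following should do it. This is only a sketch: switch_provider is a hypothetical helper name, the .orig backup suffix is my own choice, and the init script path in the usage comment is an assumption about where dmgtd lives on a typical Solaris install. It avoids sed -i because the stock Solaris sed does not support it.

```shell
#!/bin/sh
# Hypothetical helper: point security.provider.1 at the plain Sun provider.
switch_provider() {
    file="$1"
    # Keep an untouched copy so the change can be reverted.
    cp -p "$file" "$file.orig" || return 1
    # Rewrite only the line that currently loads SunPKCS11 as provider 1;
    # every other line is copied through unchanged.
    sed 's|^security\.provider\.1=sun\.security\.pkcs11\.SunPKCS11 .*$|security.provider.1=sun.security.provider.Sun|' \
        "$file.orig" > "$file"
}

# Usage (the init script path is an assumption; adjust for your server):
#   switch_provider /opt/CSCOpx/lib/jre/lib/security/java.security
#   /etc/init.d/dmgtd stop
#   /etc/init.d/dmgtd start
```

To back the change out, copy java.security.orig over java.security and restart dmgtd again.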

jdwattsrb Sun, 07/19/2009 - 23:53

All looks good. Do you by any chance have a Solaris patch or bug ID for this?


Any-which-way I will mark it resolved.


Thanks.


Correct Answer
Joe Clarke Mon, 07/20/2009 - 08:14

There are a lot of bug IDs associated with this (e.g. 6533630). If you apply the latest Solaris recommended patch cluster, you should be okay. I'm running it on my servers, and I have not seen this hang.
