RME 4.3 (LMS 3.2) archive jobs hanging

jdwattsrb
Level 1

Hi,

I have an issue where the archive poll job usually hangs (it still shows as running). This also stops all other archive jobs from running until LMS is restarted. The only stack traces are XDI-related. Are all the known XDI issues fixed in RME 4.3?

Thanks

[ Wed Jul 15 21:09:08 BST 2009 ],ERROR,[Thread-339],com.cisco.nm.xms.xdi.transport.cmdsvc.LogAdapter,error,19,Unexpected Ssh2Exception stacktrace:
[ Wed Jul 15 21:09:08 BST 2009 ],DEBUG,[Thread-339],com.cisco.nm.xms.xdi.transport.cmdsvc.LogAdapter,printStackTrace,51,stacktracecom.cisco.nm.lib.cmdsvc.ssh2.Ssh2Exception: Disconnected from remote host
    at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.readBytes(StreamPair.java:332)
    at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.readPacket(StreamPair.java:183)
    at com.cisco.nm.lib.cmdsvc.ssh2.Ssh2Engine.run(Ssh2Engine.java:234)

[ Wed Jul 15 21:09:08 BST 2009 ],DEBUG,[Thread-45],com.cisco.nm.xms.xdi.transport.cmdsvc.LogAdapter,printStackTrace,51,stacktracejava.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.flush(StreamPair.java:341)
    at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.write(StreamPair.java:164)
    at com.cisco.nm.lib.cmdsvc.ssh2.StreamPair.write(StreamPair.java:128)
    at com.cisco.nm.lib.cmdsvc.ssh2.Ssh2Engine.write(Ssh2Engine.java:119)
    at com.cisco.nm.lib.cmdsvc.ssh2.Ssh2Engine.disconnect(Ssh2Engine.java:375)
    at com.cisco.nm.lib.cmdsvc.SSH2Session.disconnect(SSH2Session.java:180)
    at com.cisco.nm.lib.cmdsvc.SSH2Session.close(SSH2Session.java:169)
    at com.cisco.nm.lib.cmdsvc.OpConnect.revert(OpConnect.java:74)
    at com.cisco.nm.lib.cmdsvc.SessionContext.revert(SessionContext.java:587)
    at com.cisco.nm.lib.cmdsvc.SessionContext.invoke(SessionContext.java:216)
    at com.cisco.nm.lib.cmdsvc.Engine.process(Engine.java:57)
    at com.cisco.nm.lib.cmdsvc.LocalProxy.process(LocalProxy.java:22)
    at com.cisco.nm.lib.cmdsvc.CmdSvc.close(CmdSvc.java:591)
    at com.cisco.nm.xms.xdi.pkgs.LibDcma.persistor.CliOperator.cleanupOperator(CliOperator.java:1219)
    at com.cisco.nm.xms.xdi.pkgs.SharedDcmaPIX.transport.PIXCliOperator.cleanupOperator(PIXCliOperator.java:844)
    at com.cisco.nm.xms.xdi.pkgs.SharedDcmaPIX.transport.PIXConfigOperator.cleanupOperator(PIXConfigOperator.java:252)
    at com.cisco.nm.xms.xdi.pkgs.LibDcma.persistor.OperatorCacheManager.clearCache(OperatorCacheManager.java:95)
    at com.cisco.nm.xms.xdi.pkgs.SharedDcmaPIX.transport.PIXConfigOperator.operationDone(PIXConfigOperator.java:259)
    at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.updateArchiveForDevice(ConfigManager.java:840)
    at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.performCollection(ConfigManager.java:1646)
    at com.cisco.nm.rmeng.dcma.configmanager.CfgUpdateThread.run(CfgUpdateThread.java:27)


Joe Clarke
Cisco Employee

All of the known lock-up bugs have been fixed in RME 4.3. In order to troubleshoot this, you will need to get a full Java thread dump from the ConfigMgmtServer process. If this is on Windows, the procedure can be somewhat involved, and you should contact TAC to have them walk you through it.

OK, thanks. It's Solaris, but I will open a TAC case and post back anything informative.

Solaris is much easier. You can send a SIGQUIT to the ConfigMgmtServer PID. The thread dump will be written to daemons.log.
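If you want to script this, here is a minimal Python sketch, equivalent to running `kill -QUIT <pid>` by hand. `request_thread_dump` is a hypothetical helper name; it assumes you already know the ConfigMgmtServer PID (for example from `ps -ef`) and are running as a user allowed to signal that process. A HotSpot JVM answers SIGQUIT by printing a thread dump and carrying on, rather than exiting, unless it was started with `-Xrs`.

```python
import os
import signal


def request_thread_dump(pid: int) -> None:
    """Send SIGQUIT to a running JVM (hypothetical helper).

    A HotSpot JVM does not exit on SIGQUIT; it prints a full thread dump to
    its standard output, which for ConfigMgmtServer is written to daemons.log.
    """
    if pid <= 1:
        # PIDs 0 and -1 would signal a whole process group or every process,
        # and 1 is init, so refuse them outright.
        raise ValueError(f"refusing to signal suspicious PID {pid}")
    os.kill(pid, signal.SIGQUIT)
```

For example, `request_thread_dump(12345)` with the real ConfigMgmtServer PID in place of 12345, then look at the end of daemons.log.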

Thread dump attached. I guess it's all those locked SSH2 threads.
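To check that guess on a large dump, a short Python sketch can count the threads that have an SSH2 frame, grouped by the state shown in their header line. It assumes the HotSpot dump format (each thread starts with a line beginning with the quoted thread name, followed by `at ...` frame lines); the function names are hypothetical, and the `cmdsvc.ssh2` marker is simply the package name from the stack trace earlier in this thread.

```python
import re
from collections import Counter

# HotSpot thread header, e.g.
#   "Thread-339" prio=10 tid=0x... nid=0x... waiting for monitor entry [...]
HEADER = re.compile(r'^"(?P<name>[^"]+)"(?P<rest>.*)$')


def split_threads(dump: str):
    """Yield (name, header_rest, frames) for each thread in a thread dump."""
    name = rest = None
    frames = []
    for line in dump.splitlines():
        m = HEADER.match(line)
        if m:
            if name is not None:
                yield name, rest, frames
            name, rest, frames = m.group("name"), m.group("rest"), []
        elif name is not None and line.strip().startswith("at "):
            frames.append(line.strip()[3:])  # drop the leading "at "
    if name is not None:
        yield name, rest, frames


def ssh2_thread_states(dump: str, marker: str = "cmdsvc.ssh2") -> Counter:
    """Count header states of threads with at least one frame containing marker."""
    states = Counter()
    for _name, rest, frames in split_threads(dump):
        if any(marker in frame for frame in frames):
            # Keep only the text after the nid=... field, minus the [address] range.
            state = re.sub(r".*nid=\S+\s*", "", rest).split("[")[0].strip()
            states[state or "unknown"] += 1
    return states
```

Run on the contents of daemons.log, a large count of SSH2 threads in "waiting for monitor entry" would be consistent with the locked threads described above.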

Looks like you're hitting a Solaris bug. To work around it, edit /opt/CSCOpx/lib/jre/lib/security/java.security and change the line:

security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/sunpkcs11-solaris.cfg

to:

security.provider.1=sun.security.provider.Sun

Then restart dmgtd.
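If you have several servers to change, the edit can be scripted. The Python sketch below uses a hypothetical helper name. It replaces only the provider line quoted above, keeps a copy of the original as `java.security.orig`, and leaves the file alone if the PKCS11 line is not found. Matching with whitespace collapsed is my own assumption, so that extra spaces in the file do not defeat the match. dmgtd still has to be restarted afterwards, because the JVM reads the provider list at startup.

```python
import shutil
from pathlib import Path

# The two provider lines from the workaround above.
OLD_LINE = ("security.provider.1=sun.security.pkcs11.SunPKCS11 "
            "${java.home}/lib/security/sunpkcs11-solaris.cfg")
NEW_LINE = "security.provider.1=sun.security.provider.Sun"


def apply_provider_workaround(path: str) -> bool:
    """Swap the SunPKCS11 provider line for the Sun provider (hypothetical helper).

    Copies the original to <path>.orig before writing. Returns True if the
    file was changed, False if the PKCS11 line was not found (e.g. already patched).
    """
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        # Compare with runs of whitespace collapsed to single spaces.
        if " ".join(line.split()) == OLD_LINE:
            ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            lines[i] = NEW_LINE + ending
            changed = True
    if changed:
        shutil.copy2(p, p.with_name(p.name + ".orig"))
        p.write_text("".join(lines))
    return changed
```

Usage: `apply_provider_workaround("/opt/CSCOpx/lib/jre/lib/security/java.security")`, then restart dmgtd.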

Thanks, I've made the change. Should know by Monday.

All looks good. Do you by any chance have a Solaris patch or bug ID for this?

Either way, I will mark it resolved.

Thanks.

There are a lot of bug IDs associated with this (e.g. 6533630). If you apply the latest Solaris recommended patch cluster, you should be okay. I'm running it on my servers, and I have not seen this hang.