LMS 2.6 - DFM alarm notification gets delayed

Answered Question
Feb 27th, 2007

Hi to all,

We have DFM configured to send trap notifications to an external machine, and we see that the alarms arriving at the external NMS are sometimes delayed. Some of them are sent only days after they were received in CW.

Any clue? Correlation issues, maybe?

Cheers,

Pedro

Joe Clarke Tue, 02/27/2007 - 13:35

I have not heard of any delay problems with sending trap notifications. However, there are a few problems where notifications are lost. Are you sure the notifications showing up on the external NMS are actually delayed, or is it a new notification that is showing up? How have you configured your trap notification subscription, and what does one of these delayed notifications look like?

ptrigueira Tue, 02/27/2007 - 14:12

I'm sure it's the same notification because the ID is the same. I compared them, I mean the alarm that appears on DFM and the alarm that appears later on the external NMS...

P

Joe Clarke Tue, 02/27/2007 - 21:21

What about the answers to my other two questions?

Something you should try to debug this is to place a sniffer on the LMS server, and filter on UDP 162 traffic to your external NMS. See if the traps are leaving the LMS server in a timely fashion. If not, you can enable Notification Services debugging under DFM > Configuration > Other Configurations > Logging, reproduce this problem, then check the NMSROOT/log/dfmLogs/NOS/nos.log for errors.

pvanvuuren Tue, 02/27/2007 - 23:50

My DFM (LMS 2.6; DFM 2.0.8) email notification doesn't work automatically. If I click on Notify and fill in the details, the email comes through. But if I manually generate an alarm, the nos.log file shows that it sees the alarm and is preparing to send, but then it doesn't.

28-Feb-2007|09:39:52.453|DEBUG|NOS|Email Notify Thread:Pooled Thread:1|EmailUtility|createAndSendMail()|.|Ready to send

28-Feb-2007|09:39:52.453|ERROR|NOS|Email Notify Thread:Pooled Thread:1|EmailUtility|createAndSendMail()|.|

Mail was not sent to ; [email protected]
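
For anyone digging through a large nos.log, the pipe-delimited layout shown in the excerpt above can be split mechanically. The following is a minimal Python sketch, assuming every entry follows the nine-field layout of the two sample lines (date, time, level, module, thread, class, method, a "." separator, message). The names NosEntry and parse_nos_line are made up for this illustration and are not part of DFM.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class NosEntry(NamedTuple):
    timestamp: datetime
    level: str
    thread: str
    source: str   # "Class.method()" as logged
    message: str


def parse_nos_line(line: str) -> Optional[NosEntry]:
    """Parse one nos.log line; return None for lines that do not match the layout."""
    parts = line.rstrip("\n").split("|", 8)
    if len(parts) != 9:
        return None  # e.g. a continuation line such as "Mail was not sent to ..."
    date, clock, level, _module, thread, cls, method, _sep, message = parts
    try:
        # Assumes English month abbreviations, as in the excerpt.
        stamp = datetime.strptime(f"{date} {clock}", "%d-%b-%Y %H:%M:%S.%f")
    except ValueError:
        return None
    return NosEntry(stamp, level, thread, f"{cls}.{method}", message)


# The two lines quoted in the post above.
sample = [
    "28-Feb-2007|09:39:52.453|DEBUG|NOS|Email Notify Thread:Pooled Thread:1|EmailUtility|createAndSendMail()|.|Ready to send",
    "28-Feb-2007|09:39:52.453|ERROR|NOS|Email Notify Thread:Pooled Thread:1|EmailUtility|createAndSendMail()|.|",
]
entries = [e for e in map(parse_nos_line, sample) if e is not None]
errors = [e for e in entries if e.level == "ERROR"]
```

Filtering on the level field this way makes it easier to pull out just the ERROR entries, or to compare timestamps between an incoming alert and the outgoing notification.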

frankzehrer Wed, 02/28/2007 - 05:51

Hi Pierre,

Is the NOSServer process running? You can check this under "Common Services -> Server -> Admin -> Processes".

With DFM 2.0.4 I had an issue with the NOSServer registration.

Navigate to "Common Services -> Server -> Admin -> Processes"

Click on the entry "NOSServer"

The following information should appear for 2.0.8:

Process: NOSServer

Path: /opt/CSCOpx/bin/cwjava

Flags: -server -cp:p MDC\tomcat\webapps\triveni\WEB-INF\lib\log4j.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\nos.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\ctm.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\ogs-client.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\cogs.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\ogs-kc1.0.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\ogs-kilnervirtualasa1.0.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\activation.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\mail.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\tis-client.jar,
MDC\tomcat\webapps\triveni\WEB-INF\lib\pm.jar,
lib\classpath\jconn2.jar,lib\classpath\epm-v0-5-0.jar,
lib\classpath\cesar-v0-5-0b.jar,
MDC\tomcat\shared\lib\NATIVE.jar,
MDC\tomcat\shared\lib\MICE.jar,
MDC\bin,objects\nos\config,MDC\tomcat\webapps\triveni\WEB-INF\classes,
-cw:jre D:\PROGRA~1\CSCOpx\MDC\jre -cw:xrs -Djava.compiler=NONE com.cisco.nm.trx.nos.server.NOSServer

Startup: Started automatically at boot.

Dependencies: EPMDbEngine EPMServer INVDbEngine PMServer DFMOGSServer

If there is a difference, you might try to unregister and re-register NOSServer.

Windows

@ECHO OFF

SET NMSROOT=c:\CSCOpx

CALL %NMSROOT%\bin\pdreg.cmd -u NOSServer

CALL %NMSROOT%\bin\pdreg.cmd -r NOSServer -d "EPMDbEngine,EPMServer,INVDbEngine,PMServer,DFMOGSServer" -e %NMSROOT%\bin\cwjava.exe -f "-server^-cp:p^MDC\tomcat\webapps\triveni\WEB-INF\lib\log4j.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\nos.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\ctm.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\ogs-client.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\cogs.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\ogs-kc1.0.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\ogs-kilnervirtualasa1.0.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\activation.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\mail.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\tis-client.jar,MDC\tomcat\webapps\triveni\WEB-INF\lib\pm.jar,lib\classpath\jconn2.jar,lib\classpath\epm-v0-5-0.jar,lib\classpath\cesar-v0-5-0b.jar,MDC\tomcat\shared\lib\NATIVE.jar,MDC\tomcat\shared\lib\MICE.jar,MDC\bin,objects\nos\config,MDC\tomcat\webapps\triveni\WEB-INF\classes,^-cw:jre D:\PROGRA~1\CSCOpx\MDC\jre^-cw:xrs^-Djava.compiler=NONE com.cisco.nm.trx.nos.server.NOSServer"

Hope that helps a bit. Otherwise, help us and provide a few more details about your setup (notification groups and settings).

Best regards,

Frank

Joe Clarke Wed, 02/28/2007 - 09:05

Stick a sniffer on the LMS server, and filter for TCP port 25 to your DFM SMTP server. That trace may provide a reason why the mail is not being sent automatically.

ptrigueira Mon, 04/09/2007 - 07:20

Hi JC,

Can you help me out here? Which "function module" should I raise to debug level for the problem we are having with notifications being delayed?

partial nos.log attached...

Joe Clarke Mon, 04/09/2007 - 08:22

I don't see any problems with delay on the DFM side. I see an alert come in at 10:08:34 and go out as a trap at 10:08:50. As I said previously, you should start deploying sniffers to watch the trap move from the DFM server to the external NMS.

ptrigueira Mon, 04/09/2007 - 08:32

Hi JC,

You're right, and we are doing that. The problem only occurs when we have instability in the network and a bunch of alarms are sent to CW.

The attached nos.log is not in debug, but you can see the amount of messages in the same second...

Joe Clarke Mon, 04/09/2007 - 08:37

There is a bug in retrieving event details that will be worked out in LMS 3.0, but it shouldn't delay the sending of a notification trap. Without a debug showing the problem, or a sniffer trace showing where in the network the delay is occurring, I cannot offer any other suggestions.

ptrigueira Thu, 04/12/2007 - 08:33

Hi JC,

About the delay: whenever the system is NOT loaded, we see a regular flow of traps being sent from CW to the notification server (i.e. several packets in the same second), BUT under high load the flow of traps towards the notification server drops to one trap every 4 seconds...

Joe Clarke Thu, 04/12/2007 - 08:36

How loaded is loaded? Four seconds could be expected under high load situations. What are the specs on this server?

ptrigueira Thu, 04/12/2007 - 09:15

Hi JC,

Loaded means thousands of traps received in a certain period of time. The server is a DL360 and the processor load doesn't go above 40%. I do believe it is queueing delay: if I receive, let's say, 10 traps in the same second and I'm sending notifications for those traps one every 4 seconds, the delay will build up over time.

I can say that at the moment, when we see load on DFM, we experience more than 40 minutes of delay between an alarm appearing on DFM and being sent to the notification server...

P
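
The queueing argument in the post above can be checked with a little arithmetic. Below is a rough Python sketch, not a description of how DFM works internally: it assumes a single first-in, first-out sender that needs a fixed time per trap. The 10 traps per second and the 4 seconds per send come from the post; the 60-second burst length and the function name last_trap_delay are assumptions chosen for illustration.

```python
def last_trap_delay(n_traps: int, arrival_interval: float, send_interval: float) -> float:
    """Seconds between arrival and completed send for the last trap in a burst.

    Models one FIFO sender that takes send_interval seconds per trap while
    traps arrive every arrival_interval seconds.
    """
    sender_free_at = 0.0
    delay = 0.0
    for i in range(n_traps):
        arrived = i * arrival_interval
        start = max(arrived, sender_free_at)   # wait if the sender is still busy
        sender_free_at = start + send_interval
        delay = sender_free_at - arrived
    return delay


# Hypothetical burst: 60 seconds at 10 traps per second, 4 seconds per send.
burst_delay_minutes = last_trap_delay(600, 0.1, 4.0) / 60

# Light load for comparison: one trap every 10 seconds never queues.
idle_delay_seconds = last_trap_delay(5, 10.0, 4.0)
```

Under these assumptions the last trap of a one-minute burst waits about 39 minutes, the same order as the 40 minutes reported, while under light load each trap only waits for its own 4-second send.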

Joe Clarke Thu, 04/12/2007 - 09:39

I would still need to see a nos.log with debug showing an example of this delay.

ptrigueira Thu, 04/12/2007 - 09:51

Hi JC,

Here you go. Yesterday (12/04/2007) we had delay issues all afternoon because I was running the "unmanageip.pl" script on two class C subnets, which created outbound queueing (the "unresponsive" clear alarms...)

Joe Clarke Thu, 04/12/2007 - 11:48

There are a lot of events here. Please provide an example of one trap that was delayed so I can focus on that. I'm going through the first log, and so far I don't see any delays.

ptrigueira Thu, 04/12/2007 - 12:24

Hi JC,

Please check your internal mailing list for DFM; you will see a post from Tarpley Adams. He's here with us at the customer site, and he has a better example...

Tks

P

Correct Answer
Joe Clarke Thu, 04/12/2007 - 12:39

The attachment didn't come through, but I'm not sure how useful it would have been. It sounds like he sent a screenshot of a sniffer trace which isn't very helpful. We would need to see the binary sniffer trace file. In any event, I think I found an example in the log, and it looks like address resolution might be the culprit.

Basically, DFM takes the IP address that is configured and performs a lookup on the address. So, DFM tries to use the system resolver to resolve 192.168.168.134. This takes about three seconds to complete. The way Windows address resolution works can cause excessive delays when performing these kinds of operations. As a workaround, try adding this address to DNS or your server's hosts file, and see if this problem improves.
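
One way to see how long the system resolver takes for an address is to time the lookup directly on the server. The following is a minimal Python sketch, assuming that a reverse lookup through the operating system resolver is comparable to what DFM does; the function name time_reverse_lookup is made up for this example, and the resolver parameter exists only so the function can be exercised without touching the network.

```python
import socket
import time
from typing import Callable, Optional, Tuple


def time_reverse_lookup(
    ip: str,
    resolver: Callable[[str], tuple] = socket.gethostbyaddr,
) -> Tuple[Optional[str], float]:
    """Return (hostname or None, seconds spent) for a reverse lookup of ip."""
    start = time.perf_counter()
    try:
        hostname = resolver(ip)[0]   # gethostbyaddr returns (name, aliases, addresses)
    except OSError:                  # socket.herror and socket.gaierror are OSError subclasses
        hostname = None
    elapsed = time.perf_counter() - start
    return hostname, elapsed
```

Calling time_reverse_lookup("192.168.168.134") on the LMS server before and after adding the address to DNS or the hosts file should show whether the lookup time drops. A hosts file entry is just the address followed by a name on one line; whatever name you pick for the NMS is up to you.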

ptrigueira Thu, 04/12/2007 - 13:48

Hi JC,

It worked. Now I see several traps being sent to the notification server in the same second. Thanks for your precious assistance.

Cheers,

Pedro
