DFM email alerting questions

Answered Question
Feb 29th, 2008

First, is DFM the best module to use in LMS 2.6 for problems?

What I'm seeing in the email notifications is that the body doesn't include enough detail. For example, it will end with "Utilization:...." as if there is a character limit within LMS for email formatting. Is this the case? I would like full detail on any faults/alarms.

thanks

Joe Clarke Fri, 02/29/2008 - 11:47

You mean is it the most problematic? Yeah, probably. We OEM it from EMC so we do not have full control over the source. It makes debugging it more complicated than the other components.

Yes, there is a 250 character limit. This can be adjusted in NMSROOT/objects/nos/config/nos.properties. The property is MAX_EMAIL_DES, and 1024 is the max allowed value.
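As a rough illustration of that change, here is a minimal Python sketch that rewrites the MAX_EMAIL_DES line in the text of nos.properties. It assumes the file uses plain `key=value` lines, which the thread does not confirm; the helper name `set_max_email_des` and the lower bound of 1 are hypothetical. Only the property name and the 1024 ceiling come from the reply above. Back up the file before changing it.

```python
import re

MAX_ALLOWED = 1024  # upper bound stated in the reply above


def set_max_email_des(properties_text: str, value: int) -> str:
    """Return properties_text with MAX_EMAIL_DES set to value.

    Assumes plain key=value lines (an assumption, not confirmed by the thread).
    """
    # The lower bound of 1 is an assumption; only the 1024 maximum is documented here.
    if not 1 <= value <= MAX_ALLOWED:
        raise ValueError(f"MAX_EMAIL_DES must be between 1 and {MAX_ALLOWED}")

    new_line = f"MAX_EMAIL_DES={value}"
    # Match the whole existing property line, allowing spaces around the name and '='.
    pattern = re.compile(r"^[ \t]*MAX_EMAIL_DES[ \t]*=.*$", re.MULTILINE)

    if pattern.search(properties_text):
        return pattern.sub(new_line, properties_text, count=1)

    # Property not present: append it on its own line.
    sep = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return properties_text + sep + new_line + "\n"
```

To use it, read NMSROOT/objects/nos/config/nos.properties into a string, pass it through the function, and write the result back.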

js88888888 Fri, 02/29/2008 - 12:21

Actually, I meant: is it the best module for alarm/alert notification? But thanks for the info. I was just wondering if I was missing a "clearinghouse" alerting/notification setup in one of the other modules. That is interesting.

I will adjust the limit too.

js88888888 Mon, 03/03/2008 - 08:46

Thanks again for your help. As we are utilizing the DFM notifications/alerting, a few more questions on options and configurations are popping up.

1. DFM is alerting us in real time on the respective events configured in the Notification Group. I do have the "cleared" option checked, but we aren't seeing any 'recovery' or 'cleared' alerts. You know, "this interface is down" followed by "this interface is no longer down" when the issue is resolved and all is working. Is this a logging option at the device that's not sending out the alert, or an option within LMS that's not configured? I haven't restarted LMS since setting up Notification options.

Right now, we are sending SNMP linkup/linkdown traps and syslog messages to our LMS server.

2. Can we get more detail on the alerts? For instance, HighUtilization doesn't include the actual metric (75%, 85%, etc). Can this be included in the notifications as well?

thanks jclarke, you've been a great help in this upgrade.

Joe Clarke Mon, 03/03/2008 - 10:34

Please post a screenshot of your notification group config. I'm betting you have a configuration problem.

Alerts are generally useless in terms of notifications. It's really the events that you want. The events are atomic problems on the devices, whereas alerts are a rollup of multiple events. The events should contain the relevant problem details.

Joe Clarke Mon, 03/03/2008 - 13:47

First, as I said before, I would uncheck the boxes under Alerts, and just stick with the event notifications. Those will be much more useful. As it stands now, you will be notified for EVERYTHING, which can get very noisy.

As for why cleared events are not being sent, I can't say. This config is valid for having cleared events from matching devices sent as notifications. Do you see those cleared events in the Alerts and Activities Display?

js88888888 Mon, 03/03/2008 - 14:26

OK, we weren't getting a huge amount of notifications but I disabled the alerts and kept the events.

When I look in DFM-Fault History:Events, there are as many Cleared events as Active (roughly), for the selected device group. In this case, Hubs and Switches.

Joe Clarke Mon, 03/03/2008 - 15:00

To troubleshoot this, you will need to enable Notification Services debugging under DFM > Configuration > Other Configurations > Logging, then have a new event get cleared. Then the NMSROOT/log/dfmLogs/NOS/nos.log should have information as to why the event was not sent as a notification.

js88888888 Tue, 03/04/2008 - 08:23

OK, this is done and I'm looking at the log file after several active then cleared events. What should I be looking for in the log?

If I can safely assume the Cleared event showing in DFM can be correlated with the same time stamp in the log, I don't see much that says error or failure.

The Event ID in the log entries doesn't match the DFM event IDs, the numbers aren't even close. The log shows a 4 digit code, DFM has 3 digits. Not sure what to look at at this point.

Here are some lines from the log that may or may not matter:

04-Mar-2008|08:59:51.479|DEBUG|NOS|Thread-6|NOSUtil|sendNotifOnStart|.|0

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectEventId()|.|select = SELECT MAX(id) FROM epm_condition_history

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectEventId()|.|&&& Printing event id 4073

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectAlertId()|.|select = SELECT MAX(id) FROM epm_alarm

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectAlertId()|.|&&& Printing event id 1399

04-Mar-2008|08:59:21.479|DEBUG|NOS|Thread-6|EPMPoller|pollEPM_Wash()|.|DONE pollEPM_Wash

04-Mar-2008|08:59:21.479|DEBUG|NOS|Thread-6|EPMPoller|EPMPoller()|.|Inside while(true)

04-Mar-2008|08:59:21.479|DEBUG|NOS|Thread-6|EPMPoller|EPMPoller()|.|sleeping = 30000

04-Mar-2008|08:59:51.479|DEBUG|NOS|Thread-6|EPMPoller|pollEPM_Wash|.|Inside pollEPM_Wash

04-Mar-2008|08:59:51.479|DEBUG|NOS|Thread-6|EPMPoller|pollEPM_Wash()|.|!! _pollStarted = true

Also, this shows up repeatedly:

EpmDBConnection|createConnection()|.|After checking max use Count : 18

04-Mar-2008|08:57:21.401|DEBUG|NOS|Thread-6|EpmDBConnection|createNewConnection()|.|Going to close connection com.cisco.nm.cmf.dbservice2.DBConnection@4c9a3d

04-Mar-2008|08:57:21.432|DEBUG|NOS|Thread-6|EpmDBConnection|createNewConnection()|.|--->New Connection Allocated com.cisco.nm.cmf.dbservice2.DBConnection@6d888e
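For sifting through a large nos.log, a small Python sketch like the following may help. It splits each pipe-delimited line into fields and filters on the message text. The field names (`date`, `time`, `level`, `module`, `thread`, `cls`, `method`, `message`) are guesses based only on the excerpt above, not on any documented log format, and the helpers `parse_nos_line` and `find_entries` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class NosLogEntry(NamedTuple):
    # Field names are inferred from the sample lines, not from documentation.
    date: str
    time: str
    level: str
    module: str
    thread: str
    cls: str
    method: str
    message: str


def parse_nos_line(line: str) -> Optional[NosLogEntry]:
    """Parse one pipe-delimited log line; return None if it has too few fields."""
    # Split into at most 9 fields so any '|' inside the message is preserved.
    parts = line.strip().split("|", 8)
    if len(parts) != 9:
        return None  # blank, truncated, or otherwise unrecognized line
    date, time, level, module, thread, cls, method, _sep, message = parts
    return NosLogEntry(date, time, level, module, thread, cls, method, message)


def find_entries(lines: Iterable[str], keyword: str) -> List[NosLogEntry]:
    """Return parsed entries whose message contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        entry
        for entry in map(parse_nos_line, lines)
        if entry is not None and needle in entry.message.lower()
    ]
```

Calling `find_entries(open(path), "event id")` on the log would, under these assumptions, return only the lines that print an event or alert ID, which makes it easier to line them up against timestamps in DFM.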

js88888888 Tue, 03/04/2008 - 09:49

Update, I restarted the Daemon Manager and several things started to happen.

1. The metrics are now included in the emails, so we know utilization numbers, etc.

2. The cleared events are being sent now too. HOWEVER, the status in these notifications is not showing as Cleared but still Active.

Any ideas on getting it to show as Cleared? This is for ease of use so we don't have to drill down to the text and compare numbers.

js88888888 Tue, 03/04/2008 - 11:18

Another update: We seem to be getting Cleared emails from switches and hubs but not routers. Those still show up as Active even though the data indicates they should be Cleared (now below threshold, now reachable, etc).

Also, where in the heck are the Utilization thresholds hiding? I need to up the numbers since 40% is a bit low for alerting, etc.

In DFM, all I see are Environment, Reachability and Processor and Memory Settings. Nothing with an interface utilization parameter. Only CPU and mem.

Correct Answer
Joe Clarke Tue, 03/04/2008 - 11:29

I thought you had said your notification group only included switches and hubs. Double-check that the notification group config is the same for both device classes.

As for threshold adjustment, that is done under DFM > Configuration > Polling and Thresholds > Managing Thresholds. For interface utilization, select DFM > System Defined Groups > Interface Groups, and select your desired interface group. Note: if you want to modify thresholds for a custom interface group, it can be found under User Defined Groups.
