DFM email alerting questions

js88888888
Level 1

First, is DFM the best module to use in LMS 2.6 for problems?

What I'm seeing in the email notifications is that the body doesn't include enough detail. For example, it will end with "Utilization:...." as if there is a character limit within LMS for email formatting. Is this the case? I would like full detail on any faults/alarms.

thanks

Joe Clarke
Cisco Employee

You mean is it the most problematic? Yeah, probably. We OEM it from EMC so we do not have full control over the source. It makes debugging it more complicated than the other components.

Yes, there is a 250 character limit. This can be adjusted in NMSROOT/objects/nos/config/nos.properties. The property is MAX_EMAIL_DES, and 1024 is the max allowed value.
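For anyone scripting this change, here is a minimal Python sketch of the edit described above. It assumes nos.properties uses plain `key=value` lines, which this thread does not confirm; the function name `set_max_email_des` is hypothetical. Only the property name and the 1024 ceiling come from the reply. It works on the file's text, so back up the real file before writing anything to it.

```python
import re

# Upper bound stated in the reply above; values beyond it are capped.
MAX_ALLOWED = 1024


def set_max_email_des(properties_text: str, value: int) -> str:
    """Return properties_text with MAX_EMAIL_DES set to value (capped at 1024).

    Assumes simple key=value lines; this format is an assumption,
    not something confirmed for nos.properties.
    """
    value = min(value, MAX_ALLOWED)
    new_line = f"MAX_EMAIL_DES={value}"
    pattern = re.compile(r"^[ \t]*MAX_EMAIL_DES[ \t]*=.*$", re.MULTILINE)
    if pattern.search(properties_text):
        # Replace the existing setting in place, leaving other lines untouched.
        return pattern.sub(new_line, properties_text, count=1)
    # Property not present: append it on its own line.
    sep = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return properties_text + sep + new_line + "\n"
```

Usage would be to read the file's contents, pass them through `set_max_email_des(text, 1024)`, and write the result back.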

Actually, I meant: is it the best module for alarm/alert notification? But thanks for the info. I was just wondering if I was missing a "clearinghouse" alerting/notification setup in one of the other modules. That is interesting.

I will adjust the limit too.

Thanks again for your help. As we are utilizing the DFM notifications/alerting, a few more questions on options and configurations are popping up.

1. DFM is alerting us in real time on the respective events configured in the Notification Group. I do have the "cleared" option checked, but we aren't seeing any 'recovery' or 'cleared' alerts. You know, "this interface is down" followed by "this interface is no longer down" when the issue is resolved and all is working. Is this a logging option on the device that's not sending out the alert, or an option within LMS that's not configured? I haven't restarted LMS since setting up the Notification options.

Right now, we are sending SNMP linkup/linkdown traps and syslog messages to our LMS server.

2. Can we get more detail on the alerts? For instance, HighUtilization doesn't include the actual metric (75%, 85%, etc). Can this be included in the notifications as well?

thanks jclarke, you've been a great help in this upgrade.

Please post a screenshot of your notification group config. I'm betting you have a configuration problem.

Alerts are generally useless in terms of notifications. It's really the events that you want. The events are atomic problems on the devices, whereas alerts are a rollup of multiple events. The events should contain the relevant problem details.

OK, here is my config.

First, as I said before, I would uncheck the boxes under Alerts, and just stick with the event notifications. Those will be much more useful. As it stands now, you will be notified for EVERYTHING which can get very noisy.

As for why cleared events are not being sent, I can't say. This config is valid for having cleared events from matching devices sent as notifications. Do you see those cleared events in the Alerts and Activities Display?

OK, we weren't getting a huge amount of notifications but I disabled the alerts and kept the events.

When I look in DFM-Fault History:Events, there are as many Cleared events as Active (roughly), for the selected device group. In this case, Hubs and Switches.

To troubleshoot this, you will need to enable Notification Services debugging under DFM > Configuration > Other Configurations > Logging, then have a new event get cleared. Then the NMSROOT/log/dfmLogs/NOS/nos.log should have information as to why the event was not sent as a notification.

OK, this is done and I'm looking at the log file after several active then cleared events. What should I be looking for in the log?

If I can safely assume the Cleared event showing in DFM can be correlated with the same time stamp in the log, I don't see much that says error or failure.

The Event ID in the log entries doesn't match the DFM event IDs, the numbers aren't even close. The log shows a 4 digit code, DFM has 3 digits. Not sure what to look at at this point.

Here are some lines from the log that may or may not matter:

04-Mar-2008|08:59:51.479|DEBUG|NOS|Thread-6|NOSUtil|sendNotifOnStart|.|0

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectEventId()|.|select = SELECT MAX(id) FROM epm_condition_history

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectEventId()|.|&&& Printing event id 4073

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectAlertId()|.|select = SELECT MAX(id) FROM epm_alarm

04-Mar-2008|08:59:21.463|DEBUG|NOS|Thread-6|EpmDBConnection|selectAlertId()|.|&&& Printing event id 1399

04-Mar-2008|08:59:21.479|DEBUG|NOS|Thread-6|EPMPoller|pollEPM_Wash()|.|DONE pollEPM_Wash

04-Mar-2008|08:59:21.479|DEBUG|NOS|Thread-6|EPMPoller|EPMPoller()|.|Inside while(true)

04-Mar-2008|08:59:21.479|DEBUG|NOS|Thread-6|EPMPoller|EPMPoller()|.|sleeping = 30000

04-Mar-2008|08:59:51.479|DEBUG|NOS|Thread-6|EPMPoller|pollEPM_Wash|.|Inside pollEPM_Wash

04-Mar-2008|08:59:51.479|DEBUG|NOS|Thread-6|EPMPoller|pollEPM_Wash()|.|!! _pollStarted = true

Also, this shows up repeatedly:

EpmDBConnection|createConnection()|.|After checking max use Count : 18

04-Mar-2008|08:57:21.401|DEBUG|NOS|Thread-6|EpmDBConnection|createNewConnection()|.|Going to close connection com.cisco.nm.cmf.dbservice2.DBConnection@4c9a3d

04-Mar-2008|08:57:21.432|DEBUG|NOS|Thread-6|EpmDBConnection|createNewConnection()|.|--->New Connection Allocated com.cisco.nm.cmf.dbservice2.DBConnection@6d888e
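To make an excerpt like this easier to search, here is a rough Python sketch that splits the pipe-delimited lines into fields. The field names (`source`, `method`, and so on) are labels inferred from the lines above, not documented names, and the layout is assumed from this excerpt only.

```python
from typing import NamedTuple, Optional


class NosLogEntry(NamedTuple):
    # Field names are guesses based on the excerpt above.
    date: str
    time: str
    level: str
    module: str
    thread: str
    source: str
    method: str
    message: str


def parse_nos_line(line: str) -> Optional[NosLogEntry]:
    """Parse one nos.log debug line; return None if it does not fit the layout."""
    # Split at most 8 times so any '|' inside the message text is preserved.
    parts = line.strip().split("|", 8)
    if len(parts) != 9:
        return None  # truncated or differently formatted line
    date, time, level, module, thread, source, method, _dot, message = parts
    return NosLogEntry(date, time, level, module, thread, source, method, message)
```

Once parsed, the entries can be filtered by time stamp or by text in `message`, which helps when lining up log lines with a Cleared event seen in DFM.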

Update: I restarted the Daemon Manager and several things started to happen.

1. The metrics are now included in the emails, so we know utilization numbers, etc.

2. The cleared events are being sent now too, HOWEVER, the status in these notifications is not showing as Cleared but still Active.

Any ideas on getting it to show as Cleared? This is for ease of use so we don't have to drill down to the text and compare numbers.

Another update: We seem to be getting Cleared emails from switches and hubs but not routers; those still show up as Active even though the data indicates they should be 'cleared' (now below threshold, now reachable, etc).

Also, where in the heck are the Utilization thresholds hiding? I need to up the numbers since 40% is a bit low for alerting, etc.

In DFM, all I see are Environment, Reachability and Processor and Memory Settings. Nothing with an interface utilization parameter. Only CPU and mem.

I thought you had said your notification group only included switches and hubs. Double-check that the notification group config is the same for both device classes.

As for threshold adjustment, that is done under DFM > Configuration > Polling and Thresholds > Managing Thresholds. For interface utilization, select DFM > System Defined Groups > Interface Groups, and select your desired interface group. Note: if you want to modify thresholds for a custom interface group, it can be found under User Defined Groups.
