05-27-2009 05:59 AM
I am noticing that my LMS 3.1 SNMP polling and notifications seem to be delayed. It appears that I do not receive an alert until AFTER a device has gone off-line and then come back on-line. I believe that this is counter-intuitive but cannot find any other explanation. Do I have my LMS misconfigured?
05-27-2009 08:15 AM
Exactly what in LMS are you using for these alerts? How have you configured these alerts?
05-27-2009 10:30 AM
We are using DFM, Notification Services and E-mail notification for event set A.
The following are the events selected in event set A:
Event Code  Description               Severity
1024        Device Active             Critical
1022        Device Unreachable        Critical
1019        OutOfRange                Critical
1018        OperationallyDown         Critical
1015        InsufficientFreeMemory    Critical
1013        HighUtilization           Critical
1012        HighQueueDropRate         Critical
1011        HighErrorRate             Critical
1010        HighDiscardRate           Critical
1009        HighCollisionRate         Critical
1008        HighBufferUtilization     Critical
1007        HighBufferMissRate        Critical
1006        HighBroadcastRate         Critical
1005        HighBackplaneUtilization  Critical
1004        Flapping                  Critical
1003        ExcessiveFragmentation    Critical
05-27-2009 10:35 AM
The default DFM polling interval is four minutes. If you're getting both an Active and a Cleared notification for Unreachable, then DFM is detecting both events. You shouldn't need to adjust the polling. Instead, look at the SMTP server. Perhaps it is delaying the delivery of the notification emails. DFM does not delay generation of events. As soon as it sees the event, you should see an alert update in the DFM Alerts and Activities Display. The event is then handed to Notification Services. The total processing should be a few seconds at most. From there, it's up to the SMTP relay network.
05-27-2009 10:42 AM
Would that were only the case.....
The SMTP server is handling the e-mails in a matter of seconds. What I am describing is COMPLETELY different. We tried the defaults. Don't get me wrong, I actually do LOVE SPAM. Just not CW Spam. False positives caused us to set the polling and thresholds to an interval of 360 seconds and a timeout of 30000 msec with a retry value of 10, on both our routers and switches.
We then went to a local switch, unplugged its uplink and started the stopwatch. After 7 minutes, we plugged it back in, and received an e-mail from CW within 2 minutes that the device was unreachable. We never received a notification that the device was active.
Are my expectations not set correctly?
05-27-2009 11:32 AM
At what point did the alert update in the Alerts and Activities Display? If that is being delayed, then real-time analysis will need to be performed by TAC with a sniffer trace, and with debugging enabled on the DFM servers.
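As a rough sanity check on the settings quoted earlier in the thread (360-second interval, 30000 msec timeout, retry value of 10), the arithmetic below estimates how long a poller could take to declare a device unreachable. This is only a sketch under stated assumptions, not a description of how DFM schedules its polls: it assumes every attempt waits the full timeout, that the timeout is fixed rather than growing per retry, and that the retries come in addition to the first attempt. The function name `detection_window` and its parameters are made up for illustration.

```python
def detection_window(interval_s, timeout_ms, retries, retries_include_first=False):
    """Estimate (best, worst) seconds from link loss to an 'unreachable' verdict.

    Assumptions (not confirmed DFM behavior): fixed timeout per attempt,
    and the verdict is reached only after every attempt has timed out.
    """
    # Number of probes sent before giving up on the device.
    attempts = retries if retries_include_first else retries + 1
    # Time spent waiting on probes that never get answered.
    probe_s = attempts * timeout_ms / 1000.0
    # Best case: the link drops just as a poll cycle starts.
    best = probe_s
    # Worst case: the link drops just after a cycle, so a full interval passes first.
    worst = interval_s + probe_s
    return best, worst


best, worst = detection_window(interval_s=360, timeout_ms=30000, retries=10)
print(f"best case:  {best / 60:.1f} min")   # best case:  5.5 min
print(f"worst case: {worst / 60:.1f} min")  # worst case: 11.5 min
```

Under those assumptions the Unreachable event would be raised somewhere between about 5.5 and 11.5 minutes after the uplink is pulled, so a 7-minute outage could end before the notification is sent. Whether DFM actually behaves this way is something the timestamps in the Alerts and Activities Display would confirm or rule out.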