Re: Question relating to LMS 3.1 Polling

rnieuwhof · ‎05-27-2009

I am noticing that my LMS 3.1 SNMP polling and notifications seem to be delayed. It appears that I do not receive an alert until AFTER a device has gone off-line and then come back on-line. I believe that this is counter-intuitive but cannot find any other explanation. Do I have my LMS mis-configured?

Joe Clarke · ‎05-27-2009

Exactly what in LMS are you using for these alerts? How have you configured these alerts?

rnieuwhof · ‎05-27-2009

We are using DFM, Notification Services and E-mail notification for event set A.

The following are the events selected in event set A:

Event Code  Description               Severity
1024        Device Active             Critical
1022        Device Unreachable        Critical
1019        OutOfRange                Critical
1018        OperationallyDown         Critical
1015        InsufficientFreeMemory    Critical
1013        HighUtilization           Critical
1012        HighQueueDropRate         Critical
1011        HighErrorRate             Critical
1010        HighDiscardRate           Critical
1009        HighCollisionRate         Critical
1008        HighBufferUtilization     Critical
1007        HighBufferMissRate        Critical
1006        HighBroadcastRate         Critical
1005        HighBackplaneUtilization  Critical
1004        Flapping                  Critical
1003        ExcessiveFragmentation    Critical

Joe Clarke · ‎05-27-2009

The default DFM polling interval is four minutes. If you're getting both an Active and a Cleared notification for Unreachable, then DFM is detecting both events. You shouldn't need to adjust the polling. Instead, look at the SMTP server. Perhaps it is delaying the delivery of the notification emails. DFM does not delay generation of events. As soon as it sees the event, you should see an alert update in the DFM Alerts and Activities Display. The event is then handed to Notification Services. The total processing should be a few seconds at most. From there, it's up to the SMTP relay network.

rnieuwhof · ‎05-27-2009

Would that were only the case.....

The SMTP server is handling the e-mails in a matter of seconds. What I am describing is COMPLETELY different. We tried the defaults. Don't get me wrong, I actually do LOVE SPAM. Just not CW spam. False positives caused us to set the polling and thresholds to an interval of 360 seconds and a timeout of 30000 msec with a retry value of 10, on both our routers and switches.

We then went to a local switch, unplugged its uplink and started the stopwatch. After 7 minutes, we plugged it back in, and received an e-mail from CW within 2 minutes that the device was unreachable. We never received a notification that the device was active.
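For reference, here is a rough back-of-the-envelope sketch of how long a poller with these settings could take to declare a device unreachable. It assumes that each attempt waits the full, fixed timeout and that the 10 retries follow one initial attempt. Those are assumptions about generic SNMP polling, not confirmed DFM behavior, and the function name is made up for illustration.

```python
def detection_window(interval_s, timeout_ms, retries):
    """Estimate (best_case_s, worst_case_s) before a dead device is flagged.

    Assumption (not confirmed for DFM): one initial attempt plus `retries`
    retries, each waiting the full fixed timeout with no backoff.
    """
    timeout_s = timeout_ms / 1000.0
    attempts = retries + 1
    exhaust_s = attempts * timeout_s  # time to give up once a poll starts
    # Best case: the device fails just as a poll begins.
    # Worst case: the device fails just after a successful poll, so a full
    # polling interval passes before the next poll even starts.
    return exhaust_s, interval_s + exhaust_s


best, worst = detection_window(interval_s=360, timeout_ms=30000, retries=10)
print(f"best case: {best / 60:.1f} min, worst case: {worst / 60:.1f} min")
```

Under those assumptions the window is roughly 5.5 to 11.5 minutes, so a 7-minute outage could plausibly end before, or right around the time, the unreachable event is raised.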

Are my expectations not set correctly?

Joe Clarke · ‎05-27-2009

At what point did the alert update in the Alerts and Activities Display? If that is being delayed, then real-time analysis will need to be performed by TAC with a sniffer trace, and with debugging enabled on the DFM servers.
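One way to answer that is to note a timestamp at each stage of the unplug test and compare the gaps. A minimal sketch follows; the stage labels and times are placeholders for illustration, not values from this thread.

```python
from datetime import datetime


def stage_gaps(stamps, fmt="%H:%M:%S"):
    """Return (transition, seconds) for each consecutive pair of stages."""
    times = [(label, datetime.strptime(t, fmt)) for label, t in stamps]
    return [
        (f"{a} -> {b}", (tb - ta).total_seconds())
        for (a, ta), (b, tb) in zip(times, times[1:])
    ]


# Placeholder timestamps; substitute the times you record during the test.
observed = [
    ("uplink unplugged", "10:00:00"),
    ("alert shown in display", "10:06:30"),
    ("e-mail received", "10:06:35"),
]

for transition, seconds in stage_gaps(observed):
    print(f"{transition}: {seconds:.0f} s")
```

A large first gap with a small second gap would point at detection in DFM rather than at Notification Services or the SMTP relay.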