Thresholds on the DFM

Sep 24th, 2008

I set up processor and CPU thresholds in DFM. For example, I changed "Processor utilization threshold" to 60%, so LMS should send an alert email if the CPU of the core switch goes above it. But I have not received any alerts for a long time, so I logged on to the core switch and typed "show process cpu history". In the "CPU% per minute (last 60 minutes)" chart, I can see that the threshold was exceeded (maximum CPU% is above 60%, but average CPU% is not). I want to know why I did not receive any alert.

BTW, I tried decreasing the threshold to 1% for test purposes only, and lots of alerts came in.

Joe Clarke Wed, 09/24/2008 - 13:38
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

If decreasing the threshold to 1% works, then DFM must not have seen the 60% CPU spike (either due to the polling interval or a problem on the device).


It would be a good idea to repeat the test with a sniffer running on the DFM server filtering on SNMP traffic to the device. When the CPU spike occurs, wait for one more polling interval to pass (by default this is every four minutes), then check the sniffer trace to see what the device is advertising in terms of CPU utilization.

HWangLoyalty_2 Thu, 09/25/2008 - 11:56

Thanks for your reply.

As you said, the issue may be caused by the polling interval. The default setting is 4 minutes. Suppose a 60% CPU spike occurs 2 minutes into a polling interval. Would I receive the alert right away, or would I have to wait another 2 minutes?

BTW, if the spike lasted only 15 seconds, would I not receive any alert at all?

Thanks for your help!

Joe Clarke Thu, 09/25/2008 - 12:57

Since DFM is poll-driven, it must be able to see the event via polling. If a four-minute polling interval is too long, you can lower it. But be aware that setting it too low could have a CPU impact.


With CPU utilization, if the spike only lasted for 15 seconds, then DFM might very well not catch it. Perhaps you should consider using something like the Embedded Resource Manager, or a general CPU threshold, and trigger syslog messages which RME can then process.
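To illustrate why a poll-driven monitor can miss a short spike, here is a rough Python simulation. It is only a sketch, not how DFM is actually implemented: the function names, the load profile, and the idea that each poll reads a value averaged over a trailing window are all assumptions made for the example.

```python
def polled_samples(load, poll_interval, window):
    """Return (time, value) pairs that a poller would see.

    load: list of per-second CPU percentages (hypothetical data).
    poll_interval: seconds between polls.
    window: seconds the device averages over before reporting
            (an assumption; 1 means an instantaneous reading).
    """
    samples = []
    for t in range(poll_interval, len(load) + 1, poll_interval):
        recent = load[max(0, t - window):t]
        samples.append((t, sum(recent) / len(recent)))
    return samples


def alerts(samples, threshold):
    """Return the poll times whose reported value exceeds the threshold."""
    return [t for t, value in samples if value > threshold]


# One hour of load: 20% baseline with a single 15-second spike to 95%.
load = [20.0] * 3600
spike_start = 1000
for s in range(spike_start, spike_start + 15):
    load[s] = 95.0

threshold = 60.0

# Four-minute polling of a one-minute average: the spike is never seen.
four_minute = polled_samples(load, poll_interval=240, window=60)
print(max(load) > threshold)           # the raw peak does exceed 60%
print(alerts(four_minute, threshold))  # poll times that would alert

# Ten-second polling of an instantaneous value catches it, at the cost
# of 24 times as many polls.
ten_second = polled_samples(load, poll_interval=10, window=1)
print(alerts(ten_second, threshold))
```

In this toy model the four-minute poller raises no alert even though the peak is well above the threshold, which matches the "maximum above 60%, average below" pattern in the history chart. A shorter interval helps, but as noted above it increases the polling load.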
