LMS Incorrectly Reporting Devices Down

Nov 15th, 2007

Hi all,

We've got something weird going on here at the moment. We use LMS 2.6 for network management/monitoring. In the last couple of days it has started reporting devices as unreachable when they're not. The status usually changes back to reachable within a couple of minutes, but may go unreachable again at any time during the day.

Checking 'show version' shows that the uptime of these devices is weeks, not minutes. Our other monitoring program, Statseeker, is showing the same symptoms. Statseeker uses ICMP ping to determine a device's reachability status, and contains a built-in analyzer to record traffic that hits the Statseeker server. It is NOT receiving ICMP redirects, so I've ruled that out as a source of the problem.

The bandwidth of the links to the devices being reported as down is NOT being heavily taxed, certainly no more than usual - and there has been no major change to our network that could account for this behaviour.

If anyone has any ideas I'd be really grateful.

Cheers,

Ben.

Joe Clarke Thu, 11/15/2007 - 16:36

A device does not have to go down for DFM to report it as down. DFM, like your Statseeker, also uses ICMP as part of its monitoring. If an interface is flapping or resetting, that could account for dropped ICMP packets. Even if no change was major, something could have happened on the network that is throttling or dropping ICMP.

The best way to analyze this would be to first look at the device's management interface to make sure it is not transitioning state. If it isn't, then you should use a sniffer to backtrack through the network, looking for where the ICMP requests or replies are being dropped.
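
Before reaching for a sniffer, it can help to know exactly when the ICMP loss happens, so the timestamps can be lined up against interface state changes in the device logs. Below is a rough sketch of that idea in Python, not anything that ships with LMS, DFM, or Statseeker. The names `find_loss_windows` and `probe` are made up for this example, the address is a placeholder, and the `ping` flags assume a Linux-style ping (`-c` for count, `-W` for timeout in seconds), so adjust them for your platform.

```python
import subprocess
import time
from typing import List, Optional, Tuple

Sample = Tuple[float, bool]  # (unix timestamp, True if an echo reply came back)


def find_loss_windows(samples: List[Sample]) -> List[Tuple[float, float, int]]:
    """Group consecutive failed probes into (first_fail, last_fail, count) windows."""
    windows: List[Tuple[float, float, int]] = []
    start: Optional[float] = None
    last: Optional[float] = None
    count = 0
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # first lost probe of a new window
            last = ts
            count += 1
        elif start is not None:
            # A reply arrived, so the current loss window is over.
            windows.append((start, last, count))
            start, last, count = None, None, 0
    if start is not None:
        # Sampling ended while probes were still being lost.
        windows.append((start, last, count))
    return windows


def probe(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo using the system ping (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    host = "192.0.2.10"  # placeholder: use the device's management address
    samples: List[Sample] = []
    for _ in range(60):  # one probe per second for roughly a minute
        samples.append((time.time(), probe(host)))
        time.sleep(1)
    for start, end, lost in find_loss_windows(samples):
        print(f"lost {lost} probe(s) between {time.ctime(start)} and {time.ctime(end)}")
```

Running this from the monitoring server and from a host closer to the device at the same time should give a first hint about which segment is dropping the packets, which narrows down where to place the sniffer.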
