I monitor the interfaces on the switches. When an interface goes down, LMS correctly reports the fault and it appears on the Monitor > Monitoring Tools > Fault Monitor screen. However, it then clears these faults seemingly at random (usually days later) even though the fault is still there. If I go to the Fault Device Details screen, it shows Admin Status UP and Operational Status DOWN.
By changing the Managed State on that interface to false, applying the changes, changing it back to true and re-applying, the fault is picked back up again.
The exact same problem happens with failed power supplies (in this case the Fault Device Details screen shows the status as CRITICAL even after LMS has 'cleared' the fault from the Fault Monitor screen).
The switches in question are all 3750Xs.
Has anyone ever come across this problem and, if so, is there a fix?
Ideally, once an interface fault is displayed as an alarm, it stays there until it clears. If the interface recovers, the alarm shows as cleared, and it recurs once the issue is back.
Unmanaging and then re-managing an interface may bring the alert back. Alerts are polled every 30 seconds, and the data is refreshed if a change has occurred. The information is always updated every 6 minutes, regardless of whether changes are detected.
However, an alert that has a single occurrence may move to old alerts, where it is still viewable via the history report.
All alerts in Fault Manager are kept for 31 days and then purged. I am not sure I fully understand the problem you are facing.
Is it that the alert never comes back even though the fault is still present on the device?
-Thanks Vinod **Rating encourages contributors, and it's really free.**
No one else is clearing the faults, as I am the only person who uses LMS.
The faults still exist on the switches; it's just that LMS clears the alarm from the 'Monitor > Monitoring Tools > Fault Monitor' screen.
Switches exhibiting the problem are WS-C3750X-24P-S and WS-C3750X-48PF-S.
I have attached a screenshot which shows that LMS recognises the port as DOWN even though it has cleared from the faults screen. If I change the port's Managed State to False, apply the changes, then change it back to True and re-apply, the fault re-appears on the faults screen and LMS sends an email alert.
I ran a Device Fault History report, as you suggested, for a specific switch covering the period since 24th Feb. I see the Active alerts from when I forced it on 24th Feb, and there is no Cleared entry. The next entry is an Active one from when I re-forced the alert this morning, as it had disappeared from the alerts screen.
IOS on this specific switch is c3750e-universalk9-mz.150-1.SE3.bin
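In case it helps anyone cross-check LMS against the switch itself, here is a minimal sketch of the comparison I am describing: a port that is Admin UP and Operational DOWN should still have an active alarm. The two OIDs and the status values are the standard IF-MIB ones; the helper names, port names and sample data are hypothetical, and the actual SNMP polling and the export of active alarms from LMS are assumed to happen elsewhere.

```python
# Standard IF-MIB column OIDs; append the ifIndex to query a single port.
IF_ADMIN_STATUS_OID = "1.3.6.1.2.1.2.2.1.7"  # ifAdminStatus
IF_OPER_STATUS_OID = "1.3.6.1.2.1.2.2.1.8"   # ifOperStatus

# Integer values defined by IF-MIB (ifAdminStatus only uses 1 to 3).
STATUS_NAMES = {
    1: "up", 2: "down", 3: "testing", 4: "unknown",
    5: "dormant", 6: "notPresent", 7: "lowerLayerDown",
}


def should_be_alarmed(admin_status: int, oper_status: int) -> bool:
    """Hypothetical helper: True for a port that is Admin UP but Operational DOWN."""
    return admin_status == 1 and oper_status == 2


def find_missing_alarms(polled: dict, active_alarms: set) -> list:
    """Return ports that are Admin UP / Oper DOWN but have no active alarm.

    polled        -- {port name: (ifAdminStatus, ifOperStatus)} from an SNMP poll
    active_alarms -- port names currently listed on the Fault Monitor screen
    """
    return sorted(
        port
        for port, (admin, oper) in polled.items()
        if should_be_alarmed(admin, oper) and port not in active_alarms
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    polled = {
        "Gi1/0/1": (1, 1),  # up/up
        "Gi1/0/2": (1, 2),  # up/down, alarm still shown
        "Gi1/0/3": (1, 2),  # up/down, alarm has disappeared
        "Gi1/0/4": (2, 2),  # administratively down, no alarm expected
    }
    active_alarms = {"Gi1/0/2"}
    for port in find_missing_alarms(polled, active_alarms):
        admin, oper = polled[port]
        print(port, STATUS_NAMES[admin], STATUS_NAMES[oper])
```

Any port this lists is in the state I am seeing: still down on the switch, but no longer on the Fault Monitor screen.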