02-25-2012 07:19 PM - edited 03-07-2019 05:11 AM
We have a gremlin in the system where users often can't log into the Windows domain around 4 AM. The problem clears up around 5 AM. I'd like to set up some sort of monitor on a couple of routers to validate connectivity to our NetApp and our DNS servers overnight. How can I configure an IP SLA or something similar to test connectivity every 10 seconds or so, then issue a syslog message if the test fails? Then during the day we can analyse the results.
Hopefully the results of this monitoring would give us a clue as to what is going on.
02-25-2012 11:47 PM
Hi, you could write an EEM applet that emails you the results of the tests it performs.
See the link below for an example:
https://learningnetwork.cisco.com/blogs/network-sheriff/2009/06/19/writing-your-first-eem-applet
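For the original question (a probe every 10 seconds with a syslog message on failure), a minimal sketch might look like the following. It assumes a fairly recent IOS release; older 12.x images use `ip sla monitor` and `track ... rtr` syntax instead. The target address 192.0.2.10, operation number 10, track number 10, and the applet names are placeholders, not values from this thread.

```
! Placeholder target: replace 192.0.2.10 with the NetApp or DNS server address
ip sla 10
 icmp-echo 192.0.2.10
 frequency 10
ip sla schedule 10 life forever start-time now
!
! Track object that follows the probe's reachability
track 10 ip sla 10 reachability
!
! EEM applets that write a syslog message on each state change
event manager applet SLA10-DOWN
 event track 10 state down
 action 1.0 syslog msg "IP SLA 10: 192.0.2.10 unreachable"
event manager applet SLA10-UP
 event track 10 state up
 action 1.0 syslog msg "IP SLA 10: 192.0.2.10 reachable again"
!
! Timestamp log entries so the 4 AM window can be lined up later
service timestamps log datetime msec localtime
```

Repeat the block with a different operation and track number for each server you want to watch. `show ip sla statistics 10` and `show logging` should then show what happened overnight.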
02-27-2012 02:15 AM
Hi
First of all, I would enable logging on all systems and check what that tells me.
If your users have problems logging on to the Windows machines, what do their logs tell you? And what do the Windows servers' logs tell you?
Then I would check link saturation; it might be a backup that runs at that time and saturates a link.
That's what I would start with.
Good luck
HTH
02-27-2012 11:12 AM
Tod -
For this particular instance, I'd really look outside of infrastructure. While the error may indeed be caused by a lack of infrastructure, it appears that you have machines/applications/appliances running some tasks overnight. I'd look for a comprehensive network monitoring program (SolarWinds is my fave, but it's super expensive). You need to monitor [not only] your core/distribution layer, but your to/from traffic, and see what is causing this and why. Use SNMP and NetFlow, and you'll gain invaluable insight into what's going on overnight.
From personal experience, I have run into the same problems where SAN-SAN replication runs at the same time as snapshots/backups and it [literally] brings the entire network to a crawl. Simply identifying what is happening across the entire network can help you make subtle scheduling changes that could eliminate these issues.
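As a rough sketch of what the SNMP and NetFlow side might look like on an IOS router with traditional NetFlow: the collector address 192.0.2.50, UDP port 2055, the interface names, and the community string below are all placeholders, and some switch platforms don't support NetFlow at all.

```
! Read-only SNMP access for the monitoring server (placeholder community string)
snmp-server community EXAMPLE-RO RO
!
! Collect flows on the interface facing the servers (placeholder interface)
interface GigabitEthernet0/0
 ip flow ingress
 ip flow egress
!
! Export flow records to the collector (placeholder address and port)
ip flow-export source Loopback0
ip flow-export version 5
ip flow-export destination 192.0.2.50 2055
```

`show ip cache flow` on the router gives a quick look at the top talkers without waiting for the collector.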
Thanks,
Sean Brown
03-04-2012 10:39 AM
Thanks for the input. I now have What's Up Gold set to show an alert every time any of our 8 switches has an interface that averages >50% utilization for 10 minutes. Now we wait and see. In the first 24 hrs the only alarm was a WAN interface hitting 70% at 3 PM... definitely not my gremlin.
03-04-2012 05:42 PM
The first place to check would really be the logs of your domain controllers...
They must be doing something around that time, or at least they must be noticing that they have some issues themselves. As mentioned by others, it might be something within the storage environment that (depending on your environment) may not even be dependent on your network.
Sounds like one of those "the 'network' is not working" issues that Windows admins love to toss our way...
;-)