I've just been handed an early-stage project to implement Tidal at our company. We're a largely Unix shop, and utilize a lot of open source projects. I can't seem to find much information in the TES documentation that provides a starting place for monitoring (we'll be using Nagios/Icinga). What are others out there doing to monitor their Tidal infrastructures?
I've used Nimbus and SolarWinds to monitor performance, logs and errors for the Tidal infrastructure itself. You can have Tidal send SNMP traps to your monitoring software for job and connection failures. I believe Tidal can also integrate with HP's Operations Center software.
So at this point, I'm monitoring the status of each agent's connection to the master via the REST API, and checking some TCP ports on the CM and master. Does anyone happen to have specifics of what they monitor for performance, logs, etc., and how they do it?
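For anyone wanting to do the same TCP port checks from Nagios/Icinga, here is a minimal sketch of a plugin in Python. It only attempts a TCP connect and maps the result to the standard Nagios exit codes (0 = OK, 2 = CRITICAL). The script name, host and port are whatever you pass in; nothing here is Tidal-specific, and the REST API part is deliberately left out because the endpoints depend on your TES version.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios/Icinga-style TCP port check (not Tidal-specific)."""
import socket
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Try one TCP connect and return (status, message)."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepting connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"


# Run as a plugin only when HOST and PORT are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    status, message = check_tcp(sys.argv[1], int(sys.argv[2]))
    print(message)
    sys.exit(status)
```

You would call it as something like `check_tcp.py HOST PORT` from a command definition, with the host and port of your own CM or master. Nagios and Icinga already ship a `check_tcp` plugin that does this, so a script like the above is mainly useful as a starting point if you want to add extra logic around the connect.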
TES can also integrate with the other former Tidal product, now called Cisco Process Orchestrator. If you go out and watch the blogs called "One interface to rule them all", I created a monitoring process in CPO that monitors whether agents are up or down, restarts them, and does other things via the TES 6.X web services.
CIAC Adoption Pilot Engineering Lead
We have an enterprise monitoring product which does basic monitoring and automatic maintenance, e.g. process checks/restarts, server load and so on. It's also extensible, so we can add custom scripts for finer granularity, e.g. job queue limits. I'm not endorsing the product we use per se, but any similar tool in this space ought to be able to do the same job. We're talking HP OpenView, Tivoli, BMC, Uptime, to name but a few.
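As a rough illustration of what such a custom script can look like, here is a small Python sketch that turns a numeric reading (say, a count of queued jobs) into a Nagios-style status line with performance data. The function name, the `queued_jobs` label and the thresholds are made up for the example; how you obtain the number from Tidal is up to you and is not shown.

```python
# Standard Nagios plugin exit codes used by this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_threshold(label, value, warn, crit):
    """Map a numeric reading to (exit_code, status line with perfdata)."""
    if value >= crit:
        status, word = CRITICAL, "CRITICAL"
    elif value >= warn:
        status, word = WARNING, "WARNING"
    else:
        status, word = OK, "OK"
    # Text after the pipe is perfdata in the form label=value;warn;crit.
    return status, f"{label} {word} - {value} | {label}={value};{warn};{crit}"


# Hypothetical reading: 42 queued jobs, warn at 50, critical at 100.
code, line = evaluate_threshold("queued_jobs", 42, 50, 100)
print(line)  # queued_jobs OK - 42 | queued_jobs=42;50;100
```

A wrapper script would print the line and exit with the returned code, which is all most monitoring tools in this space need to raise an alert and graph the value.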