We want to set up alerts/monitoring to make sure that the Workplace portal is up and running (not just the server, but also the application and its components - e.g. RequestCenter, ServiceLink and the associated WebServices). Is there a best practice that you guys recommend, or do you have a built-in process that we can leverage?
Appreciate any help - Let me know if you require any more information around the request.
I've taken a two-tier approach that has thus far been very successful in keeping our implementation of RequestCenter highly available. Experience has shown that server monitoring is absolutely necessary, but in and of itself isn't a sufficient solution. The server can be up and the newScale services running, and the site can still be inaccessible to users. Depending on your organization, you may have to be creative to fill in the gaps.
I would suggest starting with basic server monitoring; you need this to ensure that your infrastructure is up and running. No-brainer.
The second part of the solution involves using or creating a process that makes HTTP calls to specific URLs and monitors / takes action based on the response received. It's relatively simple to create a VBScript that runs as a scheduled task, checks a series of URLs, notes the responses and potentially takes action as a result.
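A minimal sketch of that kind of URL check, written in Python rather than VBScript purely for illustration. The function names (fetch_status, check_urls) and the 10-second timeout are assumptions, not part of any newScale tooling, and the URLs to check would be whatever your own RequestCenter and ServiceLink entry points are.

```python
import urllib.request
import urllib.error


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    The timeout value is an arbitrary assumption; tune it for your site.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return None


def check_urls(urls, fetch=fetch_status):
    """Map each URL to (status, ok), where ok means the status was exactly 200.

    fetch is injectable so the logic can be exercised without a network.
    """
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = (status, status == 200)
    return results
```

Calling check_urls with a list of URLs from a scheduled task gives a dictionary you can log or act on. Note that urlopen follows redirects, so a redirect to a login page that itself returns 200 would count as healthy here; checking the response body for an expected string is one way to tighten that up.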
In my environment I have a script that runs every 3 minutes and checks the main URLs for RequestCenter and ServiceLink. If a response other than 200 OK is returned, the script logs an error to a database table. I have a second script that runs every 5 minutes and consults this table, looking for errors. If an error has been logged, the script automatically restarts either the RequestCenter or ServiceLink service on the App Server and logs the results to the same table. If an error has occurred previously within the last hour (or if a total of 5 have occurred for the day), I receive an e-mail notification. If more than one error has occurred in the same hour (or more than 10 for the whole day), I get a text page.
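The escalation rules above can be sketched roughly as follows, again in Python for illustration only. The function name and return values are hypothetical, and the thresholds reflect one reading of the description: "previously within the last hour" is taken to mean at least one earlier error in the preceding 60 minutes, "more than one" to mean at least two earlier errors, and the daily totals to include the error currently being handled. The service restart and the actual e-mail/page delivery are left out.

```python
from datetime import datetime, timedelta


def escalation_level(error_times, now):
    """Decide how loudly to escalate the error being handled at `now`.

    error_times: datetimes of earlier logged errors (current one excluded).
    Returns 'page', 'email', or 'restart_only'. The restart itself is
    assumed to happen in every case and is not shown here.
    """
    hour_ago = now - timedelta(hours=1)
    # Earlier errors in the 60 minutes before this one.
    prior_in_hour = sum(1 for t in error_times if hour_ago <= t < now)
    # Errors so far on this calendar day, counting the current one.
    total_today = 1 + sum(
        1 for t in error_times if t.date() == now.date() and t < now
    )

    if prior_in_hour > 1 or total_today > 10:
        return "page"
    if prior_in_hour >= 1 or total_today >= 5:
        return "email"
    return "restart_only"
```

Keeping the decision in a pure function like this makes the thresholds easy to adjust and to test against rows pulled from the error table.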
The above has the advantages of being completely automated, resolving "typical" errors unattended, maintaining a trackable and reportable log of events, and having the flexibility to engage a resource if a "real" problem has developed with the environment. It may take some legwork to get something like this set up and running, but the return in page-free nights and weekends has been well worth it.