Monitoring at UWE
UWE IT Services has bought into a few commercial monitoring solutions, such as BMC software. However, the myUWE (uPortal) team still primarily checks the health of its systems using home-grown scripts. Despite this basic approach, our team is usually the first to spot when Eddie's in the space-time continuum...
Heartbeat Monitoring
At UWE, we use a local PC with a wall-mounted screen to monitor various URLs. It runs a scheduled task every 10 minutes, which executes a WSH script written in VBScript. The script makes HTTP GET requests, or POST requests in cases where basic auth credentials are required, to several URLs, waits up to 30 seconds for each response, and checks for a specific string in the response.
These results are collated and saved as XML, with each service given a status of OK (timely response containing the expected string), UX (unexpected: timely response but without the expected string) or DN (down: no response within 30 seconds).
This XML is then pushed out to a web server, where it is rendered with XSLT as our 'green screen'.
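The check-and-classify step described above can be sketched as follows. This is an illustration in Python, not the team's VBScript: the function names, the XML element and attribute names, and the decision to treat connection errors as DN are all assumptions made for the example.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

TIMEOUT_SECONDS = 30  # from the text: wait up to 30 seconds per response


def fetch(url, timeout=TIMEOUT_SECONDS):
    """Return the response body as text, or None if nothing usable came back.

    Note: urlopen's timeout applies to each blocking socket operation,
    so it only approximates a 30-second limit on the whole request.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # An HTTP error page is still a timely response; let the string check decide.
        return err.read().decode("utf-8", errors="replace")
    except OSError:
        # Timeouts and connection failures (assumption: both count as "down").
        return None


def classify(body, expected):
    """Map a response body to one of the three statuses used in the text."""
    if body is None:
        return "DN"  # down: no response in time
    return "OK" if expected in body else "UX"  # UX: response without expected string


def to_xml(results):
    """Serialise (service name, status) pairs; the element names are hypothetical."""
    root = ET.Element("services")
    for name, status in results:
        ET.SubElement(root, "service", name=name, status=status)
    return ET.tostring(root, encoding="unicode")
```

A run over the monitored URLs would call `classify(fetch(url), expected)` for each service and pass the collected pairs to `to_xml`.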
Individual boxes represent each service: green for OK, yellow for UX and red for DN. If a service defined as a production service in the monitoring script goes UX or DN, the page background also turns yellow or red, giving the issue higher visibility.
In addition, emails and SMS messages are triggered by changes of state (for example, OK becoming UX).
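The colour rules above could be expressed roughly like this. It is a Python sketch rather than the XSLT actually used, and it assumes that red takes precedence over yellow when production services are in different states, which the text does not state.

```python
# Box colours as described in the text.
STATUS_COLOURS = {"OK": "green", "UX": "yellow", "DN": "red"}


def page_background(services):
    """Return "red", "yellow", or None (leave the normal background).

    `services` is an iterable of (status, is_production) pairs.
    Only production services influence the background.
    """
    production = {status for status, is_production in services if is_production}
    if "DN" in production:
        return "red"  # assumption: red outranks yellow
    if "UX" in production:
        return "yellow"
    return None
```

For example, a non-production service that is down colours only its own box, while a production service in the UX state turns the whole background yellow.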
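Detecting a change of state amounts to comparing the statuses from the previous run with the current ones. The Python sketch below shows one way to do that; the function name is hypothetical, and ignoring services that have no previous status is an assumption. Sending the email or SMS itself is left out.

```python
def changed_states(previous, current):
    """Return (service, old status, new status) for each service whose status changed.

    `previous` and `current` map service names to "OK", "UX" or "DN".
    Services with no previous status are skipped (assumption).
    """
    changes = []
    for name, new_status in current.items():
        old_status = previous.get(name)
        if old_status is not None and old_status != new_status:
            changes.append((name, old_status, new_status))
    return changes
```

Each returned tuple would then be handed to whatever sends the notification, so a service that stays DN across several runs triggers a message only once.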
Resource Inspection
MRTG Graphs
RRD Graphs
Todo: describe RRD/MRTG