Monitoring

...

at UWE

UWE IT Services has bought into a few commercial monitoring solutions, such as BMC software. However, the myUWE (uPortal) team still primarily checks the health of our systems using home-grown scripts. Despite this basic approach, our team is usually the first to spot when Eddie's in the space-time continuum...

Heartbeat Monitoring

At UWE, we use a single PC in the corner of the office, with a wall-mounted monitor, to monitor our systems' heartbeat. It runs a scheduled task every 10 minutes which executes a WSH script written in VBScript. The script makes HTTP GET or POST (in cases where basic auth credentials are required) requests to several URLs, waits up to 30 seconds for each response, and checks for a specific string in each response.
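The check for a single URL can be sketched roughly as follows. This is an illustration in Python rather than our actual VBScript, and the names `http_fetch` and `contains_expected` are hypothetical. It also assumes that an HTTP error page still counts as a timely response, and note that the `urlopen` timeout applies to each blocking socket operation rather than to the request as a whole.

```python
import base64
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

TIMEOUT_SECONDS = 30  # wait allowed for each response


def http_fetch(url: str, data: Optional[bytes] = None,
               credentials: Optional[Tuple[str, str]] = None,
               timeout: float = TIMEOUT_SECONDS) -> Optional[str]:
    """Return the response body, or None if nothing came back in time."""
    # Supplying data makes urllib send a POST instead of a GET.
    request = urllib.request.Request(url, data=data)
    if credentials is not None:
        user, password = credentials
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Assumption: an error status is still a timely response with a body.
        return err.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and socket timeouts are OSError subclasses: no response.
        return None


def contains_expected(url: str, expected: str,
                      fetch: Callable[..., Optional[str]] = http_fetch,
                      **kwargs) -> Optional[bool]:
    """True/False if a response arrived, None if the service did not answer."""
    body = fetch(url, **kwargs)
    if body is None:
        return None
    return expected in body
```

Passing the fetch function as a parameter keeps the string check testable without touching the network.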

These results are collated and recorded using statuses of OK (timely response containing expected string), UX (unexpected: timely response but without expected string) or DN (down: no response within 30 seconds) for each service.
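As a minimal sketch of how the three statuses could be derived and collated, the Python below classifies a response body and writes the results out as XML. The element and attribute names (`services`, `service`, `name`, `status`) and the function names are assumptions for illustration; our real script's XML layout is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def classify(body: Optional[str], expected: str) -> str:
    """Map one check result onto OK, UX or DN."""
    if body is None:          # no response within the time limit
        return "DN"
    if expected in body:      # timely response containing the expected string
        return "OK"
    return "UX"               # timely response, but not what we expected


def build_report(results: Dict[str, str]) -> str:
    """Serialise {service name: status} as a small XML document."""
    root = ET.Element("services")  # hypothetical element names
    for name, status in results.items():
        ET.SubElement(root, "service", name=name, status=status)
    return ET.tostring(root, encoding="unicode")
```

For example, `build_report({"portal": classify("Welcome to myUWE", "Welcome")})` yields a document with one `service` element whose status is OK.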

This is saved as XML and then pushed out to a web server, where it is served as our 'green screen' using XSLT:

[Image: UWE status screen]
http://info.uwe.ac.uk/status

...

/default.asp?view=multipleColumns

Each box represents one service: green for OK, yellow for UX and red for DN. Any service which is defined as a production service in the monitoring script will also turn the page background yellow or red, thereby giving higher visibility to the issue.
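The colouring rules can be sketched as below. This is an illustrative Python version, not the XSLT we actually use. It assumes that red takes precedence over yellow when several production services are unhealthy, and it returns `None` when the background should stay at its default; neither detail is taken from our stylesheet.

```python
from typing import Iterable, Optional, Tuple

STATUS_COLOURS = {"OK": "green", "UX": "yellow", "DN": "red"}
SEVERITY = {"OK": 0, "UX": 1, "DN": 2}  # assumed ordering: DN is worst


def box_colour(status: str) -> str:
    """Colour of the box for a single service."""
    return STATUS_COLOURS[status]


def background_colour(services: Iterable[Tuple[str, bool]]) -> Optional[str]:
    """Background colour given (status, is_production) pairs.

    Only production services affect the background.
    """
    worst = "OK"
    for status, is_production in services:
        if is_production and SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return None if worst == "OK" else STATUS_COLOURS[worst]
```

With these rules a non-production service that is down turns only its own box red and leaves the background unchanged.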

In addition, emails and SMS messages are triggered by changes of state (OK becoming UX, etc.).
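Detecting a change of state amounts to comparing the previous run's statuses with the current ones. The sketch below is a hypothetical Python illustration; the function name is made up, and it assumes that a service seen for the first time does not trigger a message. Sending the email or SMS itself is left out.

```python
from typing import Dict, List, Tuple


def state_changes(previous: Dict[str, str],
                  current: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (service, old status, new status) for every service that changed."""
    changes = []
    for name, new_status in current.items():
        old_status = previous.get(name)
        # Assumption: services with no previous status are not reported.
        if old_status is not None and old_status != new_status:
            changes.append((name, old_status, new_status))
    return changes
```

Each returned tuple would then be handed to whatever sends the notification.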

Resource Inspection

[Images: MRTG Graphs]

[Images: RRD Graphs]

Todo: describe rrd/mrtg