I previously hand-rolled a status monitor, status.za3k.com, which I am in the process of replacing (new version). The replacement is mon, a Linux monitoring daemon, which I recommend. It is targeted at working system administrators. ‘mon’ adds many features over my own system, but still has a very bare-bones feel.
The old service, ‘simple-status’, worked as follows:
- You visited the URL. The status page would then kick off about 30 parallel jobs, live, to check the status of 30 services.
- The list of services was one-per-file in the services.d directory.
- For each service, it ran a short script, with no command line arguments.
- All output was displayed in a simple HTML table, with the name of the service, the status (with color coding), and a short output line.
- The script could return a success (0) or non-success status code. On success, the status line was shown in green; on failure, it was highlighted in red.
- Scripts could be anything, but I wrote several utility functions to be called from scripts. For example, “ping?” checked whether a host was pingable.
- Each script was wrapped in timeout. If the script took too long to run, it would be highlighted yellow.
- All scripts ran on every page load. This prevented a failure mode where the information could go stale without me noticing.
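The flow above can be sketched roughly as follows. This is not the real ‘simple-status’ code: it is written in Python rather than bash, and the function names (`classify`, `run_check`, `run_all`), the 10-second default timeout, and the choice to show only the first line of output are all assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def classify(returncode, timed_out):
    """Map a check result to the color shown on the status page."""
    if timed_out:
        return "yellow"
    return "green" if returncode == 0 else "red"


def run_check(script, timeout=10.0):
    """Run one service script with no arguments.

    Returns (service name, color, short output line).
    """
    try:
        proc = subprocess.run(
            [str(script)], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return script.name, classify(None, True), "timed out"
    except OSError as exc:
        # The script could not be started at all (for example, not executable).
        return script.name, "red", str(exc)
    lines = proc.stdout.strip().splitlines()
    # Assumption: the "short output line" is the first line of stdout.
    short = lines[0] if lines else ""
    return script.name, classify(proc.returncode, False), short


def run_all(services_dir="services.d", timeout=10.0):
    """Run every script in services_dir in parallel, one job per service."""
    scripts = sorted(p for p in Path(services_dir).iterdir() if p.is_file())
    if not scripts:
        return []
    with ThreadPoolExecutor(max_workers=len(scripts)) as pool:
        return list(pool.map(lambda s: run_check(s, timeout), scripts))
```

Calling `run_all()` on each page load and rendering one table row per returned tuple gives the stateless behavior described above: nothing is cached, so nothing can be stale.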
‘mon’ works as follows:
- The list of 30 services is defined in /etc/mon/mon.cf.
- For each service, it runs a single-line command (monitor) with arguments. The hostname(s) are added to the command line automatically.
- All output can be displayed in a simple html table, with the name of the service, the status (with color coding), the time of last and next run, and a short output line. Or, I use ‘monshow‘, which is similar but in a text format.
- Monitors can be anything, but several useful ones are provided in /usr/lib/mon/mon.d (on Debian). For example, the monitor “ping” checks whether a host is pingable.
- A monitor returns a success (0) or non-success status code. On the web interface, the status line is displayed in green on success and in red on failure.
- All scripts run periodically. A script can have many states, not just “success” or “failure”. For example, “untested” (not yet run) or “dependency failing” (I assume; I have not seen it yet).
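Since monitors can be anything, here is a minimal sketch of what a custom one might look like in Python (the bundled monitors are Perl). It assumes what is described above: hostnames arrive as command-line arguments and the exit code signals success or failure. Printing the failed hosts on the first line of output follows mon's convention as I understand it, so check the mon documentation before relying on it. The names `check_host`, `run_monitor`, and `main`, and the TCP check on port 22, are hypothetical.

```python
import socket


def check_host(host, port=22, timeout=5.0):
    """Hypothetical check: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def run_monitor(hosts, check=check_host):
    """Check every host and return (exit_code, output_lines).

    On failure, the first output line is a summary listing the failed hosts,
    followed by one detail line per failed host.
    """
    failed = sorted(h for h in hosts if not check(h))
    if failed:
        details = [f"{h}: check failed" for h in failed]
        return 1, [" ".join(failed)] + details
    return 0, []


def main(argv):
    """Entry point: argv is the list of hostnames appended by the daemon."""
    code, lines = run_monitor(argv)
    if lines:
        print("\n".join(lines))
    return code
```

To use it as a script, end the file with `import sys` and `sys.exit(main(sys.argv[1:]))`. Taking a list of hosts rather than a single one is the main difference from a ‘simple-status’ script.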
As you can see, the two have a very similar approach to the component scripts, which is natural in the Linux world. Here is a comparison.
- ‘simple-status’ does exactly one thing. ‘mon’ has many features, but does the minimum possible to provide each.
- ‘simple-status’ is stateless. ‘mon’ has state.
- ‘simple-status’ runs on demand. ‘mon’ is a daemon which runs monitors periodically.
- Input is different. ‘simple-status’ is one script which takes a timeout. ‘mon’ listens for trap signals and talks to clients who want to know its state.
- Both can show an HTML status page that looks about the same, with some CGI parameters accepted.
- ‘mon’ can also show a text status page.
- Both run monitors which report success via their status code, and provide extra information on standard output. ‘mon’ scripts are expected to be able to run on a list of hosts, rather than just one.
- ‘mon’ has a config file. ‘simple-status’ has no options.
- ‘simple-status’ is simple (27 lines). ‘mon’ is much longer (4922 lines).
- ‘simple-status’ is written in bash, and does not expose this. ‘mon’ is written in Perl, all the monitors are written in Perl, and it allows inline Perl in the config file.
- ‘simple-status’ limits the execution time of monitors. ‘mon’ does not.
- ‘mon’ allows alerting, which calls an arbitrary program to deliver the alert (email is common)
- ‘mon’ supports traps, which are active alerts
- ‘mon’ supports watchdog/heartbeat style alerts, where if a trap is not regularly received, it marks a service as failed.
- ‘mon’ supports dependencies
- ‘mon’ allows defining a service for several hosts at once
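The watchdog/heartbeat idea from the list above is easy to get backwards, so here is a tiny sketch of the logic. It is an illustration only, not how ‘mon’ implements it: the `Heartbeat` class, its method names, and the `max_silence` parameter are all made up for this example.

```python
import time


class Heartbeat:
    """Marks a service as failed when no heartbeat arrives for too long."""

    def __init__(self, max_silence, now=None):
        # max_silence: seconds allowed between heartbeats before failing.
        self.max_silence = max_silence
        self.last_seen = time.monotonic() if now is None else now

    def beat(self, now=None):
        """Record that a heartbeat (a trap, in mon's terms) was received."""
        self.last_seen = time.monotonic() if now is None else now

    def status(self, now=None):
        """Return "failed" if the service has been silent too long, else "ok"."""
        now = time.monotonic() if now is None else now
        silent_for = now - self.last_seen
        return "failed" if silent_for > self.max_silence else "ok"
```

The point is that silence is the failure signal: the monitored host pushes heartbeats, and the absence of one is what flips the state. The optional `now` argument exists only to make the class testable without sleeping.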
Overall, I think ‘mon’ is much more complex, but the complexity is there only to add features, and few of those are features I wouldn’t use. It is still pretty simple, with a simple interface. I recommend it as both good in its own right and better overall than my system.
My only complaint is that it’s basically impossible to Google, which is why I’m writing a recommendation for it here.