[Gluster-users] Nagios monitoring of replicated volumes..

Thu Apr 2 20:39:09 UTC 2009

Has anyone found a decent way to monitor GlusterFS volumes?
I'm currently using Nagios and Cacti to take care of basic CPU, load,
memory, and raw disk I/O.  I also need to monitor GlusterFS status and make
sure all volumes are available.

My test environment is 6 servers with 6 AFR volumes, each of which is
replicated between 2 of those servers.  All volumes are mounted on each
server.

The checks I'm testing so far include a simple Bash script that writes
the hostname and the current Unix timestamp to a file once a minute.  Each
server does this only on the volumes that it stores:
   echo "$(uname -n):$(date +%s)" > /mnt/gluster01/CHECK_FILE
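
A minimal sketch of how that one-liner could be wrapped so that one cron
job covers several volumes. This is not the pastebin script itself; the
function name and the idea of passing the mount points as arguments are
assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: write "hostname:epoch" to CHECK_FILE on each mount point
# given on the command line. write_check_file is a hypothetical helper name.

write_check_file() {
    # $1 is a GlusterFS mount point, e.g. /mnt/gluster01
    echo "$(uname -n):$(date +%s)" > "$1/CHECK_FILE"
}

main() {
    local status=0 mount
    for mount in "$@"; do
        if ! write_check_file "$mount"; then
            echo "could not write $mount/CHECK_FILE" >&2
            status=1
        fi
    done
    return "$status"
}

# Only do work when mount points were actually passed in.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Saved as a script, it could be run from a crontab entry such as
`* * * * * /usr/local/bin/write_timestamps.sh /mnt/gluster01`, where the
install path is hypothetical.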

The Nagios NRPE daemon then executes a Perl script on each of the
clients.  This script goes through each of the Gluster mount points,
comparing the timestamp in the CHECK_FILE to the current system time and
raising an alarm if the two differ by more than a minute.  Another test,
which hasn't been implemented yet, would compare the contents of the
CHECK_FILE seen through the mount with the data that is on the raw disk.
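
The timestamp comparison can be sketched roughly as follows. This is a
shell stand-in for illustration, not the Perl script linked below; the
function names and the MAX_AGE variable are assumptions. It uses the
standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL).

```shell
#!/usr/bin/env bash
# Sketch only: flag any CHECK_FILE whose timestamp is more than MAX_AGE
# seconds away from the local clock, in either direction.
MAX_AGE=60   # assumed threshold, matching "more than a minute"

# Print the age in seconds of the "hostname:epoch" record in file $1,
# relative to epoch time $2. Fails if the file is unreadable or malformed.
check_file_age() {
    local host stamp
    [ -r "$1" ] || return 1
    IFS=: read -r host stamp < "$1" || [ -n "$stamp" ] || return 1
    # Accept plain decimal digits only (no leading zero, which the shell
    # would otherwise treat as octal).
    case $stamp in
        ''|*[!0-9]*|0?*) return 1 ;;
    esac
    echo $(( $2 - stamp ))
}

# Check every mount point given as an argument; return a Nagios exit code.
check_mounts() {
    local now status=0 mount age
    now=$(date +%s)
    for mount in "$@"; do
        if ! age=$(check_file_age "$mount/CHECK_FILE" "$now"); then
            echo "CRITICAL: cannot read $mount/CHECK_FILE"
            status=2
        elif [ "$age" -gt "$MAX_AGE" ] || [ "$age" -lt "-$MAX_AGE" ]; then
            echo "CRITICAL: $mount/CHECK_FILE is off by ${age}s"
            status=2
        fi
    done
    [ "$status" -eq 0 ] && echo "OK: all CHECK_FILEs are current"
    return "$status"
}

# Only run when mount points were passed in; the exit status of the
# script is then the Nagios status.
if [ "$#" -gt 0 ]; then
    check_mounts "$@"
fi
```

Because the writer and the checker each use their own clock, a check like
this only makes sense if the clocks on all hosts are kept in sync. NRPE
could call it through a command definition along the lines of
`command[check_glusterfs_mounts]=/usr/local/bin/check_mounts.sh /mnt/gluster01`,
where the script path is hypothetical.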

Bash code to write the timestamps, executed via cron once a minute
(write_timestamps.sh):
http://glusterfs.pastebin.com/m5a220a6

Perl code to compare the timestamps, executed on the client
(check_glusterfs_mounts.pl):
http://glusterfs.pastebin.com/m2f057a77

Any ideas/questions/comments?