[Gluster-users] Nagios monitoring of replicated volumes..
Christopher Hawkins
chawkins at bplinux.com
Fri Apr 3 12:13:26 UTC 2009
I wrote a script to do something similar. Here's a modified version that verifies working glusterfs mounts in general. All you need is a path that is the same on all nodes being checked and on the node performing the check, plus passwordless ssh (as root, since the script uses -l root) into the gluster client nodes. For testing I just made a tmp directory right inside the glusterfs mount and used that:
#!/bin/bash
check_node() {
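# ssh into the node and have it write its hostname into a temp file
# in a gluster-mounted directory. If we can read the file back from here
# and the hostname matches, the node's glusterfs mount is working.
# Uses $ip, which is set by the calling loop.
SSH="ssh -q -l root -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o ConnectTimeout=5"
# All nodes must have the same path to this directory
TEMP_DIR=/cluster/tmp
FILE=$(mktemp -p "$TEMP_DIR")
$SSH "$ip" "hostname > $FILE"
# Every ip address listed in /a_node_ip_list (one per line) must also have
# an entry in /etc/hosts, because that is where we look up the hostname
# the node is expected to report. Match the address field exactly so that
# 10.0.0.1 does not also match 10.0.0.10, and reject an empty lookup so
# that a missing entry plus a failed ssh does not count as a match.
EXPECTED=$(awk -v ip="$ip" '$1 == ip {print $2; exit}' /etc/hosts)
if test -n "$EXPECTED" && test "$EXPECTED" = "$(cat "$FILE")"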
then
echo "confirmed online"
else
echo "not online. Call someone!"
fi
}
echo
echo "GlusterFS status:"
echo
for ip in `cat /a_node_ip_list`
do
echo -n "checking $ip... "
check_node
done
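# Clean up. This removes every mktemp file in the directory, not only ours,
# and is skipped if TEMP_DIR was never set (empty node list).
[ -n "$TEMP_DIR" ] && rm -f "$TEMP_DIR"/tmp.*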
exit 0
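To run the same check from Nagios, the result has to come back as a plugin exit code rather than as a message for a human. Here is a minimal sketch, assuming the usual Nagios plugin convention (0 = OK, 2 = CRITICAL). report_status is a hypothetical helper that is not part of the script above, and it assumes the caller has collected the addresses of the nodes that failed the check:

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): turn the list of
# nodes that failed the check into a Nagios-style status line and
# return code. Assumes the usual plugin convention: 0 = OK, 2 = CRITICAL.
report_status() {
    local failed="$1"    # space-separated ips that failed, empty if none
    if [ -z "$failed" ]; then
        echo "GLUSTERFS OK - all nodes confirmed online"
        return 0
    fi
    echo "GLUSTERFS CRITICAL - not online: $failed"
    return 2
}
```

The main loop would then append $ip to a variable whenever a node fails, and finish with report_status "$failed"; exit $? instead of an unconditional exit 0.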
>
> This is an interesting topic indeed.
>
> I'm planning to have each server ping its AFR pair and, if one of them
> goes down, run ls -lR on the mount the moment it comes back up.
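The ping-then-walk idea quoted above could look roughly like the sketch below. It is only an illustration: the peer address, mount point and state file are hypothetical arguments, the ping flags assume the Linux iputils ping (-W in seconds), and the ls -lR walk is simply the step proposed in the quoted message.

```shell
#!/bin/bash
# next_action PREVIOUS CURRENT
# Prints "heal" only when the peer has just gone from down to up.
next_action() {
    if [ "$1" = "down" ] && [ "$2" = "up" ]; then
        echo "heal"
    else
        echo "none"
    fi
}

# watch_peer PEER MOUNT STATE_FILE  (all three are placeholder arguments)
# Meant to be run periodically, e.g. from cron.
watch_peer() {
    local peer="$1" mount="$2" state_file="$3"
    local previous current
    # Treat a missing state file as "peer was up last time"
    previous=$(cat "$state_file" 2>/dev/null || echo up)
    if ping -c 1 -W 2 "$peer" >/dev/null 2>&1; then
        current=up
    else
        current=down
    fi
    echo "$current" > "$state_file"
    if [ "$(next_action "$previous" "$current")" = "heal" ]; then
        # Walk the whole mount once the peer is back
        ls -lR "$mount" >/dev/null
    fi
}
```

Keeping the transition logic in its own function means it can be tested without a real peer or mount.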
>
> Perhaps others can share additional ideas?
>
> Regards.
>
> 2009/4/2 Cory Meyer < cory.meyer at gmail.com >
>
> > Has anyone found a decent way out there to monitor GlusterFS volumes?
> > I'm currently using Nagios and Cacti to take care of basic CPU, Load,
> > Memory, and raw Disk I/O. I need to monitor GlusterFS status and make
> > sure all volumes are available.
> >
> > My test environment is 6 servers with 6 AFR volumes, each of which is
> > shared between 2 of those servers. All volumes are mounted on each server.
> >
> > The checks I'm testing out so far include a simple Bash script that
> > writes the current Unix timestamp and hostname to a file once a minute.
> > This is done by each server on only the volumes that they store.
> > echo "$(uname -n):$(date +%s)" > /mnt/gluster01/CHECK_FILE
> >
> > The Nagios NRPE daemon would then execute a Perl script on each of the
> > clients. This script goes through each of the Gluster mount points,
> > comparing the timestamps in the CHECK_FILE to the current system time
> > and alarming if the timestamp is off by more than a minute. Another test,
> > which hasn't been implemented, was checking the contents of the CHECK_FILE
> > against the data that is on the raw disk.
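The timestamp comparison described above can be sketched in bash as well. This is not the Perl script from the pastebin link (that code is not reproduced here), just a minimal illustration. check_file_age and its arguments are hypothetical, the 60 second default follows the "more than a minute" rule, and the return codes assume the usual Nagios plugin convention (0 = OK, 2 = CRITICAL):

```shell
#!/bin/bash
# check_file_age FILE NOW [MAX_AGE]
# FILE holds one line in the form "hostname:unix_timestamp", as written by
#   echo "$(uname -n):$(date +%s)" > /mnt/gluster01/CHECK_FILE
# NOW is the current unix time; MAX_AGE defaults to 60 seconds.
check_file_age() {
    local file="$1" now="$2" max_age="${3:-60}"
    local line stamp age
    if [ ! -r "$file" ]; then
        echo "CRITICAL - cannot read $file"
        return 2
    fi
    line=$(head -n 1 "$file")
    stamp="${line##*:}"          # everything after the last colon
    case "$stamp" in
        ''|*[!0-9]*)
            echo "CRITICAL - bad timestamp in $file"
            return 2
            ;;
    esac
    age=$(( now - stamp ))
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL - $file is ${age}s old"
        return 2
    fi
    echo "OK - $file is ${age}s old"
    return 0
}
```

A client would call it once per mount point, for example check_file_age /mnt/gluster01/CHECK_FILE "$(date +%s)". Passing the current time in as an argument keeps the function easy to test.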
> >
> > Bash code to write the timestamps, executed via cron once a minute.
> > (write_timestamps.sh)
> > http://glusterfs.pastebin.com/m5a220a6
> >
> > Perl code to compare the timestamps which is executed on the client.
> > (check_glusterfs_mounts.pl)
> > http://glusterfs.pastebin.com/m2f057a77
> >
> > Any ideas/questions/comments?
> >
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users