[Gluster-users] Nagios monitoring of replicated volumes..

Fri Apr 3 12:13:26 UTC 2009

I wrote a script to do something similar. Here's a modified version that checks for working glusterfs mounts in general. All you need is a directory at the same path on every node being checked and on the node running the check, plus passwordless ssh (as root) into the gluster client nodes. For testing, I just made a tmp directory right inside the glusterfs mount and used that: 

#!/bin/bash 

# All nodes must see this directory at the same path: a directory 
# inside the glusterfs mount 
TEMP_DIR=/cluster/tmp 

# IP addresses of the nodes to check, one per line. Each IP must also 
# be listed in this machine's /etc/hosts, since the expected hostname 
# is looked up there. 
NODE_LIST=/a_node_ip_list 

SSH="ssh -q -l root -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o ConnectTimeout=5" 

check_node() { 
  # ssh into the node and have it write its hostname into a temp file 
  # in a gluster mounted directory. If we can read it back from here 
  # and it matches, the node is up and its glusterfs mount is working. 
  local ip=$1 file expected actual 
  file=$(mktemp -p "$TEMP_DIR") || return 1 
  $SSH "$ip" "hostname > $file" 
  actual=$(cat "$file") 
  # Remove our own temp file right away instead of wiping tmp.* at the end 
  rm -f "$file" 

  # Exact match on the address column, so 10.0.0.1 does not also 
  # match 10.0.0.10 
  expected=$(awk -v ip="$ip" '$1 == ip { print $2; exit }' /etc/hosts) 

  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then 
    echo "confirmed online" 
    return 0 
  else 
    echo "not online. Call someone!" 
    return 1 
  fi 
} 

echo 
echo "GlusterFS status:" 
echo 

status=0 
for ip in $(cat "$NODE_LIST"); do 
  echo -n "checking $ip...  " 
  # 2 is the Nagios CRITICAL exit code 
  check_node "$ip" || status=2 
done 

exit $status 

> 
> This is an interesting topic indeed. 
> 
> I'm planning to have each server ping its AFR pair and, if one of them 
> goes down, run ls -lR on the mount as soon as it comes back up. 
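A minimal sketch of that ping-then-walk idea, assuming Linux iputils `ping` (where `-W` is a timeout in seconds). The peer name and mount point are hypothetical arguments. Walking the mount with `ls -lR` presumably makes AFR look up every file again so it can self-heal anything that changed while the partner was down; the sketch only runs the walk on a down-to-up transition, not on every poll.

```shell
#!/bin/bash
# Sketch only: watch an AFR partner and walk the mount once it returns.
# Assumes Linux iputils ping (-W = timeout in seconds).

# Succeeds (returns 0) only on a down -> up transition.
came_back_up() {
  [ "$1" = "down" ] && [ "$2" = "up" ]
}

# Print "up" if the host answers one ping within 2 seconds, else "down".
peer_state() {
  if ping -c 1 -W 2 "$1" > /dev/null 2>&1; then
    echo up
  else
    echo down
  fi
}

# watch_peer PEER MOUNT [INTERVAL]: poll forever, walking MOUNT after
# each down -> up transition of PEER. PEER and MOUNT are hypothetical.
watch_peer() {
  local peer=$1 mount=$2 interval=${3:-30}
  local prev cur
  prev=$(peer_state "$peer")
  while true; do
    sleep "$interval"
    cur=$(peer_state "$peer")
    if came_back_up "$prev" "$cur"; then
      # Walk the whole mount so every file gets looked up again
      ls -lR "$mount" > /dev/null 2>&1
    fi
    prev=$cur
  done
}
```

Something like `watch_peer server2 /mnt/gluster01 30 &` would start it in the background; both names there are placeholders.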
> 
> Perhaps others can share additional ideas? 
> 
> Regards. 
> 
> 2009/4/2 Cory Meyer < cory.meyer at gmail.com > 
> 
> > Has anyone found a decent way to monitor GlusterFS volumes? 
> > I'm currently using Nagios and Cacti for basic CPU, load, 
> > memory, and raw disk I/O. I need to monitor GlusterFS status and make 
> > sure all volumes are available. 
> > 
> > My test environment is 6 servers with 6 AFR volumes, each replicated 
> > between 2 of those servers. All volumes are mounted on every server. 
> > 
> > The checks I'm testing so far include a simple Bash script that 
> > writes the current Unix timestamp and hostname to a file once a minute. 
> > Each server does this only on the volumes it stores: 
> >    echo "$(uname -n):$(date +%s)" > /mnt/gluster01/CHECK_FILE 
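Since write_timestamps.sh itself only lives on pastebin, here is a hypothetical stand-in built around the echo line above, not the original script. The volume list is an assumption; each server should list only the volumes it stores.

```shell
#!/bin/bash
# Hypothetical stand-in for write_timestamps.sh (not the original).
# List only the volumes this server actually stores (assumed paths).
LOCAL_VOLUMES="/mnt/gluster01 /mnt/gluster02"

# Write "hostname:unix_time" into <mount>/CHECK_FILE.
write_stamp() {
  echo "$(uname -n):$(date +%s)" > "$1/CHECK_FILE"
}

# Stamp every local volume; return 1 if any write failed.
write_all_stamps() {
  local m rc=0
  for m in $LOCAL_VOLUMES; do
    write_stamp "$m" || rc=1
  done
  return $rc
}
```

Ending the script with `write_all_stamps` and adding a crontab entry such as `* * * * * /usr/local/bin/write_timestamps.sh` (path assumed) gives the once-a-minute schedule.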
> > 
> > The Nagios NRPE daemon then executes a Perl script on each of the 
> > clients. This script goes through each of the Gluster mount points, 
> > comparing the timestamp in the CHECK_FILE to the current system time 
> > and alarming if they differ by more than a minute. Another test, which 
> > hasn't been implemented yet, is checking the contents of the CHECK_FILE 
> > against the data on the raw disk. 
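A rough bash equivalent of the timestamp comparison, in place of the Perl script (which I haven't seen). Assumptions: the mount list is hypothetical, CHECK_FILE holds the `hostname:unix_time` line written by the echo command above, and GNU coreutils `timeout` is available so a hung mount fails the check instead of hanging NRPE. It follows the standard Nagios plugin convention of exit status 0 for OK and 2 for CRITICAL.

```shell
#!/bin/bash
# Sketch of a Nagios/NRPE check in bash (not check_glusterfs_mounts.pl).
# MOUNTS is a hypothetical list; needs GNU coreutils `timeout`.
MOUNTS="/mnt/gluster01 /mnt/gluster02"
MAX_AGE=60   # seconds; alarm when off by more than a minute

# stamp_age LINE NOW: print NOW minus the timestamp in "host:timestamp".
# Returns 1 if LINE has no numeric timestamp.
stamp_age() {
  local ts=${1##*:}
  case $ts in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo $(( $2 - ts ))
}

# Check every mount; print one status line and return 0 (OK) or 2 (CRITICAL).
check_mounts() {
  local m line age now bad=""
  now=$(date +%s)
  for m in $MOUNTS; do
    # A hung mount times out here instead of blocking the check
    line=$(timeout 10 cat "$m/CHECK_FILE" 2>/dev/null)
    if ! age=$(stamp_age "$line" "$now"); then
      bad="$bad $m(unreadable)"
    elif [ "$age" -gt "$MAX_AGE" ] || [ "$age" -lt "-$MAX_AGE" ]; then
      bad="$bad $m(${age}s)"
    fi
  done
  if [ -n "$bad" ]; then
    echo "GLUSTERFS CRITICAL:$bad"
    return 2
  fi
  echo "GLUSTERFS OK: all CHECK_FILEs within ${MAX_AGE}s"
  return 0
}
```

To run it under NRPE, end the script with `check_mounts; exit $?` and point a `command[check_glusterfs]=...` line in nrpe.cfg at it. With cron writing once a minute, a file can legitimately be almost 60 seconds old, so a little slack (say 90) may avoid flapping alerts.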
> > 
> > Bash code that writes the timestamps, run via cron once a minute 
> > (write_timestamps.sh): 
> > http://glusterfs.pastebin.com/m5a220a6 
> > 
> > Perl code that compares the timestamps, run on the client 
> > (check_glusterfs_mounts.pl): 
> > http://glusterfs.pastebin.com/m2f057a77 
> > 
> > Any ideas/questions/comments? 
> > 
> > _______________________________________________ 
> > Gluster-users mailing list 
> > Gluster-users at gluster.org 
> > http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users 
