[Gluster-devel] Gluster health/status

Wed Feb 24 17:40:45 UTC 2010

2010/2/23 Harald Stürzebecher <haralds at cs.tu-berlin.de>

> 2010/2/22 Samuel Hassine <samuel.hassine at gmail.com>:
> > I'm also looking for a way to monitor gluster nodes.
> >
> > Any solutions?
> >
> > On Monday, 22 February 2010 at 10:12 +0500, Anton wrote:
> >> Hello!
> >>
> >>
> >>
> >> I'm looking for a way to determine the health of a GlusterFS
> >> cluster. Is there any way to determine if any of the nodes failed? In
> >> the log files it is possible to grep for "remotexx:
> >> disconnected", but that is not suitable for monitoring. There should be
> >> a simple way to query the cluster against the .vol file and
> >> see if any node/brick failed to attach, and so trigger an alarm. Is
> >> there anything like "gluster --reporthealth"?
>
> Checking if a connection to the GlusterFS TCP server port (6996 IIRC)
> is possible might be an indicator for working/failing - at least for
> setups that use TCP. I don't know if anything like that is possible
> for Infiniband-only setups.
>
What about IPoIB (IP over InfiniBand)?
>
>
> IIRC, Nagios can check if a port is open on a remote machine. That
> won't find something like disk/filesystem problems on the server, but
> it could report crashed GlusterFS server processes and machines that
> are not working at all.
>
Nagios can also run checks on the remote host itself:

http://www.logix.cz/michal/devel/nagios/
http://blogs.techrepublic.com.com/opensource/?p=321

so it can check the real status of glusterfsd, or anything else we want, on
the remote host.
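
As a rough illustration of what such a remote check could look like, here is
a minimal sketch of a Nagios-style plugin body in Python. It assumes a Linux
host with /proc mounted and that the server process is named "glusterfsd";
the function names find_processes and check_daemon are hypothetical, not part
of GlusterFS or Nagios. The exit codes follow the usual Nagios plugin
convention (0 = OK, 2 = CRITICAL).

```python
import os

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def find_processes(name, proc_root="/proc"):
    """Return sorted PIDs whose command name equals `name` (Linux /proc layout)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited (or is unreadable) while scanning
    return sorted(pids)


def check_daemon(name="glusterfsd", proc_root="/proc"):
    """Return (exit_code, message) in the form a Nagios plugin reports."""
    pids = find_processes(name, proc_root)
    if pids:
        pid_list = ", ".join(str(p) for p in pids)
        return OK, "OK - %s running (pids: %s)" % (name, pid_list)
    return CRITICAL, "CRITICAL - no %s process found" % name
```

A wrapper script would print the message and exit with the returned code, and
would be run on the storage node itself, for example through NRPE or
check_by_ssh. Note that this only shows the process exists; it does not prove
that the bricks are healthy.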

>
> I know that this simple method won't provide a positive status (=it
> works) which would be preferable, but at least it can provide a
> negative status (=_something_ failed on _that_ machine) in some cases.

The glusterfsd port can be taken over by another process, so checking for an
open port is an indirect and unreliable way to check status.
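
To make the limitation concrete, here is a minimal Python sketch of the kind
of port check being discussed. The helper name port_is_open is hypothetical,
and port 6996 in the usage note is only the value recalled earlier in this
thread, not a verified default.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

Calling port_is_open("server1", 6996) returns True for any process that
accepts connections on that port, whether or not it is glusterfsd, which is
why a positive result says little about the real state of the daemon.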

>

> @gluster.org:
> IIRC, some time ago someone requested a syslog feature to debug
> problems with GlusterFS as root filesystem for a diskless cluster -
> are there any news on that?
> Having the clients report problems to a central logging server might
> be useful for monitoring.
>
Monitoring glusterfs daemons from the client side is unreliable, because
monitoring errors can be caused by faults on the client itself (I assume the
Nagios server host(s) to be reliable).

I insist on remote checks because:
  1) glusterfsd should abort when a non-recoverable error happens; in that
case a remote check of the real process status is the most reliable check.
  2) if glusterfsd or any FS-related service keeps running in an unhealthy
state after a non-recoverable error, it can lead to damage and irreversible
loss of data. Non-recoverable errors should be investigated and fixed only by
a system administrator with a complete set of system tools at hand.

Regards,

Alexey.

>
>
> Regards,
>
> Harald