[Gluster-users] monitoring gluster replication status
Todd Stansell
todd at stansell.org
Wed Jul 31 17:29:40 UTC 2013
We've been monitoring gluster with an icinga/nagios plugin we wrote that looks
at gluster volume status output (using --xml) to check that the volume and all
bricks are in the 'started' state. Additionally, for 'replicate' volumes, we
look at 'gluster volume heal <vol> info' and 'gluster volume heal <vol> info
heal-failed' output. We count the number of files waiting to be healed and
alert if that count stays above 0 for 4 minutes. For heal-failed, we look at
the list of failed entries and alert if any entry is from the previous 300
seconds (this list is a running log rather than a real-time list of files
that are not healed). We were also going to check the 'heal <vol> info
split-brain' output, but that appears to be just a list of files without
timestamps, and entries are not removed once the split-brain is resolved and
the file is successfully healed. There's no way to know whether an entry is
current or is from last month and has already been resolved.
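For anyone who wants to try something similar, the two alert rules above could
be sketched roughly as follows. This is not our actual plugin, just a minimal
illustration. The function names and input shapes are hypothetical, and it
assumes the gluster output has already been parsed into (timestamp, count)
samples and (timestamp, path) entries, since the exact output format varies
between gluster versions.

```python
from datetime import datetime, timedelta

# Thresholds taken from the description above.
HEAL_PENDING_GRACE = timedelta(minutes=4)
HEAL_FAILED_WINDOW = timedelta(seconds=300)


def pending_heal_alert(samples, now, grace=HEAL_PENDING_GRACE):
    """Return True if the pending-heal count has stayed above 0 for `grace`.

    `samples` is a hypothetical list of (datetime, pending_count) tuples,
    oldest first, one per run of the check.
    """
    nonzero_since = None
    for taken_at, count in samples:
        if count > 0:
            # Remember when the current non-zero streak started.
            if nonzero_since is None:
                nonzero_since = taken_at
        else:
            # The count dropped to 0, so the streak is over.
            nonzero_since = None
    return nonzero_since is not None and now - nonzero_since >= grace


def recent_heal_failures(entries, now, window=HEAL_FAILED_WINDOW):
    """Return paths whose heal-failed timestamp falls within `window` of now.

    `entries` is a hypothetical list of (datetime, path) tuples parsed from
    the heal-failed list, which is a running log rather than a live view.
    """
    return [
        path
        for failed_at, path in entries
        if timedelta(0) <= now - failed_at <= window
    ]
```

A check would call pending_heal_alert() with the samples it has kept from
previous runs, and alert on any non-empty result from recent_heal_failures().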
What we have is better than nothing, but it's not ideal, IMO. I wish there
were something more like NetApp's snapmirror status output, where you can
monitor how many seconds behind replication is or what percentage is complete
before it catches up. Given that gluster is file-based, it's not the same as
snapmirror's snapshot-based replication, so I understand the difficulties
here, but the idea is the same. There's currently no way that I know of to
see how much gluster needs to replicate, how fast it's happening, how soon it
might be done, etc. You just get a list of files that gluster thinks need to
get synced and that's about it. Some of those files might need a full sync
while others might just need attributes changed or a block added. Gluster
doesn't even really know that until it tries to actually heal the file, AFAIK.
Anyway, that's what we're currently doing. It's at least something. We get a
handful of false alerts if we're copying large files or lots of them for many
minutes and gluster hasn't had a chance to catch up entirely, but that's fine
for us. For the most part, we do very few writes, so gluster replication is
always up-to-date.
I'd be curious to hear what others are doing to monitor gluster.
Todd
On Wed, Jul 31, 2013 at 12:35:05PM +0530, Sejal1 S wrote:
> I have a similar requirement in my setup.
>
> Please suggest the correct way to verify that replication is working.
>
>
> .Sejal
>
>
>
> From: Matthew Sacks <msacksdandb at gmail.com>
> To: gluster-users at gluster.org
> Date: 31-07-2013 03:21
> Subject: [Gluster-users] monitoring gluster replication status
> Sent by: gluster-users-bounces at gluster.org
>
>
>
> Hello,
> From what I've seen there is no way I can monitor cluster status via munin
> or any other method for that matter.
>
> How can I ensure replication is working properly?
>
> Thanks in advance,
> Matt
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users