[Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

Vince Loschiavo vloschiavo at gmail.com
Wed Nov 19 18:16:28 UTC 2014


Thank you!

I think we may need some sort of dampening method and more specific input
into Nagios, i.e. details on which files are out of sync, rather than just
the number of files out of sync.

I'm using these:  http://download.gluster.org/pub/gluster/glusterfs-nagios/
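
As a sketch of the dampening idea (a hypothetical wrapper, not one of the
plugins above; it assumes the per-brick "Number of entries:" lines that
*gluster volume heal <volname> info* prints): only alert after several
consecutive polls report pending heal entries.

    #!/bin/bash
    # Hypothetical Nagios check: dampen transient heal-info noise by
    # alerting only after THRESHOLD consecutive non-zero readings.
    VOL="$1"
    THRESHOLD=3
    STATE="/var/tmp/gluster_heal_${VOL}.count"

    # Sum the "Number of entries:" lines that heal info prints per brick.
    entries=$(gluster volume heal "$VOL" info 2>/dev/null \
        | awk '/^Number of entries:/ {sum += $NF} END {print sum + 0}')

    count=$(cat "$STATE" 2>/dev/null || echo 0)
    if [ "$entries" -gt 0 ]; then
        count=$((count + 1))
    else
        count=0
    fi
    echo "$count" > "$STATE"

    if [ "$count" -ge "$THRESHOLD" ]; then
        echo "CRITICAL: $entries heal entries pending on $VOL ($count checks in a row)"
        exit 2   # Nagios CRITICAL
    fi
    echo "OK: no sustained heal backlog on $VOL"
    exit 0       # Nagios OK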


On Wed, Nov 19, 2014 at 10:14 AM, Nishanth Thomas <nthomas at redhat.com>
wrote:

> Hi Vince,
>
> Are you referring to the monitoring scripts mentioned in the blog (
> http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/)
> or the scripts that are part of gluster (
> http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html)?
> Please confirm.
>
> Thanks,
> Nishanth
>
> ----- Original Message -----
> From: "Humble Devassy Chirammal" <humble.devassy at gmail.com>
> To: "Vince Loschiavo" <vloschiavo at gmail.com>
> Cc: "gluster-users at gluster.org" <Gluster-users at gluster.org>, "Sahina
> Bose" <sabose at redhat.com>, nthomas at redhat.com
> Sent: Wednesday, November 19, 2014 11:22:18 PM
> Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios
> related)
>
> Hi Vince,
> It could be a behavioural change in how the heal process output is
> captured in the latest GlusterFS. If that is the case, we may be able to
> tune the interval at which Nagios collects the heal info output, or some
> other settings, to avoid continuous alerts. I am Cc'ing the gluster
> nagios devs.
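>
> For example (generic Nagios service tuning, nothing gluster-specific; the
> host and check_command names here are placeholders), raising
> max_check_attempts delays the notification until the condition persists
> across several consecutive checks:
>
>     define service{
>         use                   generic-service
>         host_name             gluster-node1              ; placeholder
>         service_description   GlusterFS self heal backlog
>         check_command         check_gluster_heal!volumename
>         normal_check_interval 5   ; minutes between checks when OK
>         retry_check_interval  2   ; minutes between rechecks when non-OK
>         max_check_attempts    5   ; alert only after 5 consecutive non-OK
>         }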
>
> --Humble
>
>
> On Wed, Nov 19, 2014 at 9:50 PM, Vince Loschiavo <vloschiavo at gmail.com>
> wrote:
>
> >
> > Hello Gluster Community,
> >
> > I have been using the Nagios monitoring scripts, mentioned in the below
> > thread, on 3.5.2 with great success. The most useful of these is the self
> > heal.
> >
> > However, I've just upgraded to 3.6.1 in the lab and the self heal daemon
> > has become quite aggressive.  I continually get alerts/warnings on 3.6.1
> > that virt disk images need self heal, and then they clear.  This is not
> > the case on 3.5.2.
> >
> > Configuration:
> > 2-node, 2-brick replicated volume with a 2x1GB LAG network between the
> > peers, used as a QEMU/KVM virt image store through the FUSE mount on
> > CentOS 6.5.
> >
> > Example:
> > On 3.5.2:
> > *gluster volume heal volumename info* shows the bricks and the number of
> > entries to be healed: 0
> >
> > On v3.5.2 - During normal gluster operations, I can run this command over
> > and over again, 2-4 times per second, and it will always show 0 entries
> to
> > be healed.  I've used this as an indicator that the bricks are
> > synchronized.
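> >
> > (For reference, a shell loop like the following reproduces this kind of
> > repeated check; volumename is a placeholder:
> >
> >     while true; do
> >         gluster volume heal volumename info | grep 'Number of entries'
> >         sleep 0.3
> >     done
> > )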
> >
> > Last night, I upgraded to 3.6.1 in the lab and I'm seeing different
> > behavior.  Running *gluster volume heal volumename info* during normal
> > operations will show a file out-of-sync, seemingly for every block
> > written to disk and then synced to the peer.  I can run the command over
> > and over again, 2-4 times per second, and it will almost always show
> > something out of sync.  The individual files change, meaning:
> >
> > Example:
> > 1st run: shows file 1 out of sync
> > 2nd run: shows file 2 and file 3 out of sync, but file 1 is now in sync
> > (not in the list)
> > 3rd run: shows file 3 and file 4 out of sync, but file 1 and file 2 are
> > in sync (not in the list)
> > ...
> > nth run: shows 0 files out of sync
> > nth+1 run: shows file 3 and file 12 out of sync
> >
> > From looking at the virtual machines running off this gluster volume,
> > it's obvious that gluster is working well.  However, this plays havoc
> > with Nagios: each heal info check returns different, non-useful results,
> > so Nagios keeps sending alerts.
> >
> > Is this behavior change (3.5.2 vs 3.6.1) expected?  Is there a way to
> > tune the settings or change the monitoring method to get better results
> > into Nagios?
> >
> > Thank you,
> >
> > --
> > -Vince Loschiavo
> >
> >
> > On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal <
> > humble.devassy at gmail.com> wrote:
> >
> >> Hi Gopu,
> >>
> >> Awesome !!
> >>
> >> We can have a Gluster blog post about this implementation.
> >>
> >> --Humble
> >>
> >>
> >> On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan <
> gopukrishnantec at gmail.com
> >> > wrote:
> >>
> >>> Thanks for all your help... I was able to configure Nagios using the
> >>> GlusterFS plugin. The following link shows how I configured it; hope it
> >>> helps someone else:
> >>>
> >>>
> >>>
> http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/
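> >>>
> >>> In case it helps, the Nagios wiring boils down to a command definition
> >>> like this (the script name and path here are hypothetical; use whatever
> >>> the post above installs):
> >>>
> >>>     define command{
> >>>         command_name  check_gluster_heal
> >>>         command_line  $USER1$/check_gluster_heal.sh $ARG1$
> >>>         }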
> >>>
> >>> On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal <
> >>> humble.devassy at gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Please look at this thread
> >>>> http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html
> >>>>
> >>>> Btw, if you are around, we have a talk on the same topic at the
> >>>> upcoming GlusterFS India meetup.
> >>>>
> >>>> Details can be found at:
> >>>> http://www.meetup.com/glusterfs-India/
> >>>>
> >>>> --Humble
> >>>>
> >>>>
> >>>> On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan <
> >>>> gopukrishnantec at gmail.com> wrote:
> >>>>
> >>>>> How can we monitor the gluster nodes and get alerted if something
> >>>>> goes wrong?  I found some Nagios plugins, but none have worked so
> >>>>> far, and I am still experimenting with them.  Any suggestions would
> >>>>> be much appreciated.
> >>>>>
> >>>>> _______________________________________________
> >>>>> Gluster-users mailing list
> >>>>> Gluster-users at gluster.org
> >>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>
> >
> >
> >
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >
>



-- 
-Vince Loschiavo