[Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

Pranith Kumar Karampuri pkarampu at redhat.com
Sat Nov 22 17:03:38 UTC 2014


On 11/22/2014 10:12 PM, Vince Loschiavo wrote:
> Thank you for that information.
>
> Are there plans to restore the previous functionality in a later 
> release of 3.6.x? Or is this what we should expect going forward?
Yes, it will definitely be fixed. Please wait for the next release; 
things should be fine then.
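
Until then, one possible way to make the Nagios check less noisy is to
sample heal info several times and alert only on entries that show up in
every sample. This is only a rough sketch, not an official script. The
function names, the poll count, the interval, and the assumed output
layout (a "Brick ..." line, then one entry per line, then a "Number of
entries: N" line) are all assumptions to adapt to your setup. A file
under constant I/O may still appear in every sample on 3.6.1.

```python
import subprocess
import time


def parse_heal_info(output):
    """Collect entry lines from heal info output.

    Assumes every non-empty line that is not a "Brick ...", "Status: ..."
    or "Number of entries: ..." line names an entry needing heal.
    """
    entries = set()
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(("Brick ", "Status:", "Number of entries")):
            continue
        entries.add(line)
    return entries


def persistent_entries(samples):
    """Return the entries that are present in every sample."""
    if not samples:
        return set()
    result = set(samples[0])
    for sample in samples[1:]:
        result &= set(sample)
    return result


def check_heal(volume, polls=3, interval=5.0):
    """Poll heal info and return a (Nagios exit code, message) pair."""
    samples = []
    for i in range(polls):
        out = subprocess.check_output(
            ["gluster", "volume", "heal", volume, "info"],
            universal_newlines=True,
        )
        samples.append(parse_heal_info(out))
        if i < polls - 1:
            time.sleep(interval)
    stuck = persistent_entries(samples)
    if stuck:
        # 1 is the Nagios WARNING exit code.
        return 1, "WARNING: %d entries listed in all %d samples" % (
            len(stuck), polls)
    # 0 is the Nagios OK exit code.
    return 0, "OK: no entry listed in all %d samples" % polls
```

A plugin wrapper would call check_heal("volumename"), print the message,
and exit with the returned code.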

Pranith
>
>
>
> On Thu, Nov 20, 2014 at 11:24 PM, Anuradha Talur <atalur at redhat.com> wrote:
>
>
>
>     ----- Original Message -----
>     > From: "Joe Julian" <joe at julianfamily.org>
>     > To: "Anuradha Talur" <atalur at redhat.com>, "Vince Loschiavo" <vloschiavo at gmail.com>
>     > Cc: "gluster-users at gluster.org" <Gluster-users at gluster.org>
>     > Sent: Friday, November 21, 2014 12:06:27 PM
>     > Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)
>     >
>     >
>     >
>     > On November 20, 2014 10:01:45 PM PST, Anuradha Talur <atalur at redhat.com> wrote:
>     > >
>     > >
>     > >----- Original Message -----
>     > >> From: "Vince Loschiavo" <vloschiavo at gmail.com>
>     > >> To: "gluster-users at gluster.org" <Gluster-users at gluster.org>
>     > >> Sent: Wednesday, November 19, 2014 9:50:50 PM
>     > >> Subject: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)
>     > >>
>     > >>
>     > >> Hello Gluster Community,
>     > >>
>     > >> I have been using the Nagios monitoring scripts, mentioned in
>     > >> the below thread, on 3.5.2 with great success. The most useful
>     > >> of these is the self heal.
>     > >>
>     > >> However, I've just upgraded to 3.6.1 in the lab and the self
>     > >> heal daemon has become quite aggressive. I continually get
>     > >> alerts/warnings on 3.6.1 that virt disk images need self heal,
>     > >> then they clear. This is not the case on 3.5.2.
>     > >>
>     > >> Configuration:
>     > >> 2 node, 2 brick replicated volume with 2x1GB LAG network
>     > >> between the peers, using this volume as a QEMU/KVM virt image
>     > >> store through the FUSE mount on CentOS 6.5.
>     > >>
>     > >> Example:
>     > >> on 3.5.2:
>     > >> gluster volume heal volumename info: shows the bricks and
>     > >> number of entries to be healed: 0
>     > >>
>     > >> On v3.5.2 - During normal gluster operations, I can run this
>     > >> command over and over again, 2-4 times per second, and it will
>     > >> always show 0 entries to be healed. I've used this as an
>     > >> indicator that the bricks are synchronized.
>     > >>
>     > >> Last night, I upgraded to 3.6.1 in the lab and I'm seeing
>     > >> different behavior. Running gluster volume heal volumename
>     > >> info, during normal operations, will show a file out of sync,
>     > >> seemingly between every block written to disk and then synced
>     > >> to the peer. I can run the command over and over again, 2-4
>     > >> times per second, and it will almost always show something out
>     > >> of sync. The individual files change, meaning:
>     > >>
>     > >> Example:
>     > >> 1st run: shows file 1 out of sync
>     > >> 2nd run: shows file 2 and file 3 out of sync but file 1 is now
>     > >> in sync (not in the list)
>     > >> 3rd run: shows file 3 and file 4 out of sync but file 1 and 2
>     > >> are in sync (not in the list).
>     > >> ...
>     > >> nth run: shows 0 files out of sync
>     > >> nth+1 run: shows file 3 and 12 out of sync.
>     > >>
>     > >> From looking at the virtual machines running off this gluster
>     > >> volume, it's obvious that gluster is working well. However,
>     > >> this obviously plays havoc with Nagios and alerts. Nagios will
>     > >> run the heal info and get different and non-useful results
>     > >> each time, and will send alerts.
>     > >>
>     > >> Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a
>     > >> way to tune the settings or change the monitoring method to
>     > >> get better results into Nagios?
>     > >>
>     > >In 3.6.1 the heal info command works differently from 3.5.2.
>     > >In 3.6.1, it is the self-heal daemon that gathers the entries
>     > >that might need healing. Currently, in 3.6.1, there isn't a
>     > >method to distinguish between a file that is being healed and
>     > >a file with on-going I/O while listing. Hence files under
>     > >normal operation are also listed in the output of the heal
>     > >info command.
>     >
>     > How did that regression pass?!
>     Test cases to check this condition were not written in the
>     regression tests.
>     >
>
>     --
>     Thanks,
>     Anuradha.
>
>
>
>
> -- 
> -Vince Loschiavo
>
>
