[Gluster-users] How To Turn Off Self Heal

Fri Jan 16 23:58:27 UTC 2015

Hello,

I created a post a few days ago named "Turning Off Self Heal Options Don't
Appear Work?" which can be found at the following link:
http://www.gluster.org/pipermail/gluster-users/2015-January/020114.html

I never got a response, so I set up a test in a lab environment.
I was able to reproduce the same behavior there, so I'm hoping someone can help me.

I have discovered over time that if a single node in a 3-node replicated
cluster with many small files is turned off for any length of time, then when
it comes back online it does a great deal of self-healing. That can cause the
glusterfs and glusterfsd processes on the machines to spike to the point that
the machines become unusable.  I have only one volume, with a client mount on
each server, and each server hosts many websites running PHP.  All is fine
until the healing process goes into overdrive.

So I attempted to turn off self-healing with the following three
settings:
gluster volume set gv0 cluster.data-self-heal off
gluster volume set gv0 cluster.entry-self-heal off
gluster volume set gv0 cluster.metadata-self-heal off

Note that I would rather not set cluster.self-heal-daemon off on gv0, because
then I can't see what needs healing so that I can heal it at a later time.
However, those three settings appear to have no effect at all.
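For reference, the three settings above can be applied in one loop. This is a minimal sketch: the function name `set_self_heal_options_off` and the `DRY_RUN` switch are hypothetical additions, and the volume name defaults to `gv0` from this post. It deliberately leaves cluster.self-heal-daemon alone, as the post intends.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): turn off the three self-heal options
# from this post on one volume. Set DRY_RUN=1 to print the commands instead
# of running them.
set_self_heal_options_off() {
    local volume="${1:-gv0}"   # volume name; gv0 is the volume in this post
    local opt
    for opt in cluster.data-self-heal cluster.entry-self-heal cluster.metadata-self-heal; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "gluster volume set $volume $opt off"
        else
            # Stop at the first option gluster refuses to set.
            gluster volume set "$volume" "$opt" off || return 1
        fi
    done
}
```

Running `DRY_RUN=1 set_self_heal_options_off gv0` prints the three commands; dropping `DRY_RUN=1` applies them.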

Here is how I reproduced this in my lab:

Output from "gluster volume info gv0":
Volume Name: gv0
Type: Replicate
Volume ID: a55f8619-0789-4a1c-9cda-a903bc908fd1
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.116:/export/brick1
Brick2: 192.168.1.140:/export/brick1
Brick3: 192.168.1.123:/export/brick1
Options Reconfigured:
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
cluster.data-self-heal: off

This was done using the latest version of Gluster as of this writing,
v3.6.1, installed on CentOS 6.6 from the RPMs available on the Gluster
website.
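One way to double-check that the options are recorded on the volume is to read them back from the `gluster volume info` output shown above. A minimal sketch, assuming the `key: value` layout under "Options Reconfigured:" seen in this post; both helper names are hypothetical. It only confirms that the options are set, not that healing actually stops.

```shell
#!/usr/bin/env bash
# Print "option value" pairs listed under "Options Reconfigured:" in
# `gluster volume info` output read from stdin (hypothetical helper).
reconfigured_options() {
    awk '
        /^Options Reconfigured:/ { in_opts = 1; next }
        in_opts && NF == 0       { in_opts = 0 }   # blank line ends the section
        in_opts {
            sub(/: /, " ")                         # "key: value" -> "key value"
            print
        }
    '
}

# Succeed only if all three self-heal options from the post are reported "off".
self_heal_options_off() {
    local opts opt
    opts="$(reconfigured_options)"
    for opt in cluster.data-self-heal cluster.entry-self-heal cluster.metadata-self-heal; do
        printf '%s\n' "$opts" | grep -qxF "$opt off" || return 1
    done
}
```

Usage: `gluster volume info gv0 | self_heal_options_off && echo "all three are off"`.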

Here is how I tested:
- With all 3 nodes up, I put 4 simple text files on the cluster
- I then turned one node off
- Next I made a change to 2 of the text files
- Then I brought the previously turned off node back up
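The file steps of this test can be scripted, while powering the node off and back on is still done by hand between the two functions. A sketch assuming a hypothetical client mount point `/mnt/gv0` (override it with `MOUNT`) and hypothetical file names `test1.txt` to `test4.txt`.

```shell
#!/usr/bin/env bash
# Hypothetical client mount point of gv0; override with MOUNT=... as needed.
MOUNT="${MOUNT:-/mnt/gv0}"

# Step 1 (all 3 nodes up): put 4 simple text files on the volume.
create_test_files() {
    local i
    for i in 1 2 3 4; do
        echo "original content $i" > "$MOUNT/test$i.txt" || return 1
    done
}

# Step 3 (one node turned off): change 2 of the 4 files.
modify_two_files() {
    local i
    for i in 1 2; do
        echo "changed while a node was down" >> "$MOUNT/test$i.txt" || return 1
    done
}
```

Run `create_test_files`, turn one node off, run `modify_two_files` from a node that is still up, then bring the node back.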

When I do so, I see far more than two of the following messages in
glustershd.log:

[2015-01-15 23:19:30.471384] I [afr-self-heal-entry.c:545:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-01-15 23:19:30.494714] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-gv0-replicate-0: Completed entry selfheal on 00000000-0000-0000-0000-000000000001. source=0 sinks=
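To compare the number of heal messages with the two files that changed, the "performing ... selfheal" lines can be tallied per heal type and GFID. A minimal sketch: the helper name is hypothetical, and while the post only shows entry heals, the pattern assumes data and metadata heals are logged with the same wording.

```shell
#!/usr/bin/env bash
# Count "performing <type> selfheal on <gfid>" messages per heal type and GFID
# in a log read from stdin (hypothetical helper). Assumes other heal types use
# the same wording as the entry heal shown in this post.
count_selfheals() {
    grep -o 'performing [a-z]* selfheal on [0-9a-f-]*' |
        awk '{ print $2, $5 }' |   # keep heal type and GFID
        sort | uniq -c | sort -rn  # most frequent first
}
```

Usage, assuming the default log location of the RPM install: `count_selfheals < /var/log/glusterfs/glustershd.log`.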

Questions:
- Is this a bug?
- Why am I seeing "entry selfheal" messages when this feature is supposed
to be turned off?
- Also, why am I seeing far more than two self-heal messages when I only
changed 2 files while the single node was down?
- Finally, how do I actually turn off these self-heals without turning off
cluster.self-heal-daemon completely, for the reasons mentioned above?

Thank you for any insight you may be able to provide on this.

-- 
Kyle