[Gluster-users] 3.7.13, index healing broken?

Wed Jul 13 05:20:08 UTC 2016

13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>
>
> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>     13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>
>>
>>     On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com>
>>     wrote:
>>
>>         13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>         On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov
>>>         <dm at belkam.com> wrote:
>>>
>>>             13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>             On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov
>>>>             <dm at belkam.com> wrote:
>>>>
>>>>                 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>                 On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov
>>>>>                 <dm at belkam.com> wrote:
>>>>>
>>>>>                     13.07.2016 01:52, Anuradha Talur wrote:
>>>>>
>>>>>
>>>>>                         ----- Original Message -----
>>>>>
>>>>>                             From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>                             To: "Pranith Kumar Karampuri"
>>>>>                             <pkarampu at redhat.com>
>>>>>                             Cc: "gluster-users"
>>>>>                             <gluster-users at gluster.org>
>>>>>                             Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>                             Subject: Re: [Gluster-users] 3.7.13,
>>>>>                             index healing broken?
>>>>>
>>>>>
>>>>>
>>>>>                             12.07.2016 17:39, Pranith Kumar
>>>>>                             Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>>                             Wow, what are the steps to recreate
>>>>>                             the problem?
>>>>>
>>>>>                             Just set the file length to zero; it is
>>>>>                             always reproducible.
>>>>>
>>>>>                         If you are setting the file length to 0 on
>>>>>                         one of the bricks (looks like
>>>>>                         that is the case), it is not a bug.
>>>>>
>>>>>                         Index heal relies on failures seen from
>>>>>                         the mount point(s)
>>>>>                         to identify the files that need heal. It
>>>>>                         won't be able to recognize any file
>>>>>                         modification done directly on bricks. The same
>>>>>                         goes for the heal info command, which is why
>>>>>                         heal info also shows 0 entries.
>>>>>
>>>>>
>>>>>                     Well, this makes self-heal useless then: if
>>>>>                     any file is accidentally corrupted or deleted
>>>>>                     (yes! if a file is deleted directly from a brick,
>>>>>                     this is not recognized by index heal either), then
>>>>>                     it will not be self-healed, because self-heal
>>>>>                     uses index heal.
>>>>>
>>>>>
>>>>>                 It is better to look into the bit-rot feature if you
>>>>>                 want to guard against these kinds of problems.
>>>>
>>>>                 Bit-rot detection finds bit-level problems, not missing
>>>>                 files or wrong lengths, i.e. it is overhead for such
>>>>                 a simple task.
>>>>
>>>>
>>>>             It detects wrong length, because the checksum won't match
>>>>             anymore.
>>>
>>>             Yes, sure. I guess that it will detect missing files too.
>>>             But doesn't it need far more resources than just comparing
>>>             directories in bricks?
>>>>
>>>>             Which use case that you are trying out leads to changing
>>>>             things directly on the brick?
>>>             I'm trying to test gluster failure tolerance and right
>>>             now I'm not happy with it...
>>>
>>>
>>>         Which cases of fault tolerance are you not happy with?
>>>         Making changes directly on the brick or anything else as well?
>>>
>>         I'll repeat:
>>         As I already said, if for some reason (in a real case it can
>>         only be by accident) I delete a file, this will not be detected
>>         by the self-heal daemon and will thus lead to a lower
>>         replication level, i.e. lower failure tolerance.
>>
>>
>>     To prevent such accidents you need to set SELinux policies so
>>     that files under the brick are not modified by accident by any
>>     user. At least that is the solution I remember from when this was
>>     discussed 3-4 years back.
>>
>     So the only supported platform is Linux? Or maybe it is better to
>     improve self-healing to detect missing or wrong-length files; I
>     guess this is a very cheap operation in terms of host resources.
>     Just a suggestion; maybe we need to look at alternatives in the
>     near future....
>
> This is a corner case; from a design perspective it is generally not a 
> good idea to optimize for the corner case. It is better to protect 
> ourselves from the corner case (SELinux etc.), or you can also use 
> snapshots to protect against these kinds of mishaps.
>
Sorry, I don't agree.
As you know, if a missing or wrong-length file is accessed from a fuse 
client, it is restored (healed), i.e. gluster recognizes that the file is 
wrong and heals it, so I do not see any reason not to provide such a 
function in self-healing.
Thank you!
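
To make the "just comparing directories in bricks" idea from the thread concrete, here is a minimal sketch of such a check. It is not how Gluster's self-heal daemon works and not a Gluster tool; the function names (scan_brick, compare_bricks) are hypothetical, and it assumes two bricks of a plain replica volume that are reachable as local directories. Skipping the `.glusterfs` directory is also an assumption made for this illustration.

```python
import os
import tempfile
from typing import Dict, List

# Assumption for this sketch: the brick's internal ".glusterfs" directory
# is skipped and only user-visible files are compared.
SKIP_DIRS = {".glusterfs"}


def scan_brick(root: str) -> Dict[str, int]:
    """Map each regular file (path relative to root) to its size in bytes."""
    sizes: Dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are ignored in this sketch
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def compare_bricks(brick_a: str, brick_b: str) -> Dict[str, List[str]]:
    """Report files missing on either brick and files whose sizes differ."""
    a, b = scan_brick(brick_a), scan_brick(brick_b)
    common = set(a) & set(b)
    return {
        "missing_in_a": sorted(set(b) - set(a)),
        "missing_in_b": sorted(set(a) - set(b)),
        "size_mismatch": sorted(p for p in common if a[p] != b[p]),
    }


if __name__ == "__main__":
    # Simulate two replicas, then damage the second one directly.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for root in (a, b):
            for name in ("kept.txt", "truncated.txt", "deleted.txt"):
                with open(os.path.join(root, name), "w") as f:
                    f.write("payload")
        open(os.path.join(b, "truncated.txt"), "w").close()  # length -> 0
        os.remove(os.path.join(b, "deleted.txt"))
        print(compare_bricks(a, b))
```

A size comparison like this only catches missing and wrong-length files; same-length corruption would need checksums, which is the extra cost discussed above. On a live volume, in-flight writes could also show up as temporary mismatches, so any report from such a scan would need to be treated as a hint rather than proof of damage.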
