[Gluster-users] 3.7.13, index healing broken?

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Jul 13 06:10:50 UTC 2016


On Wed, Jul 13, 2016 at 11:27 AM, Dmitry Melekhov <dm at belkam.com> wrote:

> 13.07.2016 09:50, Pranith Kumar Karampuri wrote:
>
>
>
> On Wed, Jul 13, 2016 at 11:11 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>> 13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>> 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>>> 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>
>>>>> 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>
>>>>>> 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>
>>>>>>> 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>
>>>>>>>> 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>
>>>>>>>>> 13.07.2016 01:52, Anuradha Talur wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>
>>>>>>>>>>> From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>>> Cc: "gluster-users" <gluster-users at gluster.org>
>>>>>>>>>>> Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>>>>>>> Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Wow, what are the steps to recreate the problem?
>>>>>>>>>>>
>>>>>>>>>>> just set file length to zero, always reproducible.
>>>>>>>>>>>
>>>>>>>>>> If you are setting the file length to 0 on one of the bricks (looks
>>>>>>>>>> like that is the case), it is not a bug.
>>>>>>>>>>
>>>>>>>>>> Index heal relies on failures seen from the mount point(s) to identify
>>>>>>>>>> the files that need heal. It won't be able to recognize any file
>>>>>>>>>> modification done directly on bricks. The same goes for the heal info
>>>>>>>>>> command, which is the reason heal info also shows 0 entries.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well, this makes self-heal useless then: if any file is accidentally
>>>>>>>>> corrupted or deleted (yes! if a file is deleted directly from a brick, this
>>>>>>>>> is not recognized by index heal either), then it will not be self-healed,
>>>>>>>>> because self-heal uses index heal.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It is better to look into the bit-rot feature if you want to guard
>>>>>>>> against these kinds of problems.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bit rot detects bit problems, not missing files or their wrong
>>>>>>>> length, i.e. this is overhead for such a simple task.
>>>>>>>>
>>>>>>>
>>>>>>> It detects wrong length. Because checksum won't match anymore.
>>>>>>>
>>>>>>>
>>>>>>> Yes, sure. I guess that it will detect missing files too. But doesn't it
>>>>>>> need far more resources than just comparing directories in bricks?
>>>>>>>
>>>>>>>
>>>>>>> What use case are you trying out that leads to changing things
>>>>>>> directly on the brick?
>>>>>>>
>>>>>>> I'm trying to test gluster failure tolerance and right now I'm not
>>>>>>> happy with it...
>>>>>>>
>>>>>>
>>>>>> Which cases of fault tolerance are you not happy with? Making changes
>>>>>> directly on the brick or anything else as well?
>>>>>>
>>>>>> I'll repeat:
>>>>>> As I already said: if for some reason (in a real case it can only be by
>>>>>> accident) I delete a file, this will not be detected by the self-heal
>>>>>> daemon and, thus, will lead to a lower replication level, i.e. lower
>>>>>> failure tolerance.
>>>>>>
>>>>>
>>>>> To prevent such accidents you need to set SELinux policies so that
>>>>> files under the brick are not modified by accident by any user. At least
>>>>> that is the solution I remember when this was discussed 3-4 years back.
>>>>>
>>>>> So the only supported platform is Linux? Or maybe it is better to
>>>>> improve self-healing to detect missing or wrong-length files; I guess this
>>>>> is a very low-cost operation in terms of host resources.
>>>>> Just a suggestion; maybe we need to look at alternatives in the near
>>>>> future....
>>>>>
>>>> This is a corner case; from a design perspective it is generally not a
>>>> good idea to optimize for the corner case. It is better to protect
>>>> ourselves from the corner case (SELinux etc.), or you can also use snapshots
>>>> to protect against these kinds of mishaps.
>>>>
>>>> Sorry, I don't agree.
>>>> As you know, when a missing or wrong-length file is accessed from a FUSE
>>>> client, it is restored (healed), i.e. gluster recognizes that the file is
>>>> wrong and heals it, so I do not see any reason not to provide the same
>>>> function in self-healing.
>>>> Thank you!
>>>>
>>> Ah! Now how do you suggest we keep track of which of 10s of millions of
>>> files the user accidentally deleted from the brick without gluster's
>>> knowledge? Once it comes to gluster's knowledge we can do something. But
>>> how does gluster become aware of something it is not keeping track of? At
>>> the time you access it, gluster knows something went wrong, so it restores
>>> it. If you change something on the bricks, even by accident, all the data
>>> gluster keeps (similar to a journal) is wasted. Even disk filesystems
>>> will ask you to run fsck if something unexpected happens, so a full
>>> self-heal is a similar operation.
>>>
>>>
>>> You are absolutely right. The question is why gluster does not become
>>> aware of such a problem in the case of self-healing?
>>>
>>
>> Because the operations that are performed directly on brick do not go
>> through gluster stack.
>>
>>
>>
>> OK, I'll repeat:
>> As you know, when a missing or wrong-length file is accessed from a FUSE
>> client, it is restored (healed), i.e. gluster recognizes that the file is wrong
>> and heals it, so I do not see any reason not to provide the same function in
>> self-healing.
>>
>
> For which you need to access the file.
>
> That's right.
>
> For which you need a full crawl. You can't detect a modification which
> doesn't go through the stack, so this is the only possibility.
>
>
> OK, then, if self-heal is really useless and no possible way to get this
> will be provided, I guess we'll use an external script to check the
> consistency of the brick directories;
> I don't think ls and diff will take many resources.
>

How is this different from full self-heal?
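
For illustration only, the kind of external "ls and diff" check proposed
above could look roughly like the sketch below. It is not part of gluster;
the function names snapshot and compare are made up for this example, and it
assumes the listing for each brick can be collected where that brick lives
and compared afterwards. It walks the whole brick, which is the same cost
class as the full crawl discussed above, and it only reports differences;
files that are being written during the scan may show up as false mismatches.

```python
import os


def snapshot(brick_root):
    """Map each regular file's path (relative to brick_root) to its size."""
    sizes = {}
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip gluster's internal .glusterfs directory at the brick root.
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Only plain files are compared; symlinks are ignored here.
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, brick_root)
                sizes[rel] = os.path.getsize(full)
    return sizes


def compare(a, b):
    """Return (missing_in_a, missing_in_b, size_mismatch) as sorted lists."""
    missing_in_a = sorted(set(b) - set(a))
    missing_in_b = sorted(set(a) - set(b))
    size_mismatch = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing_in_a, missing_in_b, size_mismatch
```

Any path such a script reports would still have to be healed by gluster
itself, for example by accessing the file through a mount as described
earlier in this thread.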


>
> Thank you!
>
> p.s.
> still can't understand why it can't be implemented in gluster... :-(
>
>
>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>>
>>>
>>
>>
>> --
>> Pranith
>>
>>
>>
>
>
> --
> Pranith
>
>
>


-- 
Pranith