[Gluster-users] 3.7.13, index healing broken?

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Jul 13 05:50:12 UTC 2016


On Wed, Jul 13, 2016 at 11:11 AM, Dmitry Melekhov <dm at belkam.com> wrote:

> 13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>
>
>
> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>> 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>> 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>>> 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>
>>>>> 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>
>>>>>> 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>
>>>>>>> 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>
>>>>>>>> 13.07.2016 01:52, Anuradha Talur wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>
>>>>>>>>>> From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>> Cc: "gluster-users" <gluster-users at gluster.org>
>>>>>>>>>> Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>>>>>> Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Wow, what are the steps to recreate the problem?
>>>>>>>>>>
>>>>>>>>>> Just set the file length to zero; it is always reproducible.
>>>>>>>>>>
>>>>>>>>> If you are setting the file length to 0 on one of the bricks (looks
>>>>>>>>> like that is the case), it is not a bug.
>>>>>>>>>
>>>>>>>>> Index heal relies on failures seen from the mount point(s)
>>>>>>>>> to identify the files that need heal. It won't be able to
>>>>>>>>> recognize any file modification done directly on the bricks.
>>>>>>>>> The same goes for the heal info command, which is why heal info
>>>>>>>>> also shows 0 entries.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Well, this makes self-heal useless then: if any file is accidentally
>>>>>>>> corrupted or deleted (yes! if a file is deleted directly from a brick,
>>>>>>>> this is not recognized by index heal either), it will not be
>>>>>>>> self-healed, because self-heal uses index heal.
>>>>>>>>
>>>>>>>
>>>>>>> It is better to look into the bit-rot feature if you want to guard
>>>>>>> against these kinds of problems.
>>>>>>>
>>>>>>>
>>>>>>> Bit rot detects bit-level problems, not missing files or wrong
>>>>>>> lengths, i.e. this is overhead for such a simple task.
>>>>>>>
>>>>>>
>>>>>> It detects a wrong length, because the checksum won't match anymore.
>>>>>>
>>>>>>
>>>>>> Yes, sure. I guess that it will detect missing files too. But doesn't
>>>>>> it need far more resources than just comparing directories in bricks?
>>>>>>
>>>>>>
>>>>>> Which use case that you are trying out leads to changing things
>>>>>> directly on the brick?
>>>>>>
>>>>>> I'm trying to test gluster failure tolerance and right now I'm not
>>>>>> happy with it...
>>>>>>
>>>>>
>>>>> Which cases of fault tolerance are you not happy with? Making changes
>>>>> directly on the brick or anything else as well?
>>>>>
>>>>> I'll repeat:
>>>>> As I already said, if for some reason I delete a file (in a real case
>>>>> this can only happen by accident), this will not be detected by the
>>>>> self-heal daemon and will thus lead to a lower replication level, i.e.
>>>>> lower failure tolerance.
>>>>>
>>>>
>>>> To prevent such accidents you need to set SELinux policies so that
>>>> files under the brick are not modified by accident by any user. At least
>>>> that is the solution I remember when this was discussed 3-4 years back.
>>>>
>>>> So the only supported platform is Linux? Or maybe it is better to
>>>> improve self-healing to detect missing or wrong-length files; I guess
>>>> this is a very low-cost operation in terms of host resources.
>>>> Just a suggestion; maybe we need to look at alternatives in the near
>>>> future....
>>>>
>>> This is a corner case. From a design perspective it is generally not a
>>> good idea to optimize for the corner case. It is better to protect
>>> ourselves from the corner case (SELinux etc.), or you can also use
>>> snapshots to protect against these kinds of mishaps.
>>>
>>> Sorry, I do not agree.
>>> As you know, if a missing or wrong-length file is accessed from a fuse
>>> client, it is restored (healed), i.e. gluster recognizes that the file is
>>> wrong and heals it, so I do not see any reason why this function cannot be
>>> provided as self-healing.
>>> Thank you!
>>>
>> Ah! Now how do you suggest we keep track of which of the tens of millions
>> of files the user accidentally deleted from the brick without gluster's
>> knowledge? Once it comes to gluster's knowledge we can do something. But
>> how does gluster become aware of something it is not keeping track of? At
>> the time you access it, gluster knows something went wrong, so it restores
>> it. If you change something on the bricks, even by accident, all the data
>> gluster keeps (similar to a journal) is wasted. Even disk filesystems
>> will ask you to run fsck if something unexpected happens, so a full
>> self-heal is a similar operation.
>>
>>
>> You are absolutely right. The question is why gluster does not become
>> aware of such a problem in the case of self-healing?
>>
>
> Because the operations that are performed directly on the brick do not go
> through the gluster stack.
>
>
>
> OK, I'll repeat:
> As you know, if a missing or wrong-length file is accessed from a fuse
> client, it is restored (healed), i.e. gluster recognizes that the file is
> wrong and heals it, so I do not see any reason why this function cannot be
> provided as self-healing.
>

For that you need to access the file, and for that you need a full crawl.
You can't detect a modification which doesn't go through the stack, so this
is the only possibility.
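
A minimal sketch of what such a crawl amounts to, in Python. This is not
gluster's heal code: the names crawl and compare_bricks are hypothetical, and
comparing only relative paths and file sizes is an assumption made for
illustration. It shows that finding a file deleted or truncated directly on a
brick means visiting every file on both bricks, which is the cost of a full
crawl.

```python
import os


def crawl(root):
    """Walk one brick directory and map relative path -> file size in bytes."""
    sizes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip any directory named .glusterfs (gluster keeps its internal
        # metadata there); pruning dirnames in place stops os.walk descending.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat so that a dangling symlink does not raise an error.
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def compare_bricks(brick_a, brick_b):
    """Return (missing_on_a, missing_on_b, size_mismatch) as sorted lists."""
    a, b = crawl(brick_a), crawl(brick_b)
    missing_on_a = sorted(set(b) - set(a))
    missing_on_b = sorted(set(a) - set(b))
    size_mismatch = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing_on_a, missing_on_b, size_mismatch
```

Both calls to crawl touch every file, so the work grows with the total number
of files rather than with the number of damaged ones. In gluster itself a full
crawl is requested with "gluster volume heal <VOLNAME> full".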




-- 
Pranith