[Gluster-users] 3.7.13, index healing broken?
Dmitry Melekhov
dm at belkam.com
Wed Jul 13 05:41:02 UTC 2016
13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>
>
> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com
> <mailto:dm at belkam.com>> wrote:
>
> 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>>
>>
>> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com
>> <mailto:dm at belkam.com>> wrote:
>>
>> 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov
>>> <dm at belkam.com <mailto:dm at belkam.com>> wrote:
>>>
>>> 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov
>>>> <dm at belkam.com <mailto:dm at belkam.com>> wrote:
>>>>
>>>> 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov
>>>>> <dm at belkam.com <mailto:dm at belkam.com>> wrote:
>>>>>
>>>>> 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry
>>>>>> Melekhov <dm at belkam.com
>>>>>> <mailto:dm at belkam.com>> wrote:
>>>>>>
>>>>>> 13.07.2016 08:36, Pranith Kumar Karampuri
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry
>>>>>>> Melekhov <dm at belkam.com
>>>>>>> <mailto:dm at belkam.com>> wrote:
>>>>>>>
>>>>>>> 13.07.2016 01:52, Anuradha Talur wrote:
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>> From: "Dmitry Melekhov"
>>>>>>> <dm at belkam.com
>>>>>>> <mailto:dm at belkam.com>>
>>>>>>> To: "Pranith Kumar
>>>>>>> Karampuri"
>>>>>>> <pkarampu at redhat.com
>>>>>>> <mailto:pkarampu at redhat.com>>
>>>>>>> Cc: "gluster-users"
>>>>>>> <gluster-users at gluster.org
>>>>>>> <mailto:gluster-users at gluster.org>>
>>>>>>> Sent: Tuesday, July 12, 2016
>>>>>>> 9:27:17 PM
>>>>>>> Subject: Re: [Gluster-users]
>>>>>>> 3.7.13, index healing broken?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 12.07.2016 17:39, Pranith
>>>>>>> Kumar Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Wow, what are the steps to
>>>>>>> recreate the problem?
>>>>>>>
>>>>>>> just set file length to
>>>>>>> zero, always reproducible.
>>>>>>>
>>>>>>> If you are setting the file
>>>>>>> length to 0 on one of the bricks
>>>>>>> (looks like
>>>>>>> that is the case), it is not a bug.
>>>>>>>
>>>>>>> Index heal relies on failures
>>>>>>> seen from the mount point(s)
>>>>>>> to identify the files that need
>>>>>>> heal. It won't be able to
>>>>>>> recognize any file
>>>>>>> modification done directly on
>>>>>>> bricks. Same goes for heal info
>>>>>>> command which
>>>>>>> is the reason heal info also
>>>>>>> shows 0 entries.
>>>>>>>
>>>>>>>
>>>>>>> Well, this makes self-heal useless
>>>>>>> then: if any file is accidentally
>>>>>>> corrupted or deleted (yes, a file
>>>>>>> deleted directly from a brick is
>>>>>>> not recognized by index heal either),
>>>>>>> it will not be self-healed,
>>>>>>> because self-heal uses index heal.
>>>>>>>
>>>>>>>
>>>>>>> It is better to look into bit-rot
>>>>>>> feature if you want to guard against
>>>>>>> these kinds of problems.
>>>>>>
>>>>>> Bit-rot detection catches bit-level corruption, not
>>>>>> missing files or wrong lengths, so it is too much
>>>>>> overhead for such a simple task.
>>>>>>
>>>>>>
>>>>>> It detects a wrong length, because the checksum
>>>>>> won't match anymore.
>>>>>
>>>>> Yes, sure. I guess it will detect missing
>>>>> files too. But doesn't it need far more resources
>>>>> than just comparing directories across bricks?
>>>>>>
>>>>>> What use case are you trying out that leads
>>>>>> to changing things directly on the brick?
>>>>> I'm trying to test gluster failure tolerance
>>>>> and right now I'm not happy with it...
>>>>>
>>>>>
>>>>> Which cases of fault tolerance are you not happy
>>>>> with? Making changes directly on the brick or
>>>>> anything else as well?
>>>>>
>>>> I'll repeat:
>>>> As I already said, if for some reason (in real life
>>>> only by accident) I delete a file directly from a brick,
>>>> this will not be detected by the self-heal daemon and,
>>>> thus, will lead to a lower replication level, i.e.
>>>> lower failure tolerance.
>>>>
>>>>
>>>> To prevent such accidents you need to set SELinux
>>>> policies so that files under the brick cannot be modified
>>>> by accident by any user. At least that is the solution
>>>> I remember from when this was discussed 3-4 years back.
>>>>
>>> So the only supported platform is Linux? Or maybe it is
>>> better to improve self-healing to detect missing or
>>> wrong-length files; I guess this is a very cheap
>>> operation in terms of host resources.
>>> Just a suggestion; maybe we will need to look at
>>> alternatives in the near future....
>>>
>>> This is a corner case; from a design perspective it is
>>> generally not a good idea to optimize for the corner case.
>>> It is better to protect ourselves from the corner case
>>> (SELinux etc.), or you can also use snapshots to protect
>>> against these kinds of mishaps.
>>>
>> Sorry, I don't agree.
>> As you know, when a missing or wrong-length file is accessed from
>> a FUSE client it is restored (healed), i.e. gluster recognizes that
>> the file is wrong and heals it, so I do not see any reason not to
>> provide the same function in self-healing.
>> Thank you!
>>
>> Ah! Now how do you suggest we keep track of which of 10s of
>> millions of files the user accidentally deleted from the brick
>> without gluster's knowledge? Once it comes to gluster's knowledge
>> we can do something. But how does gluster become aware of
>> something it is not keeping track of? At the time you access it
>> gluster knows something went wrong so it restores it. If you
>> change something on the bricks, even by accident, all the data
>> gluster keeps (similar to a journal) is wasted. Even disk
>> filesystems will ask you to run fsck if something unexpected
>> happens, so a full self-heal is a similar operation.
>
> You are absolutely right. The question is why gluster does not become
> aware of such a problem in the case of self-healing?
>
>
> Because the operations that are performed directly on the brick do not
> go through the gluster stack.
OK, I'll repeat:
As you know, when a missing or wrong-length file is accessed from a FUSE
client it is restored (healed), i.e. gluster recognizes that the file is
wrong and heals it, so I do not see any reason not to provide the same
function in self-healing.
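
To make the "just comparing directories in bricks" idea concrete, here is a
minimal sketch in Python. It is only an illustration of the check I have in
mind, not anything gluster ships: the function names and the brick paths in
the usage note are made up, and it assumes both directories are bricks of the
same replica set that are not being written to while it runs. It skips the
.glusterfs directory at the brick root, and it reports only differences; it
cannot say which copy is the good one.

```python
import os


def snapshot(brick_root):
    """Map each regular file's path (relative to the brick root) to its size."""
    sizes = {}
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip gluster's internal metadata directory at the top of the brick.
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Only regular files are compared; symlinks are ignored here.
            if os.path.isfile(full) and not os.path.islink(full):
                sizes[os.path.relpath(full, brick_root)] = os.path.getsize(full)
    return sizes


def compare_bricks(brick_a, brick_b):
    """Return (missing_on_a, missing_on_b, size_mismatch) as sorted path lists."""
    a, b = snapshot(brick_a), snapshot(brick_b)
    missing_on_a = sorted(set(b) - set(a))
    missing_on_b = sorted(set(a) - set(b))
    size_mismatch = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing_on_a, missing_on_b, size_mismatch
```

Usage would be something like compare_bricks("/bricks/b1", "/bricks/b2")
with both bricks visible from one host (hypothetical paths). Note that this
walks every file on both bricks, which is exactly the cost Pranith objects to
for volumes with tens of millions of files.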
>
>>
>>
>> --
>> Pranith
>
>
>
>
> --
> Pranith
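
The behaviour Anuradha and Pranith describe above (index heal only knows
about failures seen through the mount, so changes made directly on a brick
never appear in heal info) can be modelled with a toy sketch. This is not
gluster's real data structure or API; the ToyReplica class and all of its
methods are invented purely for illustration.

```python
class ToyReplica:
    """Toy replica set; NOT gluster's actual implementation."""

    def __init__(self, n_bricks=2):
        self.bricks = [dict() for _ in range(n_bricks)]  # path -> content
        self.up = [True] * n_bricks
        self.index = {}  # path -> brick number holding the good copy

    def write(self, path, data):
        """Write through the 'mount'; a down brick gets the path indexed."""
        live = [i for i, ok in enumerate(self.up) if ok]
        if not live:
            raise IOError("no brick available")
        for i in live:
            self.bricks[i][path] = data
        if len(live) < len(self.bricks):
            self.index[path] = live[0]

    def heal_info(self):
        """List the paths the index says need healing."""
        return sorted(self.index)

    def index_heal(self):
        """Heal only indexed paths; nothing else is ever examined."""
        if not all(self.up):
            return []
        healed = []
        for path, src in sorted(self.index.items()):
            for brick in self.bricks:
                brick[path] = self.bricks[src][path]
            healed.append(path)
        self.index.clear()
        return healed


vol = ToyReplica()
vol.up[1] = False
vol.write("a.txt", "v1")        # failure seen from the mount: indexed
vol.up[1] = True
print(vol.heal_info())          # ['a.txt']
print(vol.index_heal())         # ['a.txt']
vol.bricks[1]["a.txt"] = ""     # truncate directly on the brick
print(vol.heal_info())          # []
```

In this model the direct truncation leaves the index empty, so index heal has
nothing to do, which matches the 0 entries reported in the thread. Detecting
it would require either an access through the mount or a full crawl.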