[Gluster-users] 3.7.13, index healing broken?

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Jul 13 07:40:05 UTC 2016


On Wed, Jul 13, 2016 at 12:11 PM, Dmitry Melekhov <dm at belkam.com> wrote:

> 13.07.2016 10:24, Pranith Kumar Karampuri wrote:
>
>
>
> On Wed, Jul 13, 2016 at 11:49 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>> 13.07.2016 10:10, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Wed, Jul 13, 2016 at 11:27 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>> 13.07.2016 09:50, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Wed, Jul 13, 2016 at 11:11 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>>> 13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>
>>>>> 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>
>>>>>> 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>
>>>>>>> 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>
>>>>>>>> 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>
>>>>>>>>> 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>>
>>>>>>>>>> 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> 13.07.2016 01:52, Anuradha Talur wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>
>>>>>>>>>>>>> From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>>>>> Cc: "gluster-users" <gluster-users at gluster.org>
>>>>>>>>>>>>> Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>>>>>>>>> Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wow, what are the steps to recreate the problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just set the file length to zero; always reproducible.
>>>>>>>>>>>>>
>>>>>>>>>>>> If you are setting the file length to 0 on one of the bricks
>>>>>>>>>>>> (which looks like the case here), it is not a bug.
>>>>>>>>>>>>
>>>>>>>>>>>> Index heal relies on failures seen from the mount point(s)
>>>>>>>>>>>> to identify the files that need heal. It won't be able to
>>>>>>>>>>>> recognize any file modification done directly on the bricks.
>>>>>>>>>>>> The same goes for the heal info command, which is why heal
>>>>>>>>>>>> info also shows 0 entries.
>>>>>>>>>>>>
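>>>>>>>>>>>> (A small sketch for illustration, assuming the standard brick
>>>>>>>>>>>> layout; the brick path is the one used elsewhere in this thread:)
>>>>>>>>>>>>
>>>>>>>>>>>> # files queued for index heal show up in the brick's index dir;
>>>>>>>>>>>> # changes made behind gluster's back never create entries here
>>>>>>>>>>>> ls /wall/pool/brick/.glusterfs/indices/xattrop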
>>>>>>>>>>>
>>>>>>>>>>> Well, this makes self-heal useless then: if any file is
>>>>>>>>>>> accidentally corrupted or deleted (yes! a file deleted directly
>>>>>>>>>>> from a brick is not recognized by index heal either), then it
>>>>>>>>>>> will not be self-healed, because self-heal uses index heal.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It is better to look into the bit-rot feature if you want to
>>>>>>>>>> guard against these kinds of problems.
>>>>>>>>>>
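>>>>>>>>>> For example (from memory; please verify the exact sub-commands in
>>>>>>>>>> the CLI docs for your 3.7.x build):
>>>>>>>>>>
>>>>>>>>>> # enable checksumming and the scrubber on the volume
>>>>>>>>>> gluster volume bitrot pool enable
>>>>>>>>>> # optionally tune how often the scrubber runs
>>>>>>>>>> gluster volume bitrot pool scrub-frequency daily
>>>>>>>>>> # see what the scrubber has flagged
>>>>>>>>>> gluster volume bitrot pool scrub status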
>>>>>>>>>>
>>>>>>>>>> Bit rot detects bit-level corruption, not missing files or wrong
>>>>>>>>>> file lengths, i.e. it is overhead for such a simple task.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It detects wrong length too, because the checksum won't match anymore.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, sure. I guess it will detect missing files too. But doesn't it
>>>>>>>>> need far more resources than just comparing the directories on the
>>>>>>>>> bricks?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What use case are you trying out that leads to changing things
>>>>>>>>> directly on the brick?
>>>>>>>>>
>>>>>>>>> I'm trying to test gluster failure tolerance and right now I'm not
>>>>>>>>> happy with it...
>>>>>>>>>
>>>>>>>>
>>>>>>>> Which cases of fault tolerance are you not happy with? Making
>>>>>>>> changes directly on the brick or anything else as well?
>>>>>>>>
>>>>>>>> I'll repeat:
>>>>>>>> As I already said, if I delete a file for some reason (in a real
>>>>>>>> case it can only happen by accident), this will not be detected by
>>>>>>>> the self-heal daemon, and thus will lead to a lower replication
>>>>>>>> level, i.e. lower failure tolerance.
>>>>>>>>
>>>>>>>
>>>>>>> To prevent such accidents you need to set SELinux policies so that
>>>>>>> files under the brick are not modified by accident by any user. At
>>>>>>> least that is the solution I remember from when this was discussed
>>>>>>> 3-4 years back.
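>>>>>>>
>>>>>>> Something along these lines (just a sketch; glusterd_brick_t is the
>>>>>>> label I remember the distribution policy using, please verify it on
>>>>>>> your system):
>>>>>>>
>>>>>>> # label the brick so that only gluster daemons may write to it
>>>>>>> semanage fcontext -a -t glusterd_brick_t "/wall/pool/brick(/.*)?"
>>>>>>> restorecon -Rv /wall/pool/brick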
>>>>>>>
>>>>>>> So the only supported platform is Linux? Or maybe it is better to
>>>>>>> improve self-healing to detect missing or wrong-length files; I
>>>>>>> guess that is a very cheap operation in terms of host resources.
>>>>>>> Just a suggestion, but maybe we need to look at alternatives in the
>>>>>>> near future....
>>>>>>>
>>>>>> This is a corner case; from a design perspective it is generally not
>>>>>> a good idea to optimize for the corner case. It is better to protect
>>>>>> ourselves from it (SELinux etc.), or you can also use snapshots to
>>>>>> protect against these kinds of mishaps.
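>>>>>>
>>>>>> For example (assuming the bricks sit on thinly provisioned LVM, which
>>>>>> gluster volume snapshots require):
>>>>>>
>>>>>> # take a point-in-time snapshot you can restore after a mishap
>>>>>> gluster snapshot create pool-snap1 pool
>>>>>> gluster snapshot list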
>>>>>>
>>>>>> Sorry, I don't agree.
>>>>>> As you know, if a missing or wrong-length file is accessed from a
>>>>>> fuse client, it is restored (healed), i.e. gluster recognizes that
>>>>>> the file is wrong and heals it, so I do not see any reason why this
>>>>>> same function cannot be provided as self-healing.
>>>>>> Thank you!
>>>>>>
>>>>> Ah! Now how do you suggest we keep track of which of 10s of millions
>>>>> of files the user accidentally deleted from the brick without
>>>>> gluster's knowledge? Once it comes to gluster's knowledge we can do
>>>>> something. But how does gluster become aware of something it is not
>>>>> keeping track of? At the time you access it, gluster knows something
>>>>> went wrong, so it restores it. If you change something on the bricks,
>>>>> even by accident, all the data gluster keeps (similar to a journal) is
>>>>> wasted. Even disk filesystems will ask you to run fsck if something
>>>>> unexpected happens, so a full self-heal is a similar operation.
>>>>>
>>>>>
>>>>> You are absolutely right. The question is: why does gluster not
>>>>> become aware of such a problem in the case of self-healing?
>>>>>
>>>>
>>>> Because operations that are performed directly on a brick do not go
>>>> through the gluster stack.
>>>>
>>>>
>>>>
>>>> OK, I'll repeat:
>>>> As you know, if a missing or wrong-length file is accessed from a fuse
>>>> client, it is restored (healed), i.e. gluster recognizes that the file
>>>> is wrong and heals it, so I do not see any reason why this same
>>>> function cannot be provided as self-healing.
>>>>
>>>
>>> For which you need to access the file.
>>>
>>> That's right.
>>>
>>> For which you need a full crawl. You can't detect a modification that
>>> doesn't go through the stack, so this is the only possibility.
>>>
>>>
>>> OK then, if self-heal really cannot do this and no way to get it will
>>> be provided, I guess we'll use an external script to check that the
>>> brick directories are consistent;
>>> I don't think ls and diff will take many resources.
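>>>
>>> Something like this, run on each brick host (just a sketch using our
>>> paths; it compares names and sizes only, not contents):
>>>
>>> # dump name and size for every file on the brick, skipping gluster's
>>> # internal .glusterfs directory, then diff the lists between hosts
>>> cd /wall/pool/brick && find . -path ./.glusterfs -prune -o \
>>>   -type f -printf '%p %s\n' | sort > /tmp/brick.list
>>> # copy the lists to one host and: diff brick-a.list brick-b.list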
>>>
>>
>> How is this different from full self-heal?
>>
>>
>> Self-heal does not detect deleted or wrong-length files.
>>
>
> It detects them when you do a full crawl, which is essentially an ls -laR
> kind of thing on the whole volume. You don't need any external scripts;
> maybe just keep doing a full crawl once in a while?
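>
> For example, something like this on the fuse mount (just a sketch; a stat
> through the mount triggers the lookup that heals a file):
>
> find /root/gluster -exec stat {} \; > /dev/null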
>
>
> You mean on the fuse mount?
>
> It doesn't work:
>
> [root at father ~]# mount -t glusterfs localhost:/pool gluster
>
> [root at father ~]#
>
> then truncate a file to zero length directly in the brick:
>
> [root at father gluster]# > /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
> [root at father gluster]#
>
>
> [root at father gluster]# ls -laR  /root/gluster/
> /root/gluster/:
> total 122153384
> drwxr-xr-x   4 qemu qemu        4096 Jul 11 13:36 .
> dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
> -rw-r--r--   1 root root  8589934592 May 11 09:14 csr1000v1.img
> -rw-r--r--   1 root root           0 Jul 13 10:34
> gstatus-0.64-3.el7.x86_64.rpm
>
>
> As you can see, gstatus-0.64-3.el7.x86_64.rpm has 0 length.
> But:
>
> [root at father gluster]# touch /root/gluster/gstatus-0.64-3.el7.x86_64.rpm
> [root at father gluster]# ls -laR  /root/gluster/
> /root/gluster/:
> total 122153436
> drwxr-xr-x   4 qemu qemu        4096 Jul 11 13:36 .
> dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
> -rw-r--r--   1 root root  8589934592 May 11 09:14 csr1000v1.img
> -rw-r--r--   1 root root       52268 Jul 13 10:36
> gstatus-0.64-3.el7.x86_64.rpm
>
>
> I.e. if I do some I/O on the file, then it is back.
>
>
> By the way, the same problem occurs if I delete the file directly in the brick:
>
> [root at father gluster]# rm /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
> rm: remove regular file '/wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm'?
> y
> [root at father gluster]# ls -laR  /root/gluster/
> /root/gluster/:
> total 122153384
> drwxr-xr-x   4 qemu qemu        4096 Jul 13 10:38 .
> dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
> -rw-r--r--   1 root root  8589934592 May 11 09:14 csr1000v1.img
> -rw-r--r--   1 qemu qemu 43692064768 Jul 13 10:38 infimonitor.img
>
>
> I don't see it in the directory on the fuse mount at all until I touch
> it, which restores the file too.
>
>
> If you need any performance improvements here, we will be happy to help.
> Please give us feedback.
>
>
> Your recipe doesn't work :-( If there is a difference between the brick
> directories due to direct brick manipulation, it leads to problems.
>

You have to execute "gluster volume heal <volname> full" to trigger a full
heal.
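
For example, on your "pool" volume:

# index heal (the default) only picks up entries logged through the stack
gluster volume heal pool
# full heal does a complete crawl and also catches direct-on-brick changes
gluster volume heal pool full
# then check what was healed or still needs healing
gluster volume heal pool info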


>
>
> All I was saying is that it is not possible to detect them through index
> heal, because for the index to be populated the operations need to go
> through the gluster stack.
>
>> Why can't it? I don't know; you just said it is impossible in gluster
>> because it can only track changes made through gluster, i.e. bricks can
>> have different file sets and this is not recognized (true), because, as I
>> understand it, gluster's self-heal assumes that the brick's underlying
>> filesystem can't be corrupted by a server admin (not true; I can say this
>> as an engineer with almost 25 years of experience, i.e. I have done it
>> several times ;-) ).
>>
>>
>>
>>
>>
>>>
>>> Thank you!
>>>
>>> P.S.
>>> I still can't understand why it can't be implemented in gluster... :-(
>>>
>>>


-- 
Pranith