[Gluster-users] 3.7.13, index healing broken?

Dmitry Melekhov dm at belkam.com
Wed Jul 13 06:41:47 UTC 2016


13.07.2016 10:24, Pranith Kumar Karampuri wrote:
>
>
> On Wed, Jul 13, 2016 at 11:49 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>     13.07.2016 10:10, Pranith Kumar Karampuri wrote:
>>
>>
>>     On Wed, Jul 13, 2016 at 11:27 AM, Dmitry Melekhov <dm at belkam.com>
>>     wrote:
>>
>>         13.07.2016 09:50, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>         On Wed, Jul 13, 2016 at 11:11 AM, Dmitry Melekhov
>>>         <dm at belkam.com> wrote:
>>>
>>>             13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>             On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov
>>>>             <dm at belkam.com> wrote:
>>>>
>>>>                 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>                 On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov
>>>>>                 <dm at belkam.com> wrote:
>>>>>
>>>>>                     13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>                     On Wed, Jul 13, 2016 at 10:38 AM, Dmitry
>>>>>>                     Melekhov <dm at belkam.com> wrote:
>>>>>>
>>>>>>                         13.07.2016 09:04, Pranith Kumar Karampuri
>>>>>>                         wrote:
>>>>>>>
>>>>>>>
>>>>>>>                         On Wed, Jul 13, 2016 at 10:29 AM, Dmitry
>>>>>>>                         Melekhov <dm at belkam.com> wrote:
>>>>>>>
>>>>>>>                             13.07.2016 08:56, Pranith Kumar
>>>>>>>                             Karampuri wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                             On Wed, Jul 13, 2016 at 10:23 AM,
>>>>>>>>                             Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>
>>>>>>>>                                 13.07.2016 08:46, Pranith Kumar
>>>>>>>>                                 Karampuri wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                                 On Wed, Jul 13, 2016 at 10:10
>>>>>>>>>                                 AM, Dmitry Melekhov
>>>>>>>>>                                 <dm at belkam.com> wrote:
>>>>>>>>>
>>>>>>>>>                                     13.07.2016 08:36, Pranith
>>>>>>>>>                                     Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                                     On Wed, Jul 13, 2016 at
>>>>>>>>>>                                     9:35 AM, Dmitry Melekhov
>>>>>>>>>>                                     <dm at belkam.com> wrote:
>>>>>>>>>>
>>>>>>>>>>                                         13.07.2016 01:52,
>>>>>>>>>>                                         Anuradha Talur wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                                             ----- Original Message -----
>>>>>>>>>>
>>>>>>>>>>                                                 From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>>>>>>                                                 To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>>                                                 Cc: "gluster-users" <gluster-users at gluster.org>
>>>>>>>>>>                                                 Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>>>>>>                                                 Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                                                 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                                                 Wow, what are the steps to recreate the problem?
>>>>>>>>>>
>>>>>>>>>>                                                 Just set the file length to zero; it is always reproducible.
>>>>>>>>>>
>>>>>>>>>>                                             If you are setting the file length to 0 on one
>>>>>>>>>>                                             of the bricks (looks like that is the case),
>>>>>>>>>>                                             it is not a bug.
>>>>>>>>>>
>>>>>>>>>>                                             Index heal relies on failures seen from the
>>>>>>>>>>                                             mount point(s) to identify the files that need
>>>>>>>>>>                                             heal. It won't be able to recognize any file
>>>>>>>>>>                                             modification done directly on bricks. The same
>>>>>>>>>>                                             goes for the heal info command, which is the
>>>>>>>>>>                                             reason heal info also shows 0 entries.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                                         Well, this makes self-heal useless then: if
>>>>>>>>>>                                         any file is accidentally corrupted or deleted
>>>>>>>>>>                                         (yes! if a file is deleted directly from a
>>>>>>>>>>                                         brick, this is not recognized by index heal
>>>>>>>>>>                                         either), it will not be self-healed, because
>>>>>>>>>>                                         self-heal uses index heal.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                                     It is better to look into the bit-rot feature
>>>>>>>>>>                                     if you want to guard against these kinds of
>>>>>>>>>>                                     problems.
>>>>>>>>>
>>>>>>>>>                                     Bit rot detects bit-level problems, not missing
>>>>>>>>>                                     files or wrong lengths, i.e. it is too much
>>>>>>>>>                                     overhead for such a simple task.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                                 It detects wrong length, because the checksum
>>>>>>>>>                                 won't match anymore.
>>>>>>>>
>>>>>>>>                                 Yes, sure. I guess it will detect missing files
>>>>>>>>                                 too. But doesn't it need far more resources than
>>>>>>>>                                 just comparing the directories on the bricks?
>>>>>>>>>
>>>>>>>>>                                 What use case are you trying out that leads to
>>>>>>>>>                                 changing things directly on the brick?
>>>>>>>>                                 I'm trying to test gluster
>>>>>>>>                                 failure tolerance and right now
>>>>>>>>                                 I'm not happy with it...
>>>>>>>>
>>>>>>>>
>>>>>>>>                             Which cases of fault tolerance are
>>>>>>>>                             you not happy with? Making changes
>>>>>>>>                             directly on the brick or anything
>>>>>>>>                             else as well?
>>>>>>>>
>>>>>>>                             I'll repeat:
>>>>>>>                             As I already said, if for some reason (in
>>>>>>>                             practice, only by accident) I delete a file,
>>>>>>>                             this will not be detected by the self-heal
>>>>>>>                             daemon and thus will lead to a lower
>>>>>>>                             replication level, i.e. lower failure
>>>>>>>                             tolerance.
>>>>>>>
>>>>>>>
>>>>>>>                         To prevent such accidents you need to set
>>>>>>>                         SELinux policies so that files under the brick
>>>>>>>                         are not modified by accident by any user. At
>>>>>>>                         least that is the solution I remember from when
>>>>>>>                         this was discussed 3-4 years back.
>>>>>>>
>>>>>>                         So the only supported platform is Linux? Or
>>>>>>                         maybe it is better to improve self-healing to
>>>>>>                         detect missing or wrong-length files; I guess
>>>>>>                         this is a very low-cost operation in terms of
>>>>>>                         host resources. Just a suggestion; maybe we
>>>>>>                         need to look at alternatives in the near
>>>>>>                         future...
>>>>>>
>>>>>>                     This is a corner case; from a design perspective
>>>>>>                     it is generally not a good idea to optimize for
>>>>>>                     the corner case. It is better to protect
>>>>>>                     ourselves from the corner case (SELinux etc.), or
>>>>>>                     you can also use snapshots to protect against
>>>>>>                     these kinds of mishaps.
>>>>>>
>>>>>                     Sorry, I don't agree.
>>>>>                     As you know, if a missing or wrong-length file is
>>>>>                     accessed from a fuse client, it is restored
>>>>>                     (healed), i.e. gluster recognizes that the file is
>>>>>                     wrong and heals it, so I do not see any reason not
>>>>>                     to provide the same function in self-healing.
>>>>>                     Thank you!
>>>>>
>>>>>                 Ah! Now how do you suggest we keep track of which of
>>>>>                 tens of millions of files the user accidentally
>>>>>                 deleted from the brick without gluster's knowledge?
>>>>>                 Once it comes to gluster's knowledge we can do
>>>>>                 something. But how does gluster become aware of
>>>>>                 something it is not keeping track of? At the time
>>>>>                 you access it, gluster knows something went wrong,
>>>>>                 so it restores it. If you change something on the
>>>>>                 bricks, even by accident, all the data gluster
>>>>>                 keeps (similar to a journal) is wasted. Even disk
>>>>>                 filesystems will ask you to run fsck if something
>>>>>                 unexpected happens, so full self-heal is a similar
>>>>>                 operation.
>>>>
>>>>                 You are absolutely right; the question is why gluster
>>>>                 does not become aware of such a problem in the case of
>>>>                 self-healing?
>>>>
>>>>
>>>>             Because the operations that are performed directly on
>>>>             the brick do not go through the gluster stack.
>>>
>>>             OK, I'll repeat:
>>>             As you know, if a missing or wrong-length file is accessed
>>>             from a fuse client, it is restored (healed), i.e. gluster
>>>             recognizes that the file is wrong and heals it, so I do not
>>>             see any reason not to provide the same function in
>>>             self-healing.
>>>
>>>
>>>         For which you need to access the file.
>>         That's right.
>>>         For that you need a full crawl. You can't detect a
>>>         modification that doesn't go through the stack, so this is
>>>         the only possibility.
>>
>>         OK then, if self-heal is really useless and no way to get
>>         this will be provided, I guess we'll use an external script
>>         to check the consistency of the brick directories; I don't
>>         think ls and diff will take many resources.
>>
>>
>>     How is this different from full self-heal?
>
>     Self-heal does not detect deleted or wrong-length files.
>
>
> It detects them when you do a full crawl, which is essentially an ls -laR
> kind of thing on the whole volume. You don't need any external scripts;
> maybe keep doing a full crawl once in a while?

You mean on the fuse mount?

It doesn't work:

[root@father ~]# mount -t glusterfs localhost:/pool gluster

[root@father ~]#

Then truncate it to zero length on the brick:

[root@father gluster]# > /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
[root@father gluster]#


[root@father gluster]# ls -laR  /root/gluster/
/root/gluster/:
total 122153384
drwxr-xr-x   4 qemu qemu        4096 Jul 11 13:36 .
dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
-rw-r--r--   1 root root  8589934592 May 11 09:14 csr1000v1.img
-rw-r--r--   1 root root           0 Jul 13 10:34 gstatus-0.64-3.el7.x86_64.rpm


As you can see, gstatus-0.64-3.el7.x86_64.rpm has 0 length.
But:

[root@father gluster]# touch /root/gluster/gstatus-0.64-3.el7.x86_64.rpm
[root@father gluster]# ls -laR  /root/gluster/
/root/gluster/:
total 122153436
drwxr-xr-x   4 qemu qemu        4096 Jul 11 13:36 .
dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
-rw-r--r--   1 root root  8589934592 May 11 09:14 csr1000v1.img
-rw-r--r--   1 root root       52268 Jul 13 10:36 gstatus-0.64-3.el7.x86_64.rpm


I.e. if I do some I/O on the file, then it is back.


By the way, the same problem occurs if I delete the file directly on the brick:

[root@father gluster]# rm /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
rm: remove regular file '/wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm'? y
[root@father gluster]# ls -laR  /root/gluster/
/root/gluster/:
total 122153384
drwxr-xr-x   4 qemu qemu        4096 Jul 13 10:38 .
dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
-rw-r--r--   1 root root  8589934592 May 11 09:14 csr1000v1.img
-rw-r--r--   1 qemu qemu 43692064768 Jul 13 10:38 infimonitor.img


I don't see it in the directory on the fuse mount at all until I touch it,
which restores the file too.


> If you need any performance improvements here, we will be happy to 
> help. Please give us feedback.

Your recipe doesn't work :-( If there is a difference between the brick
directories due to direct brick manipulation, it leads to problems.
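
For illustration, the external check proposed earlier in this thread
(comparing the brick directories with ls and diff) might look roughly like
the following. This is a hedged sketch, not a Gluster tool: the function
names list_brick and compare_bricks are hypothetical, it assumes bash (for
process substitution) and GNU find (for -printf), and it assumes both brick
roots are readable from the machine running it. It skips each brick's
internal .glusterfs directory and compares only relative paths and sizes, so
it would catch the deleted and zero-length cases shown above, but not
same-size corruption, which is what bit-rot detection is for.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the "ls and diff" brick check discussed above.
# Assumptions: bash (process substitution), GNU find (-printf), and both brick
# roots readable locally. This is not part of Gluster.

# Print "relative/path<TAB>size" for every regular file on a brick, sorted,
# skipping Gluster's internal .glusterfs metadata directory at the brick root.
list_brick() {
    local brick=$1
    if [ ! -d "$brick" ]; then
        echo "list_brick: no such directory: $brick" >&2
        return 2
    fi
    (cd "$brick" && find . -path ./.glusterfs -prune -o -type f -printf '%P\t%s\n') |
        LC_ALL=C sort
}

# Diff the listings of two bricks. "<" lines appear only in the first brick's
# listing, ">" lines only in the second's (a missing file or a size mismatch).
# Returns 0 when the listings are identical, non-zero otherwise.
compare_bricks() {
    [ -d "$1" ] && [ -d "$2" ] || { echo "compare_bricks: need two brick dirs" >&2; return 2; }
    diff <(list_brick "$1") <(list_brick "$2")
}
```

Example with a hypothetical second brick path:
compare_bricks /wall/pool/brick /mnt/peer-brick
In a replica spread over several servers it is probably simpler to run
list_brick on each server and diff the saved outputs; files that are being
written at the moment of the scan will also show up as transient differences.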


> All I was saying is that it is not possible to detect them through index
> heal, because for the index to be populated, the operations need to go
> through the gluster stack.
>
>     Why can't it? I don't know; you just said it is impossible in
>     gluster because it can only track changes made through gluster,
>     i.e. bricks can have different file sets and this is not
>     recognized (true) because, as I understand it, gluster's self-heal
>     assumes that the brick's underlying filesystem can't be corrupted by
>     the server admin (not true; I can say this as an engineer with almost
>     25 years of experience, i.e. I have done this several times ;-) ).
>
>
>
>>
>>         Thank you!
>>
>>         P.S.
>>         I still can't understand why it can't be implemented in
>>         gluster... :-(
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>                 -- 
>>>>>                 Pranith


