[Gluster-users] 3.7.13, index healing broken?
Dmitry Melekhov
dm at belkam.com
Wed Jul 13 06:41:47 UTC 2016
On 13.07.2016 10:24, Pranith Kumar Karampuri wrote:
>
> On Wed, Jul 13, 2016 at 11:49 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> On 13.07.2016 10:10, Pranith Kumar Karampuri wrote:
>>
>> On Wed, Jul 13, 2016 at 11:27 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>> On 13.07.2016 09:50, Pranith Kumar Karampuri wrote:
>>>
>>> On Wed, Jul 13, 2016 at 11:11 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>> On 13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>>>>
>>>> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>
>>>> On 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>
>>>>> On 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>
>>>>>> On 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>
>>>>>>> On 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>
>>>>>>>> On 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>>
>>>>>>>>> On 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 13.07.2016 01:52, Anuradha Talur wrote:
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>> Cc: "gluster-users" <gluster-users at gluster.org>
>>>>>>>>>> Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>>>>>> Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>>>>>>>>>>
>>>>>>>>>> On 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>> Wow, what are the steps to recreate the problem?
>>>>>>>>>>
>>>>>>>>>> Just set the file length to zero; it is always reproducible.
>>>>>>>>>>
>>>>>>>>>> If you are setting the file length to 0 on one of the bricks
>>>>>>>>>> (looks like that is the case), it is not a bug.
>>>>>>>>>>
>>>>>>>>>> Index heal relies on failures seen from the mount point(s) to
>>>>>>>>>> identify the files that need heal. It won't be able to
>>>>>>>>>> recognize any file modification done directly on the bricks.
>>>>>>>>>> The same goes for the heal info command, which is why heal
>>>>>>>>>> info also shows 0 entries.
>>>>>>>>>>
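>>>>>>>>>> For example (a sketch using this thread's volume "pool" and
>>>>>>>>>> its brick path; the file name is hypothetical):
>>>>>>>>>>
>>>>>>>>>>     # truncate a file directly on the brick, bypassing gluster
>>>>>>>>>>     > /wall/pool/brick/somefile
>>>>>>>>>>     # nothing failed through the gluster stack, so the index
>>>>>>>>>>     # stays empty and heal info reports 0 entries
>>>>>>>>>>     gluster volume heal pool info
>>>>>>>>>>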
>>>>>>>>>> Well, this makes self-heal useless then: if any file is
>>>>>>>>>> accidentally corrupted or deleted (yes! a file deleted
>>>>>>>>>> directly from the brick is not recognized by index heal
>>>>>>>>>> either), then it will not be self-healed, because self-heal
>>>>>>>>>> uses index heal.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It is better to look into the bit-rot feature if you want to
>>>>>>>>>> guard against these kinds of problems.
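>>>>>>>>>>
>>>>>>>>>> A minimal sketch of turning it on for this volume (the daily
>>>>>>>>>> scrub frequency is just one possible setting):
>>>>>>>>>>
>>>>>>>>>>     gluster volume bitrot pool enable
>>>>>>>>>>     # make the scrubber checksum files daily instead of the default
>>>>>>>>>>     gluster volume bitrot pool scrub-frequency daily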
>>>>>>>>>
>>>>>>>>> Bit rot detects bit problems, not missing files or wrong
>>>>>>>>> lengths, i.e. it is overhead for such a simple task.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It detects wrong length, because the checksum won't match
>>>>>>>>> anymore.
>>>>>>>>
>>>>>>>> Yes, sure. I guess it will detect missing files too. But it
>>>>>>>> needs far more resources than just comparing the directories on
>>>>>>>> the bricks?
>>>>>>>>>
>>>>>>>>> What use-case are you trying out that leads to changing things
>>>>>>>>> directly on the brick?
>>>>>>>> I'm trying to test gluster's failure tolerance, and right now
>>>>>>>> I'm not happy with it...
>>>>>>>>
>>>>>>>>
>>>>>>>> Which cases of fault tolerance are you not happy with? Making
>>>>>>>> changes directly on the brick, or anything else as well?
>>>>>>>>
>>>>>>> I'll repeat: as I already said, if I delete a file for some
>>>>>>> reason (a real case could only be by accident), this will not be
>>>>>>> detected by the self-heal daemon, and thus will lead to a lower
>>>>>>> replication level, i.e. lower failure tolerance.
>>>>>>>
>>>>>>>
>>>>>>> To prevent such accidents you need to set SELinux policies so
>>>>>>> that files under the brick are not modified by accident by any
>>>>>>> user. At least that is the solution I remember from when this
>>>>>>> was discussed 3-4 years back.
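>>>>>>>
>>>>>>> A rough sketch of such a labeling (the glusterd_brick_t type is
>>>>>>> an assumption - check what your distribution's policy actually
>>>>>>> ships):
>>>>>>>
>>>>>>>     # label the brick path so only gluster daemons may write to it
>>>>>>>     # (glusterd_brick_t is assumed here, not confirmed)
>>>>>>>     semanage fcontext -a -t glusterd_brick_t "/wall/pool/brick(/.*)?"
>>>>>>>     restorecon -Rv /wall/pool/brick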
>>>>>>>
>>>>>> So the only supported platform is Linux? Or maybe it is better
>>>>>> to improve self-healing to detect missing or wrong-length files;
>>>>>> I guess that is a very low-cost operation in terms of host
>>>>>> resources. Just a suggestion; maybe we will need to look at
>>>>>> alternatives in the near future...
>>>>>>
>>>>>> This is a corner case, and from a design perspective it is
>>>>>> generally not a good idea to optimize for the corner case. It is
>>>>>> better to protect ourselves from it (SELinux etc.), or you can
>>>>>> also use snapshots to protect against these kinds of mishaps.
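>>>>>>
>>>>>> For instance (a sketch; gluster snapshots require thinly
>>>>>> provisioned LVM bricks, and the snapshot name is illustrative):
>>>>>>
>>>>>>     gluster snapshot create pool-snap pool
>>>>>>     # after an accident on a brick, stop the volume and roll back:
>>>>>>     gluster volume stop pool
>>>>>>     gluster snapshot restore pool-snap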
>>>>>>
>>>>> Sorry, I don't agree. As you know, if a missing or wrong-length
>>>>> file is accessed from a fuse client, it is restored (healed), i.e.
>>>>> gluster recognizes that the file is wrong and heals it, so I do
>>>>> not see any reason why the same check cannot be provided as part
>>>>> of self-healing. Thank you!
>>>>>
>>>>> Ah! Now how do you suggest we keep track of which of the tens of
>>>>> millions of files the user accidentally deleted from the brick
>>>>> without gluster's knowledge? Once it comes to gluster's knowledge
>>>>> we can do something, but how does gluster become aware of
>>>>> something it is not keeping track of? At the time you access it,
>>>>> gluster knows something went wrong, so it restores it. If you
>>>>> change something on the bricks, even by accident, all the data
>>>>> gluster keeps (similar to a journal) is wasted. Even disk
>>>>> filesystems will ask you to run fsck if something unexpected
>>>>> happens, so a full self-heal is a similar operation.
>>>>
>>>> You are absolutely right - the question is why gluster does not
>>>> become aware of such a problem in the case of self-healing?
>>>>
>>>>
>>>> Because the operations that are performed directly on the brick do
>>>> not go through the gluster stack.
>>>
>>> OK, I'll repeat: as you know, if a missing or wrong-length file is
>>> accessed from a fuse client, it is restored (healed), i.e. gluster
>>> recognizes that the file is wrong and heals it, so I do not see any
>>> reason why the same cannot be provided as self-healing.
>>>
>>>
>>> For which you need to access the file.
>> That's right.
>>> For which you need a full crawl. You can't detect a modification
>>> that doesn't go through the stack, so this is the only possibility.
>>
>> OK, then: if self-heal is really useless here and no way to get this
>> will be provided, I guess we'll use an external script to check that
>> the brick directories are consistent; I don't think ls and diff will
>> take many resources.
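>>
>> Something like this, say (a sketch; "son" stands for the other
>> replica node and is hypothetical, and gluster's internal .glusterfs
>> directory must be skipped):
>>
>>     # compare name+size listings of the same brick path on two nodes
>>     diff <(find /wall/pool/brick -path '*/.glusterfs' -prune -o -type f -printf '%P %s\n' | sort) \
>>          <(ssh son "find /wall/pool/brick -path '*/.glusterfs' -prune -o -type f -printf '%P %s\n' | sort")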
>>
>>
>> How is this different from full self-heal?
>
> Self-heal does not detect deleted or wrong-length files.
>
>
> It detects them when you do a full crawl, which is essentially an
> ls -laR kind of thing on the whole volume. You don't need any
> external scripts; maybe keep doing a full crawl once in a while?
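>
> (For reference, the CLI command that triggers such a full crawl by
> the self-heal daemon:
>
>     gluster volume heal pool full
> )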
You mean on the fuse mount? It doesn't work:
[root@father ~]# mount -t glusterfs localhost:/pool gluster
[root@father ~]#
Then make a file zero-length directly on the brick:
[root@father gluster]# > /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
[root@father gluster]#
[root@father gluster]# ls -laR /root/gluster/
/root/gluster/:
total 122153384
drwxr-xr-x 4 qemu qemu 4096 Jul 11 13:36 .
dr-xr-x---. 10 root root 4096 Jul 11 12:26 ..
-rw-r--r-- 1 root root 8589934592 May 11 09:14 csr1000v1.img
-rw-r--r-- 1 root root 0 Jul 13 10:34 gstatus-0.64-3.el7.x86_64.rpm
As you can see, gstatus-0.64-3.el7.x86_64.rpm has 0 length. But:
[root@father gluster]# touch /root/gluster/gstatus-0.64-3.el7.x86_64.rpm
[root@father gluster]# ls -laR /root/gluster/
/root/gluster/:
total 122153436
drwxr-xr-x 4 qemu qemu 4096 Jul 11 13:36 .
dr-xr-x---. 10 root root 4096 Jul 11 12:26 ..
-rw-r--r-- 1 root root 8589934592 May 11 09:14 csr1000v1.img
-rw-r--r-- 1 root root 52268 Jul 13 10:36 gstatus-0.64-3.el7.x86_64.rpm
I.e., if I do some I/O on the file, then it is back.
By the way, the same problem occurs if I delete the file directly on the brick:
[root@father gluster]# rm /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
rm: remove regular file '/wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm'? y
[root@father gluster]# ls -laR /root/gluster/
/root/gluster/:
total 122153384
drwxr-xr-x 4 qemu qemu 4096 Jul 13 10:38 .
dr-xr-x---. 10 root root 4096 Jul 11 12:26 ..
-rw-r--r-- 1 root root 8589934592 May 11 09:14 csr1000v1.img
-rw-r--r-- 1 qemu qemu 43692064768 Jul 13 10:38 infimonitor.img
I don't see it in the directory on the fuse mount at all until I touch it,
which restores the file too.
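(It is not only touch: it seems any operation that sends a lookup
through the mount triggers the heal, e.g. a simple

    stat /root/gluster/gstatus-0.64-3.el7.x86_64.rpm

so apparently the detection happens only on per-file access.)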
> If you need any performance improvements here, we will be happy to
> help. Please give us feedback.
Your recipe doesn't work :-( If there is a difference between the brick
directories due to direct brick manipulation, it leads to problems.
> All I was saying is that it is not possible to detect them through
> index heal, because for the index to be populated the operations need
> to go through the gluster stack.
>
> Why can't it? I don't know; you just said it is impossible in gluster
> because it can only track changes made through gluster, i.e. bricks
> can have different file sets and this is not recognized (true),
> because, as I understand it, gluster's self-heal assumes that the
> brick's underlying filesystem can't be corrupted by the server admin
> (not true - I can say this as an engineer with almost 25 years of
> experience, i.e. I have done this myself several times ;-) ).
>
>
>
>>
>> Thank you!
>>
>> p.s.
>> still can't understand why it can't be implemented in
>> gluster... :-(