[Gluster-users] Pending heal status when deleting files which are marked as to be healed

David Spisla spisla80 at gmail.com
Mon Jun 24 13:33:45 UTC 2019


Hello Ravi and Gluster Community,

On Mon, 24 Jun 2019 at 14:25, David Spisla <spisla80 at gmail.com> wrote:

>
>
> ---------- Forwarded message ---------
> From: David Spisla <spisla80 at gmail.com>
> Date: Fri, 21 Jun 2019 at 10:02
> Subject: Re: [Gluster-users] Pending heal status when deleting files which
> are marked as to be healed
> To: Ravishankar N <ravishankar at redhat.com>
>
>
> Hello Ravi,
>
> On Wed, 19 Jun 2019 at 18:06, Ravishankar N <ravishankar at redhat.com> wrote:
>
>>
>> On 17/06/19 3:45 PM, David Spisla wrote:
>>
>> Hello Gluster Community,
>>
>> my newest observation concerns the self-heal daemon:
>> Scenario: 2-node Gluster v5.5 cluster with a Replica 2 volume, just one
>> brick per node. Access via SMB client from a Win10 machine.
>>
>> How to reproduce:
>> I created a small folder with a lot of small files and copied that
>> folder recursively into itself a few times. Additionally, I copied three
>> big folders with a lot of content into the root of the volume.
>> Note: No node was down, no brick was down, nothing of that kind. So
>> the whole volume was accessible.
>>
>> Because of the recursive copy action, all these copied files were listed
>> as needing heal (via gluster heal info).
>>
>> This is odd. How did you conclude that writing to the volume (i.e. the
>> recursive copy) was the reason the files needed heal? Did you check
>> whether there were any gluster messages about disconnects in the SMB
>> client logs?
>>
> There was no disconnection, I am sure. But overall I am not really sure
> what the cause of this problem is.
>
I reproduced it. Now I don't think that the recursive copy is the reason. I
copied several small files into the volume (capacity 1 GB) until it was full
(see steps to reproduce below). I did not set any file Read-Only. There was
never a disconnection.

>
>> Now I set some of the affected files Read-Only (they get WORMed because
>> worm-file-level is enabled). After this I tried to delete the parent folder
>> of those files.
>>
>> Expected: All files should be healed.
>> Actual: All files which are Read-Only are not healed; heal info
>> permanently shows that these files need to be healed.
>>
>> Does disabling read-only let the files be healed?
>>
> I have to try this.
>
I tried it out and it had no effect.

>
>> The glustershd log throws errors, and the brick log (with level DEBUG)
>> permanently throws a lot of messages which I don't understand. See the
>> attached file, which contains all the information (also heal info and
>> volume info) besides the logs.
>>
>> Maybe some of you know what's going on there? Since we can reproduce this
>> scenario, we can give more debug information if needed.
>>
>> Is it possible to script the list of steps to reproduce this issue?
>>
> I will do that and post it here. I will also collect more data when it
> happens again.
>
Steps to reproduce:

1. Copy several small files into a volume (here: 1 GB capacity).
2. Keep copying until the volume is nearly full (70-80% or more).
3. Now self-heal lists files to be healed.
4. Move or delete all of these files, or just a part of them.
5. The files won't be healed and stay in the heal info list.
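
As a first attempt at scripting this (a rough sketch only; the mount point
and source directory are assumptions, not from my setup, and have to be
adapted):

#!/bin/bash
# Sketch of the reproduction steps above. Assumptions: the volume
# "archive1" is mounted at /mnt/archive1 (hypothetical path) and
# /tmp/smallfiles (hypothetical path) holds a set of small test files.
VOL=archive1
MNT=/mnt/archive1
SRC=/tmp/smallfiles

# Steps 1+2: copy small files until the volume is (nearly) full
i=0
while cp -r "$SRC" "$MNT/copy_$i" 2>/dev/null; do
    i=$((i + 1))
done

# Step 3: heal info should now list files to be healed
gluster volume heal "$VOL" info

# Step 4: delete a part of the copies again
rm -rf "$MNT/copy_0" "$MNT/copy_1"

# Step 5: the entries stay in the heal info list instead of draining
sleep 60
gluster volume heal "$VOL" info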

In my case I copied until the volume was 100% full (storage.reserve was
1%). I deleted some of the files to get down to a level of 98%. I waited
for a while, but nothing happened. After this I stopped and started the
volume. The files were then healed.
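
For reference, the restart that resolved the pending heals was just the
standard CLI cycle (note that gluster asks for confirmation before
stopping a volume):

gluster volume stop archive1
gluster volume start archive1
gluster volume heal archive1 info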
Attached is the glustershd.log, where you can see that performing
entry self-heal (2019-06-24 10:04:02.007328) could not be finished for
pgfid:7e4fa649-434a-4bb7-a1c2-258818d76076 until the volume was stopped and
started again. After the restart, entry self-heal could be finished for
that pgfid (at 2019-06-24 12:38:38.689632). The pgfid refers to the files
which were listed to be healed:

fs-davids-c2-n1:~ # gluster vo heal archive1 info
Brick fs-davids-c2-n1:/gluster/brick1/glusterbrick
/archive1/data/fff/gg - Kopie.txt
/archive1/data/fff
/archive1/data/fff/gg - Kopie - Kopie.txt
/archive1/data/fff/gg - Kopie - Kopie (2).txt
Status: Connected
Number of entries: 4
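
The pending-heal markers themselves (the trusted.afr xattrs of the AFR
changelog) can also be dumped directly on the brick; a hedged example,
reusing the brick path from my setup:

getfattr -d -m trusted.afr -e hex /gluster/brick1/glusterbrick/archive1/data/fff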

All of these files have the same pgfid:

fs-davids-c2-n1:~ # getfattr -e hex -d -m ""
'/gluster/brick1/glusterbrick/archive1/data/fff/'* | grep trusted.pgfid
getfattr: Removing leading '/' from absolute path names
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
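
To map such a pgfid back to a directory: as far as I know, directory GFIDs
are kept as symlinks under the brick's .glusterfs directory (the first two
bytes of the GFID form the subdirectories), so something like this should
resolve it (gluster-internal layout, so treat this as an assumption):

ls -l /gluster/brick1/glusterbrick/.glusterfs/7e/4f/7e4fa649-434a-4bb7-a1c2-258818d76076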

Summary: The pending heal problem seems to occur if a volume is nearly full
or completely full.

Regards
David Spisla


> Regards
> David
>
>> Regards,
>>
>> Ravi
>>
>>
>> Regards
>> David Spisla
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd.log
Type: application/octet-stream
Size: 51236 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190624/3040e24a/attachment.obj>

