[Gluster-users] Pending heal status when deleting files which are marked as to be healed

David Spisla spisla80 at gmail.com
Mon Jun 24 13:45:29 UTC 2019


Additional information,

After the volume was 100% full, I deleted some of the files, but not the
files which are listed in heal info. When it was at 98%, I deleted the folder
which was marked as to be healed: /archive1/data/fff

After stopping and starting the volume, the files in /archive1/data/fff were
still there.

Regards
David Spisla



On Mon, Jun 24, 2019 at 15:33, David Spisla <spisla80 at gmail.com> wrote:

> Hello Ravi and Gluster Community,
>
> On Mon, Jun 24, 2019 at 14:25, David Spisla <spisla80 at gmail.com> wrote:
>
>>
>>
>> ---------- Forwarded message ---------
>> From: David Spisla <spisla80 at gmail.com>
>> Date: Fri, Jun 21, 2019 at 10:02
>> Subject: Re: [Gluster-users] Pending heal status when deleting files
>> which are marked as to be healed
>> To: Ravishankar N <ravishankar at redhat.com>
>>
>>
>> Hello Ravi,
>>
>> On Wed, Jun 19, 2019 at 18:06, Ravishankar N <ravishankar at redhat.com>
>> wrote:
>>
>>>
>>> On 17/06/19 3:45 PM, David Spisla wrote:
>>>
>>> Hello Gluster Community,
>>>
>>> my newest observation concerns the self heal daemon:
>>> Scenario: 2-node Gluster v5.5 cluster with a replica 2 volume, just one
>>> brick per node. Access via SMB client from a Win10 machine.
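>>>
>>> For reference, a replica 2 volume of this shape could be created roughly
>>> like this (volume name and brick path as in the outputs below; the second
>>> node name fs-davids-c2-n2 is an assumption):
>>>
>>> gluster volume create archive1 replica 2 \
>>>     fs-davids-c2-n1:/gluster/brick1/glusterbrick \
>>>     fs-davids-c2-n2:/gluster/brick1/glusterbrick
>>> gluster volume start archive1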
>>>
>>> How to reproduce:
>>> I created a small folder with a lot of small files and copied that folder
>>> recursively into itself a few times. Additionally, I copied three big
>>> folders with a lot of content into the root of the volume.
>>> Note: There was no node down or anything else like a brick down, etc., so
>>> the whole volume was accessible.
>>>
>>> Because of the recursive copy action, all these copied files were listed
>>> as to be healed (via gluster heal info).
>>>
>>> This is odd. How did you conclude that writing to the volume (i.e.
>>> recursive copy) was the reason for the files to be needing heal? Did you
>>> check if there were any gluster messages about disconnects in the smb
>>> client logs?
>>>
>> There was no disconnection, I am sure. But overall I am not really sure
>> what the cause of this problem is.
>>
> I reproduced it. Now I don't think that the recursive copy is the reason. I
> copied several small files into the volume (capacity 1GB) until it was full
> (see steps to reproduce below). I didn't set any file to read-only. There
> was never a disconnection.
>
>>
>>> Now I set some of the affected files to read-only (they get WORMed because
>>> worm-file-level is enabled). After this I tried to delete the parent folder
>>> of those files.
>>>
>>> Expected: All files should be healed
>>> Actually: All files which are read-only are not healed. heal info
>>> permanently shows that these files have to be healed.
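>>>
>>> For context, the WORM setup and the read-only transition look roughly like
>>> this from a Linux mount (the mount point /mnt/vol is an example; via SMB
>>> the same thing happens when setting the read-only attribute in Windows):
>>>
>>> gluster volume set archive1 features.worm-file-level on
>>> # with file-level WORM enabled, removing write permission WORMs the file
>>> chmod a-w '/mnt/vol/archive1/data/fff/gg - Kopie.txt'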
>>>
>>> Does disabling read-only let the files be healed?
>>>
>> I have to try this.
>>
> I tried it out and it had no effect.
>
>>
>>> The glustershd log throws errors, and the brick log (with level DEBUG)
>>> permanently throws a lot of messages which I don't understand. See the
>>> attached file, which contains all information, including heal info and
>>> volume info, besides the logs.
>>>
>>> Maybe some of you know what's going on there? Since we can reproduce this
>>> scenario, we can give more debug information if needed.
>>>
>>> Is it possible to script the list of steps to reproduce this issue?
>>>
>> I will do that and post it here, although I will collect more data when
>> it happens.
>>
> Steps to reproduce (a script sketch follows the list):
>
> 1. Copy several small files into a volume (here: 1GB capacity)
> 2. Copy until the volume is nearly full (70-80% or more)
> 3. Now self-heal is listing files to be healed
> 4. Move or delete all of these files, or just a part of them.
> 5. The files won't be healed and stay in the heal info list.
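>
> A minimal script sketch of these steps, assuming the volume is FUSE-mounted
> at /mnt/vol on a gluster node (in my case the copying happened via SMB from
> the Win10 machine):
>
> #!/bin/bash
> MNT=/mnt/vol                      # assumed mount point of archive1
> mkdir -p "$MNT/archive1/data/fff"
> i=0
> # steps 1+2: write 1MB files until the 1GB volume returns ENOSPC
> while dd if=/dev/zero of="$MNT/archive1/data/fff/file_$i" bs=1M count=1 \
>         2>/dev/null; do
>     i=$((i+1))
> done
> # step 3: files now show up as to be healed
> gluster volume heal archive1 info
> # step 4: delete a part of the files to free some space
> rm -f "$MNT/archive1/data/fff/file_1"*
> sleep 60
> # step 5: the entries are still listed
> gluster volume heal archive1 info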
>
> In my case I copied until the volume was 100% full (storage.reserve was
> 1%). I deleted some of the files to get to a level of 98%. I waited for a
> while, but nothing happened. After this I stopped and started the volume.
> The files were then healed.
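>
> For the record, the stop/start cycle that got the heal unstuck was simply
> (gluster asks for confirmation on stop):
>
> gluster volume stop archive1
> gluster volume start archive1
> gluster volume heal archive1 info   # the entries disappear after the restart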
> Attached is the glustershd.log, where you can see that the pending entry
> self-heal (2019-06-24 10:04:02.007328) could not be finished for
> pgfid:7e4fa649-434a-4bb7-a1c2-258818d76076 until the volume was stopped and
> started again. After the restart, the entry self-heal could be finished for
> that pgfid (at 2019-06-24 12:38:38.689632). The pgfid refers to the files
> which were listed to be healed:
>
> fs-davids-c2-n1:~ # gluster vo heal archive1 info
> Brick fs-davids-c2-n1:/gluster/brick1/glusterbrick
> /archive1/data/fff/gg - Kopie.txt
> /archive1/data/fff
> /archive1/data/fff/gg - Kopie - Kopie.txt
> /archive1/data/fff/gg - Kopie - Kopie (2).txt
> Status: Connected
> Number of entries: 4
>
> All of these files have the same pgfid:
>
> fs-davids-c2-n1:~ # getfattr -e hex -d -m ""
> '/gluster/brick1/glusterbrick/archive1/data/fff/'* | grep trusted.pgfid
> getfattr: Removing leading '/' from absolute path names
> trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
> trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
> trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
>
> Summary: The pending heal problem seems to occur if a volume is nearly
> full or completely full.
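>
> To check the fill level and the reserve limit on a node, something like
> this should do (brick path as in the getfattr call above):
>
> df -h /gluster/brick1/glusterbrick           # brick fill level
> gluster volume get archive1 storage.reserve  # reserve in percent (here: 1)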
>
> Regards
> David Spisla
>
>
>> Regards
>> David
>>
>>> Regards,
>>>
>>> Ravi
>>>
>>>
>>> Regards
>>> David Spisla
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>

