[Gluster-users] self-heal not working

mabi mabi at protonmail.ch
Mon Aug 28 07:59:06 UTC 2017


Excuse my naive questions, but how do I reset the afr.dirty xattr on the file to be healed? Do I need to do that through a FUSE mount, or simply on every brick directly?

> -------- Original Message --------
> Subject: Re: [Gluster-users] self-heal not working
> Local Time: August 28, 2017 5:58 AM
> UTC Time: August 28, 2017 3:58 AM
> From: ravishankar at redhat.com
> To: Ben Turner <bturner at redhat.com>, mabi <mabi at protonmail.ch>
> Gluster Users <gluster-users at gluster.org>
>
> On 08/28/2017 01:57 AM, Ben Turner wrote:
>> ----- Original Message -----
>>> From: "mabi" <mabi at protonmail.ch>
>>> To: "Ravishankar N" <ravishankar at redhat.com>
>>> Cc: "Ben Turner" <bturner at redhat.com>, "Gluster Users" <gluster-users at gluster.org>
>>> Sent: Sunday, August 27, 2017 3:15:33 PM
>>> Subject: Re: [Gluster-users] self-heal not working
>>>
>>> Thanks Ravi for your analysis. So as far as I understand nothing to worry
>>> about but my question now would be: how do I get rid of this file from the
>>> heal info?
>> Correct me if I am wrong but clearing this is just a matter of resetting the afr.dirty xattr? @Ravi - Is this correct?
>
> Yes, resetting the xattr and then launching index heal or running the
> heal-info command should serve as a workaround.
> -Ravi
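
The workaround confirmed above could be sketched roughly as follows. This is only a sketch under assumptions: that an all-zero 12-byte value marks trusted.afr.dirty as clean, and that the command is run as root against the file on each brick's back-end path rather than through the FUSE mount. The helper name `reset_dirty_command` is made up for illustration, and the brick paths are the ones from the stat output quoted further down. The code only prints the commands; it does not run them.

```python
import shlex

# Assumption: twelve zero bytes mean "nothing pending" for trusted.afr.dirty.
ZERO_DIRTY = "0x" + "00" * 12


def reset_dirty_command(brick_file):
    """Build (but do not run) a setfattr command that zeroes trusted.afr.dirty."""
    return ["setfattr", "-n", "trusted.afr.dirty", "-v", ZERO_DIRTY, brick_file]


# Back-end paths per node, as they appear in the stat output in this thread.
REL = "data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
brick_files = {
    "node1": "/data/myvolume/brick/" + REL,
    "node2": "/data/myvolume/brick/" + REL,
    "node3": "/srv/glusterfs/myvolume/brick/" + REL,
}

for node, path in brick_files.items():
    # Each command would be run locally on the named node.
    print(node + ":", shlex.join(reset_dirty_command(path)))

# Afterwards, launch index heal (command confirmed elsewhere in this thread).
print("gluster volume heal myvolume")
```

Trying this on a throwaway test volume first, as suggested further down in the thread, seems prudent before touching production bricks.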
>
>>
>> -b
>>
>>>> -------- Original Message --------
>>>> Subject: Re: [Gluster-users] self-heal not working
>>>> Local Time: August 27, 2017 3:45 PM
>>>> UTC Time: August 27, 2017 1:45 PM
>>>> From: ravishankar at redhat.com
>>>> To: mabi <mabi at protonmail.ch>
>>>> Ben Turner <bturner at redhat.com>, Gluster Users <gluster-users at gluster.org>
>>>>
>>>> Yes, the shds did pick up the file for healing (I saw messages like " got
>>>> entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.
>>>>
>>>> Anyway I reproduced it by manually setting the afr.dirty bit for a zero
>>>> byte file on all 3 bricks. Since there are no afr pending xattrs
>>>> indicating good/bad copies and all files are zero bytes, the data
>>>> self-heal algorithm just picks the file with the latest ctime as source.
>>>> In your case that was the arbiter brick. In the code, there is a check to
>>>> prevent data heals if arbiter is the source. So heal was not happening and
>>>> the entries were not removed from heal-info output.
>>>>
>>>> Perhaps we should add a check in the code to just remove the entries from
>>>> heal-info if size is zero bytes in all bricks.
>>>>
>>>> -Ravi
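
To make the explanation above concrete, here is a small illustrative model of the source selection it describes. It is not the actual AFR code: the function names are hypothetical, and the only inputs are the Change (ctime) values from the stat output quoted later in this thread.

```python
# Change (ctime) values copied from the stat output quoted later in this thread.
# They share one format and timezone, so plain string comparison orders them.
bricks = [
    {"name": "node1", "arbiter": False, "ctime": "2017-08-14 17:11:46.407404779"},
    {"name": "node2", "arbiter": False, "ctime": "2017-08-14 17:11:46.403704181"},
    {"name": "node3", "arbiter": True, "ctime": "2017-08-14 17:11:46.604380051"},
]


def pick_source(bricks):
    """With no pending xattrs blaming any copy, fall back to the latest ctime."""
    return max(bricks, key=lambda b: b["ctime"])


def data_heal_allowed(source):
    """The arbiter stores only metadata, so it is refused as a data-heal source."""
    return not source["arbiter"]


source = pick_source(bricks)
print(source["name"], data_heal_allowed(source))  # prints: node3 False
```

With these timestamps the arbiter (node3) wins the ctime comparison, the data heal is refused, and the entry stays in the heal-info output, which matches the behaviour reported in this thread.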
>>>>
>>>> On 08/25/2017 06:33 PM, mabi wrote:
>>>>
>>>>> Hi Ravi,
>>>>>
>>>>> Did you get a chance to have a look at the log files I have attached in my
>>>>> last mail?
>>>>>
>>>>> Best,
>>>>> Mabi
>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>> Local Time: August 24, 2017 12:08 PM
>>>>>> UTC Time: August 24, 2017 10:08 AM
>>>>>> From: mabi at protonmail.ch
>>>>>> To: Ravishankar N <ravishankar at redhat.com>
>>>>>> Ben Turner <bturner at redhat.com>, Gluster Users <gluster-users at gluster.org>
>>>>>>
>>>>>> Thanks for confirming the command. I have now enabled DEBUG
>>>>>> client-log-level, run a heal and then attached the glustershd log files
>>>>>> of all 3 nodes in this mail.
>>>>>>
>>>>>> The volume concerned is called myvol-pro, the other 3 volumes have no
>>>>>> problem so far.
>>>>>>
>>>>>> Also note that in the meantime it looks like the file has been deleted
>>>>>> by the user, so the heal info command no longer shows the file name
>>>>>> but just its GFID, which is:
>>>>>>
>>>>>> gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
>>>>>>
>>>>>> Hope that helps for debugging this issue.
>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>> Local Time: August 24, 2017 5:58 AM
>>>>>>> UTC Time: August 24, 2017 3:58 AM
>>>>>>> From: ravishankar at redhat.com
>>>>>>> To: mabi <mabi at protonmail.ch>
>>>>>>> Ben Turner <bturner at redhat.com>, Gluster Users <gluster-users at gluster.org>
>>>>>>>
>>>>>>> Unlikely. In your case only the afr.dirty is set, not the
>>>>>>> afr.volname-client-xx xattr.
>>>>>>>
>>>>>>> `gluster volume set myvolume diagnostics.client-log-level DEBUG` is
>>>>>>> right.
>>>>>>>
>>>>>>> On 08/23/2017 10:31 PM, mabi wrote:
>>>>>>>
>>>>>>>> I just saw the following bug which was fixed in 3.8.15:
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1471613
>>>>>>>>
>>>>>>>> Is it possible that the problem I described in this post is related to
>>>>>>>> that bug?
>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>> Local Time: August 22, 2017 11:51 AM
>>>>>>>>> UTC Time: August 22, 2017 9:51 AM
>>>>>>>>> From: ravishankar at redhat.com
>>>>>>>>> To: mabi <mabi at protonmail.ch>
>>>>>>>>> Ben Turner <bturner at redhat.com>, Gluster Users <gluster-users at gluster.org>
>>>>>>>>>
>>>>>>>>> On 08/22/2017 02:30 PM, mabi wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the additional hints, I have the following 2 questions
>>>>>>>>>> first:
>>>>>>>>>>
>>>>>>>>>> - In order to launch the index heal is the following command correct:
>>>>>>>>>> gluster volume heal myvolume
>>>>>>>>> Yes
>>>>>>>>>
>>>>>>>>>> - If I run a "volume start force", will it cause any short disruptions
>>>>>>>>>> on my clients which mount the volume through FUSE? If yes, how long?
>>>>>>>>>> This is a production system, that's why I am asking.
>>>>>>>>> No. You can actually create a test volume on your personal Linux box
>>>>>>>>> to try these kinds of things without needing multiple machines. This
>>>>>>>>> is how we develop and test our patches :)
>>>>>>>>> `gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3} force`
>>>>>>>>> and so on.
>>>>>>>>>
>>>>>>>>> HTH,
>>>>>>>>> Ravi
>>>>>>>>>
>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>> Local Time: August 22, 2017 6:26 AM
>>>>>>>>>>> UTC Time: August 22, 2017 4:26 AM
>>>>>>>>>>> From: ravishankar at redhat.com
>>>>>>>>>>> To: mabi <mabi at protonmail.ch>, Ben Turner <bturner at redhat.com>
>>>>>>>>>>> Gluster Users <gluster-users at gluster.org>
>>>>>>>>>>>
>>>>>>>>>>> Explore the following:
>>>>>>>>>>>
>>>>>>>>>>> - Launch index heal and look at the glustershd logs of all bricks
>>>>>>>>>>> for possible errors
>>>>>>>>>>>
>>>>>>>>>>> - See if the glustershd in each node is connected to all bricks.
>>>>>>>>>>>
>>>>>>>>>>> - If not try to restart shd by `volume start force`
>>>>>>>>>>>
>>>>>>>>>>> - Launch index heal again and try.
>>>>>>>>>>>
>>>>>>>>>>> - Try debugging the shd log by setting client-log-level to DEBUG
>>>>>>>>>>> temporarily.
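
The checklist above can be lined up as a command sequence. This is a sketch, not a recommended script: it uses only the gluster commands that appear in this thread, `myvolume` is the placeholder volume name used here, the function name is hypothetical, and the "is the shd connected to all bricks" check is left out because no specific command for it is given in the thread. The code just prints the commands in order.

```python
VOLUME = "myvolume"  # placeholder volume name used throughout this thread


def troubleshooting_commands(volume):
    """Return the checklist's CLI steps, in order, as argument lists."""
    return [
        # 1. Launch index heal, then inspect the glustershd logs on every node.
        ["gluster", "volume", "heal", volume],
        # 2. If the shd is not connected to all bricks, restart it.
        ["gluster", "volume", "start", volume, "force"],
        # 3. Launch index heal again.
        ["gluster", "volume", "heal", volume],
        # 4. Temporarily raise the client log level to debug the shd log.
        ["gluster", "volume", "set", volume, "diagnostics.client-log-level", "DEBUG"],
    ]


for cmd in troubleshooting_commands(VOLUME):
    print(" ".join(cmd))
```

Since the DEBUG setting is meant to be temporary, the log level would presumably be set back to its previous value once the logs have been collected.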
>>>>>>>>>>>
>>>>>>>>>>> On 08/22/2017 03:19 AM, mabi wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sure, it doesn't look like a split brain based on the output:
>>>>>>>>>>>>
>>>>>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>>>>
>>>>>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>>>>
>>>>>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>>>>
>>>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>>>> Local Time: August 21, 2017 11:35 PM
>>>>>>>>>>>>> UTC Time: August 21, 2017 9:35 PM
>>>>>>>>>>>>> From: bturner at redhat.com
>>>>>>>>>>>>> To: mabi <mabi at protonmail.ch>
>>>>>>>>>>>>> Gluster Users <gluster-users at gluster.org>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you also provide:
>>>>>>>>>>>>>
>>>>>>>>>>>>> gluster v heal <my vol> info split-brain
>>>>>>>>>>>>>
>>>>>>>>>>>>> If it is split brain, just delete the incorrect file from the brick
>>>>>>>>>>>>> and run heal again. I haven't tried this with arbiter but I
>>>>>>>>>>>>> assume the process is the same.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -b
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> From: "mabi" <mabi at protonmail.ch>
>>>>>>>>>>>>>> To: "Ben Turner" <bturner at redhat.com>
>>>>>>>>>>>>>> Cc: "Gluster Users" <gluster-users at gluster.org>
>>>>>>>>>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM
>>>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So it is really a 0 kBytes file everywhere (all nodes including the
>>>>>>>>>>>>>> arbiter and from the client).
>>>>>>>>>>>>>> Here below you will find the output you requested. Hopefully that will
>>>>>>>>>>>>>> help to find out why this specific file is not healing... Let me know if
>>>>>>>>>>>>>> you need any more information. Btw node3 is my arbiter node.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> NODE1:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>>> File:
>>>>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>>>>>>> Device: 24h/36d Inode: 10033884 Links: 2
>>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>>> Birth: -
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> GETFATTR:
>>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
>>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> NODE2:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>>> File:
>>>>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>>>>>>> Device: 26h/38d Inode: 10031330 Links: 2
>>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>>>>>>> Birth: -
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> GETFATTR:
>>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
>>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> NODE3:
>>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>>> File:
>>>>>>>>>>>>>> /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 4096 regular empty file
>>>>>>>>>>>>>> Device: ca11h/51729d Inode: 405208959 Links: 2
>>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200
>>>>>>>>>>>>>> Birth: -
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> GETFATTR:
>>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
>>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> CLIENT GLUSTER MOUNT:
>>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>>> File:
>>>>>>>>>>>>>> "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
>>>>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 131072 regular empty file
>>>>>>>>>>>>>> Device: 1eh/30d Inode: 11897049013408443114 Links: 1
>>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>>> Birth: -
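
For readers wondering what `trusted.afr.dirty=0sAAAAAQAAAAAAAAAA` in the getfattr output above encodes, here is a short decoding sketch. The `0s` prefix is getfattr's marker for a base64-encoded value. Reading the 12 bytes as three big-endian 32-bit counters in the order data, metadata, entry is an assumption about the AFR xattr layout, and the function name is hypothetical.

```python
import base64
import struct


def decode_afr_xattr(getfattr_value):
    """Decode a getfattr '0s...' (base64) value into three 32-bit counters."""
    if not getfattr_value.startswith("0s"):
        raise ValueError("expected a base64 value prefixed with 0s")
    raw = base64.b64decode(getfattr_value[2:])
    # Assumed layout: data, metadata, entry counters, each big-endian uint32.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


print(decode_afr_xattr("0sAAAAAQAAAAAAAAAA"))
# prints: {'data': 1, 'metadata': 0, 'entry': 0}
```

Under that assumed layout, the same value on all three bricks means the data counter is set to 1 everywhere, with nothing pending for metadata or entry.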
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>>>>>> Local Time: August 21, 2017 9:34 PM
>>>>>>>>>>>>>>> UTC Time: August 21, 2017 7:34 PM
>>>>>>>>>>>>>>> From: bturner at redhat.com
>>>>>>>>>>>>>>> To: mabi <mabi at protonmail.ch>
>>>>>>>>>>>>>>> Gluster Users <gluster-users at gluster.org>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>>> From: "mabi" <mabi at protonmail.ch>
>>>>>>>>>>>>>>>> To: "Gluster Users" <gluster-users at gluster.org>
>>>>>>>>>>>>>>>> Sent: Monday, August 21, 2017 9:28:24 AM
>>>>>>>>>>>>>>>> Subject: [Gluster-users] self-heal not working
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster, and there
>>>>>>>>>>>>>>>> is currently one file listed to be healed, as you can see below, but
>>>>>>>>>>>>>>>> it never gets healed by the self-heal daemon:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As once recommended on this mailing list, I have mounted that
>>>>>>>>>>>>>>>> glusterfs volume temporarily through fuse/glusterfs and ran a "stat"
>>>>>>>>>>>>>>>> on the file listed above, but nothing happened.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The file itself is available on all 3 nodes/bricks, but on the last
>>>>>>>>>>>>>>>> node it has a different date. By the way, this file is 0 kBytes in
>>>>>>>>>>>>>>>> size. Is that maybe the reason why the self-heal does not work?
>>>>>>>>>>>>>>> Is the file actually 0 bytes, or is it just 0 bytes on the arbiter
>>>>>>>>>>>>>>> (0 bytes are expected on the arbiter, it just stores metadata)? Can
>>>>>>>>>>>>>>> you send us the output from stat on all 3 nodes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ stat <file on back end brick>
>>>>>>>>>>>>>>> $ getfattr -d -m - <file on back end brick>
>>>>>>>>>>>>>>> $ stat <file from gluster mount>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let's see what things look like on the back end; it should tell us
>>>>>>>>>>>>>>> why healing is failing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -b
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And how can I now get this file to heal?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Mabi
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users