[Gluster-users] self-heal not working

Ravishankar N ravishankar at redhat.com
Sun Aug 27 13:45:32 UTC 2017


Yes, the shds did pick up the file for healing (I saw messages like
"got entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea"), but there were no
errors afterwards.

Anyway, I reproduced it by manually setting the afr.dirty bit for a
zero-byte file on all 3 bricks. Since there are no afr pending xattrs
indicating good/bad copies and all files are zero bytes, the data
self-heal algorithm just picks the file with the latest ctime as the
source. In your case that was the arbiter brick. In the code there is a
check to prevent data heals if the arbiter is the source, so the heal
was not happening and the entries were not removed from the heal-info
output.

Perhaps we should add a check in the code to simply remove the entries
from heal-info if the size is zero bytes on all bricks.
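
For anyone who wants to reproduce it, a rough sketch of what I did (the
brick path is illustrative; the value is the same 12-byte afr.dirty
xattr visible in your getfattr output, i.e. only the data counter set
to 1):

# create a zero-byte file through the fuse mount, then on each brick:
$ setfattr -n trusted.afr.dirty -v 0sAAAAAQAAAAAAAAAA \
      /data/myvolume/brick/path/to/file

# verify:
$ getfattr -n trusted.afr.dirty -e hex /data/myvolume/brick/path/to/file

After that, `gluster volume heal <volname>` picks the file up but never
completes the heal when the arbiter ends up as the source.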

-Ravi


On 08/25/2017 06:33 PM, mabi wrote:
> Hi Ravi,
>
> Did you get a chance to have a look at the log files I have attached 
> in my last mail?
>
> Best,
> Mabi
>
>
>
>> -------- Original Message --------
>> Subject: Re: [Gluster-users] self-heal not working
>> Local Time: August 24, 2017 12:08 PM
>> UTC Time: August 24, 2017 10:08 AM
>> From: mabi at protonmail.ch
>> To: Ravishankar N <ravishankar at redhat.com>
>> Ben Turner <bturner at redhat.com>, Gluster Users 
>> <gluster-users at gluster.org>
>>
>> Thanks for confirming the command. I have now enabled the DEBUG
>> client-log-level, ran a heal, and attached the glustershd log files
>> of all 3 nodes to this mail.
>>
>> The volume concerned is called myvol-pro; the other 3 volumes have
>> no problems so far.
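>>
>> For reference, this is what I ran, following your suggestion (the
>> glustershd logs are in the default location,
>> /var/log/glusterfs/glustershd.log, on each node):
>>
>> $ gluster volume set myvol-pro diagnostics.client-log-level DEBUG
>> $ gluster volume heal myvol-pro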
>>
>> Also note that in the meantime it looks like the file has been
>> deleted by the user, so the heal info command no longer shows the
>> file name but just its GFID, which is:
>>
>> gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
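>>
>> (In case it helps: on a brick the GFID can still be resolved via its
>> hard link under the .glusterfs directory, so something like the
>> following should show whether a real path still exists for it; the
>> brick path is just an example:
>>
>> $ find /data/myvolume/brick -samefile \
>>     /data/myvolume/brick/.glusterfs/19/85/1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
>> )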
>>
>>
>>
>> Hope that helps for debugging this issue.
>>
>>> -------- Original Message --------
>>> Subject: Re: [Gluster-users] self-heal not working
>>> Local Time: August 24, 2017 5:58 AM
>>> UTC Time: August 24, 2017 3:58 AM
>>> From: ravishankar at redhat.com
>>> To: mabi <mabi at protonmail.ch>
>>> Ben Turner <bturner at redhat.com>, Gluster Users 
>>> <gluster-users at gluster.org>
>>>
>>>
>>> Unlikely. In your case only the afr.dirty is set, not the 
>>> afr.volname-client-xx xattr.
>>>
>>> `gluster volume set myvolume diagnostics.client-log-level DEBUG` is 
>>> right.
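>>>
>>> To check for the pending xattrs on a brick you can run, for example:
>>>
>>> $ getfattr -d -m trusted.afr -e hex <file on back end brick>
>>>
>>> A good/bad-copy scenario would show non-zero
>>> trusted.afr.<volname>-client-N entries in addition to
>>> trusted.afr.dirty.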
>>>
>>>
>>> On 08/23/2017 10:31 PM, mabi wrote:
>>>> I just saw the following bug which was fixed in 3.8.15:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1471613
>>>>
>>>> Is it possible that the problem I described in this post is related 
>>>> to that bug?
>>>>
>>>>
>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>> Local Time: August 22, 2017 11:51 AM
>>>>> UTC Time: August 22, 2017 9:51 AM
>>>>> From: ravishankar at redhat.com
>>>>> To: mabi <mabi at protonmail.ch>
>>>>> Ben Turner <bturner at redhat.com>, Gluster Users 
>>>>> <gluster-users at gluster.org>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 08/22/2017 02:30 PM, mabi wrote:
>>>>>> Thanks for the additional hints. I have the following 2 questions
>>>>>> first:
>>>>>>
>>>>>> - In order to launch the index heal, is the following command correct:
>>>>>> gluster volume heal myvolume
>>>>>>
>>>>> Yes
>>>>>
>>>>>> - If I run a "volume start force", will it cause any short
>>>>>> disruption for my clients which mount the volume through FUSE? If
>>>>>> yes, how long? This is a production system, which is why I am asking.
>>>>>>
>>>>>>
>>>>> No. You can actually create a test volume on your personal Linux
>>>>> box to try these kinds of things without needing multiple
>>>>> machines. This is how we develop and test our patches :)
>>>>> `gluster volume create testvol replica 3
>>>>> /home/mabi/bricks/brick{1..3} force` and so on.
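>>>>>
>>>>> A slightly fuller sketch of a single-machine test setup (hostname
>>>>> and paths are just examples; `force` is needed because all the
>>>>> bricks live on one node):
>>>>>
>>>>> $ mkdir -p /home/mabi/bricks/brick{1..3} /mnt/testvol
>>>>> $ gluster volume create testvol replica 3 arbiter 1 \
>>>>>       $(hostname):/home/mabi/bricks/brick{1..3} force
>>>>> $ gluster volume start testvol
>>>>> $ mount -t glusterfs $(hostname):/testvol /mnt/testvol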
>>>>>
>>>>> HTH,
>>>>> Ravi
>>>>>
>>>>>
>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>> Local Time: August 22, 2017 6:26 AM
>>>>>>> UTC Time: August 22, 2017 4:26 AM
>>>>>>> From: ravishankar at redhat.com
>>>>>>> To: mabi <mabi at protonmail.ch>, Ben Turner <bturner at redhat.com>
>>>>>>> Gluster Users <gluster-users at gluster.org>
>>>>>>>
>>>>>>>
>>>>>>> Explore the following (rough commands are sketched after the list):
>>>>>>>
>>>>>>> - Launch index heal and look at the glustershd logs of all 
>>>>>>> bricks for possible errors
>>>>>>>
>>>>>>> - See if the glustershd on each node is connected to all bricks.
>>>>>>>
>>>>>>> - If not, try restarting shd with `volume start force`.
>>>>>>>
>>>>>>> - Launch index heal again and try.
>>>>>>>
>>>>>>> - Try debugging the shd log by setting client-log-level to DEBUG 
>>>>>>> temporarily.
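>>>>>>>
>>>>>>> Roughly, in commands (the volume name is an example; the log
>>>>>>> location is the default one):
>>>>>>>
>>>>>>> $ gluster volume heal myvolume                  # index heal
>>>>>>> $ gluster volume status myvolume                # shd should be Online: Y on all nodes
>>>>>>> $ grep -i connect /var/log/glusterfs/glustershd.log
>>>>>>> $ gluster volume start myvolume force           # restarts shd
>>>>>>> $ gluster volume set myvolume diagnostics.client-log-level DEBUG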
>>>>>>>
>>>>>>>
>>>>>>> On 08/22/2017 03:19 AM, mabi wrote:
>>>>>>>> Sure, it doesn't look like a split brain based on the output:
>>>>>>>>
>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>> Status: Connected
>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>
>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>> Status: Connected
>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>
>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>> Status: Connected
>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>> Local Time: August 21, 2017 11:35 PM
>>>>>>>>> UTC Time: August 21, 2017 9:35 PM
>>>>>>>>> From: bturner at redhat.com
>>>>>>>>> To: mabi <mabi at protonmail.ch>
>>>>>>>>> Gluster Users <gluster-users at gluster.org>
>>>>>>>>>
>>>>>>>>> Can you also provide:
>>>>>>>>>
>>>>>>>>> gluster v heal <my vol> info split-brain
>>>>>>>>>
>>>>>>>>> If it is split brain, just delete the incorrect file from the
>>>>>>>>> brick and run heal again. I haven't tried this with arbiter
>>>>>>>>> but I assume the process is the same.
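>>>>>>>>>
>>>>>>>>> Something like this (paths are illustrative; note that the
>>>>>>>>> matching gfid hard link under the brick's .glusterfs directory
>>>>>>>>> has to be removed as well):
>>>>>>>>>
>>>>>>>>> $ rm <brick>/path/to/bad-file
>>>>>>>>> $ rm <brick>/.glusterfs/<first 2 chars of gfid>/<next 2>/<full gfid>
>>>>>>>>> $ gluster v heal <my vol>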
>>>>>>>>>
>>>>>>>>> -b
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>> > From: "mabi" <mabi at protonmail.ch>
>>>>>>>>> > To: "Ben Turner" <bturner at redhat.com>
>>>>>>>>> > Cc: "Gluster Users" <gluster-users at gluster.org>
>>>>>>>>> > Sent: Monday, August 21, 2017 4:55:59 PM
>>>>>>>>> > Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>> >
>>>>>>>>> > Hi Ben,
>>>>>>>>> >
>>>>>>>>> > So it is really a 0 kBytes file everywhere (on all nodes
>>>>>>>>> > including the arbiter, and from the client).
>>>>>>>>> > Below you will find the output you requested. Hopefully that
>>>>>>>>> > will help to find out why this specific file is not healing...
>>>>>>>>> > Let me know if you need any more information. Btw, node3 is my
>>>>>>>>> > arbiter node.
>>>>>>>>> >
>>>>>>>>> > NODE1:
>>>>>>>>> >
>>>>>>>>> > STAT:
>>>>>>>>> > File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>> > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>> > Device: 24h/36d Inode: 10033884 Links: 2
>>>>>>>>> > Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>> > Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>> > Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>> > Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>> > Birth: -
>>>>>>>>> >
>>>>>>>>> > GETFATTR:
>>>>>>>>> > trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>> > trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
>>>>>>>>> > trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>> > trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
>>>>>>>>> >
>>>>>>>>> > NODE2:
>>>>>>>>> >
>>>>>>>>> > STAT:
>>>>>>>>> > File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>> > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>> > Device: 26h/38d Inode: 10031330 Links: 2
>>>>>>>>> > Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>> > Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>> > Modify: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>> > Change: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>> > Birth: -
>>>>>>>>> >
>>>>>>>>> > GETFATTR:
>>>>>>>>> > trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>> > trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
>>>>>>>>> > trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>> > trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
>>>>>>>>> >
>>>>>>>>> > NODE3:
>>>>>>>>> > STAT:
>>>>>>>>> > File: /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>> > Size: 0 Blocks: 0 IO Block: 4096 regular empty file
>>>>>>>>> > Device: ca11h/51729d Inode: 405208959 Links: 2
>>>>>>>>> > Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>> > Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>> > Modify: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>> > Change: 2017-08-14 17:11:46.604380051 +0200
>>>>>>>>> > Birth: -
>>>>>>>>> >
>>>>>>>>> > GETFATTR:
>>>>>>>>> > trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>> > trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
>>>>>>>>> > trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>> > trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
>>>>>>>>> >
>>>>>>>>> > CLIENT GLUSTER MOUNT:
>>>>>>>>> > STAT:
>>>>>>>>> > File: "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
>>>>>>>>> > Size: 0 Blocks: 0 IO Block: 131072 regular empty file
>>>>>>>>> > Device: 1eh/30d Inode: 11897049013408443114 Links: 1
>>>>>>>>> > Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>> > Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>> > Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>> > Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>> > Birth: -
>>>>>>>>> >
>>>>>>>>> > > -------- Original Message --------
>>>>>>>>> > > Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>> > > Local Time: August 21, 2017 9:34 PM
>>>>>>>>> > > UTC Time: August 21, 2017 7:34 PM
>>>>>>>>> > > From: bturner at redhat.com
>>>>>>>>> > > To: mabi <mabi at protonmail.ch>
>>>>>>>>> > > Gluster Users <gluster-users at gluster.org>
>>>>>>>>> > >
>>>>>>>>> > > ----- Original Message -----
>>>>>>>>> > >> From: "mabi" <mabi at protonmail.ch>
>>>>>>>>> > >> To: "Gluster Users" <gluster-users at gluster.org>
>>>>>>>>> > >> Sent: Monday, August 21, 2017 9:28:24 AM
>>>>>>>>> > >> Subject: [Gluster-users] self-heal not working
>>>>>>>>> > >>
>>>>>>>>> > >> Hi,
>>>>>>>>> > >>
>>>>>>>>> > >> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster
>>>>>>>>> > >> and there is currently one file listed to be healed, as
>>>>>>>>> > >> you can see below, but it never gets healed by the
>>>>>>>>> > >> self-heal daemon:
>>>>>>>>> > >>
>>>>>>>>> > >> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>>> > >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>> > >> Status: Connected
>>>>>>>>> > >> Number of entries: 1
>>>>>>>>> > >>
>>>>>>>>> > >> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>>> > >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>> > >> Status: Connected
>>>>>>>>> > >> Number of entries: 1
>>>>>>>>> > >>
>>>>>>>>> > >> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>>> > >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>> > >> Status: Connected
>>>>>>>>> > >> Number of entries: 1
>>>>>>>>> > >>
>>>>>>>>> > >> As once recommended on this mailing list, I have mounted
>>>>>>>>> > >> that glusterfs volume temporarily through fuse/glusterfs
>>>>>>>>> > >> and ran a "stat" on the file listed above, but nothing
>>>>>>>>> > >> happened.
>>>>>>>>> > >>
>>>>>>>>> > >> The file itself is available on all 3 nodes/bricks, but
>>>>>>>>> > >> on the last node it has a different date. By the way,
>>>>>>>>> > >> this file is 0 kBytes big. Is that maybe the reason why
>>>>>>>>> > >> the self-heal does not work?
>>>>>>>>> > >
>>>>>>>>> > > Is the file actually 0 bytes or is it just 0 bytes on the
>>>>>>>>> > > arbiter (0 bytes are expected on the arbiter, it just
>>>>>>>>> > > stores metadata)? Can you send us the output from stat on
>>>>>>>>> > > all 3 nodes:
>>>>>>>>> > >
>>>>>>>>> > > $ stat <file on back end brick>
>>>>>>>>> > > $ getfattr -d -m - <file on back end brick>
>>>>>>>>> > > $ stat <file from gluster mount>
>>>>>>>>> > >
>>>>>>>>> > > Let's see what things look like on the back end; it
>>>>>>>>> > > should tell us why healing is failing.
>>>>>>>>> > >
>>>>>>>>> > > -b
>>>>>>>>> > >
>>>>>>>>> > >>
>>>>>>>>> > >> And how can I now make this file to heal?
>>>>>>>>> > >>
>>>>>>>>> > >> Thanks,
>>>>>>>>> > >> Mabi
>>>>>>>>> > >>
>>>>>>>>> > >>
>>>>>>>>> > >>
>>>>>>>>> > >>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>
