[Gluster-users] self-heal not working

mabi mabi at protonmail.ch
Wed Aug 23 17:01:59 UTC 2017


I just saw the following bug which was fixed in 3.8.15:

https://bugzilla.redhat.com/show_bug.cgi?id=1471613

Is it possible that the problem I described in this post is related to that bug?

> -------- Original Message --------
> Subject: Re: [Gluster-users] self-heal not working
> Local Time: August 22, 2017 11:51 AM
> UTC Time: August 22, 2017 9:51 AM
> From: ravishankar at redhat.com
> To: mabi <mabi at protonmail.ch>
> Ben Turner <bturner at redhat.com>, Gluster Users <gluster-users at gluster.org>
>
> On 08/22/2017 02:30 PM, mabi wrote:
>
>> Thanks for the additional hints. I have the following 2 questions first:
>>
>> - In order to launch the index heal, is the following command correct:
>> gluster volume heal myvolume
>
> Yes
>
>> - If I run a "volume start force", will it cause any short disruption for my clients which mount the volume through FUSE? If yes, for how long? This is a production system, that's why I am asking.
>
> No. You can actually create a test volume on your personal Linux box to try these kinds of things without needing multiple machines. This is how we develop and test our patches :)
> `gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3} force` and so on.
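>
> For example, a rough single-node sketch (untested here; the CLI expects bricks as host:path, so the hostname and paths below are only placeholders, `arbiter 1` mirrors the production layout, and `force` is needed because all bricks sit on one server):
>
> # throwaway replica 3 + arbiter volume on a single box
> mkdir -p /home/mabi/bricks/brick{1..3} /mnt/testvol
> gluster volume create testvol replica 3 arbiter 1 \
>     $(hostname):/home/mabi/bricks/brick1 \
>     $(hostname):/home/mabi/bricks/brick2 \
>     $(hostname):/home/mabi/bricks/brick3 force
> gluster volume start testvol
> mount -t glusterfs $(hostname):/testvol /mnt/testvol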
>
> HTH,
> Ravi
>
>>> -------- Original Message --------
>>> Subject: Re: [Gluster-users] self-heal not working
>>> Local Time: August 22, 2017 6:26 AM
>>> UTC Time: August 22, 2017 4:26 AM
>>> From: ravishankar at redhat.com
>>> To: mabi <mabi at protonmail.ch>, Ben Turner <bturner at redhat.com>
>>> Gluster Users <gluster-users at gluster.org>
>>>
>>> Explore the following:
>>>
>>> - Launch index heal and look at the glustershd logs of all bricks for possible errors
>>>
>>> - See if the glustershd in each node is connected to all bricks.
>>>
>>> - If not, try to restart shd with `volume start force`.
>>>
>>> - Launch index heal again and try.
>>>
>>> - Try debugging the shd log by setting client-log-level to DEBUG temporarily (a rough command sketch for these steps follows below).
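>>>
>>> For reference, a rough command-line sketch of those steps (volume name taken from this thread, log path is the usual default):
>>>
>>> # 1. launch an index heal and see what is still pending
>>> gluster volume heal myvolume
>>> gluster volume heal myvolume info
>>>
>>> # 2. check that the self-heal daemon is running on each node and scan its log
>>> gluster volume status myvolume
>>> grep -iE 'error|disconnect' /var/log/glusterfs/glustershd.log
>>>
>>> # 3. if shd is not connected to all bricks, restart it
>>> gluster volume start myvolume force
>>>
>>> # 4. temporarily raise the log level, heal again, then reset it
>>> gluster volume set myvolume diagnostics.client-log-level DEBUG
>>> gluster volume heal myvolume
>>> gluster volume reset myvolume diagnostics.client-log-level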
>>>
>>> On 08/22/2017 03:19 AM, mabi wrote:
>>>
>>>> Sure, it doesn't look like a split brain based on the output:
>>>>
>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>> Local Time: August 21, 2017 11:35 PM
>>>>> UTC Time: August 21, 2017 9:35 PM
>>>>> From: bturner at redhat.com
>>>>> To: mabi <mabi at protonmail.ch>
>>>>> Gluster Users <gluster-users at gluster.org>
>>>>>
>>>>> Can you also provide:
>>>>>
>>>>> gluster v heal <my vol> info split-brain
>>>>>
>>>>> If it is split brain, just delete the incorrect file from the brick and run heal again. I haven't tried this with arbiter but I assume the process is the same.
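>>>>>
>>>>> (As far as I understand, the manual way is roughly the following, in bash, run only on the node holding the bad copy and only once split brain is confirmed; brick path and file name are the ones from this thread, and the hard link under .glusterfs has to be removed as well:)
>>>>>
>>>>> brick=/data/myvolume/brick
>>>>> f=data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>> # look up the gfid so the matching .glusterfs hard link can be removed too
>>>>> gfid=$(getfattr -n trusted.gfid -e hex "$brick/$f" | awk -F'=0x' '/trusted.gfid/{print $2}')
>>>>> rm -v "$brick/$f" \
>>>>>    "$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20:12}"
>>>>> gluster volume heal myvolume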
>>>>>
>>>>> -b
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "mabi" [<mabi at protonmail.ch>](mailto:mabi at protonmail.ch)
>>>>>> To: "Ben Turner" [<bturner at redhat.com>](mailto:bturner at redhat.com)
>>>>>> Cc: "Gluster Users" [<gluster-users at gluster.org>](mailto:gluster-users at gluster.org)
>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM
>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>
>>>>>> Hi Ben,
>>>>>>
>>>>>> So it is really a 0-byte file everywhere (on all nodes, including the arbiter,
>>>>>> and from the client).
>>>>>> Below you will find the output you requested. Hopefully that will help
>>>>>> to find out why this specific file is not healing... Let me know if you need
>>>>>> any more information. By the way, node3 is my arbiter node.
>>>>>>
>>>>>> NODE1:
>>>>>>
>>>>>> STAT:
>>>>>> File:
>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>> Device: 24h/36d Inode: 10033884 Links: 2
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Birth: -
>>>>>>
>>>>>> GETFATTR:
>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
>>>>>>
>>>>>> NODE2:
>>>>>>
>>>>>> STAT:
>>>>>> File:
>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>> Device: 26h/38d Inode: 10031330 Links: 2
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200
>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200
>>>>>> Birth: -
>>>>>>
>>>>>> GETFATTR:
>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
>>>>>>
>>>>>> NODE3:
>>>>>> STAT:
>>>>>> File:
>>>>>> /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> Size: 0 Blocks: 0 IO Block: 4096 regular empty file
>>>>>> Device: ca11h/51729d Inode: 405208959 Links: 2
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200
>>>>>> Birth: -
>>>>>>
>>>>>> GETFATTR:
>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
>>>>>>
>>>>>> CLIENT GLUSTER MOUNT:
>>>>>> STAT:
>>>>>> File:
>>>>>> "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
>>>>>> Size: 0 Blocks: 0 IO Block: 131072 regular empty file
>>>>>> Device: 1eh/30d Inode: 11897049013408443114 Links: 1
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Birth: -
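>>>>>>
>>>>>> (Side note: trusted.afr.dirty is identical on all three bricks, and there is no trusted.afr.myvolume-client-* xattr blaming a particular brick. The 0s prefix in getfattr output means base64, so the value can be decoded like this; if the usual afr layout of three 32-bit counters, data/metadata/entry, applies, only the data dirty counter is set:)
>>>>>>
>>>>>> echo 'AAAAAQAAAAAAAAAA' | base64 -d | od -An -tx1
>>>>>>  00 00 00 01 00 00 00 00 00 00 00 00
>>>>>> # i.e. data=1, metadata=0, entry=0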
>>>>>>
>>>>>> > -------- Original Message --------
>>>>>> > Subject: Re: [Gluster-users] self-heal not working
>>>>>> > Local Time: August 21, 2017 9:34 PM
>>>>>> > UTC Time: August 21, 2017 7:34 PM
>>>>>> > From: bturner at redhat.com
>>>>>> > To: mabi <mabi at protonmail.ch>
>>>>>> > Gluster Users <gluster-users at gluster.org>
>>>>>> >
>>>>>> > ----- Original Message -----
>>>>>> >> From: "mabi" [<mabi at protonmail.ch>](mailto:mabi at protonmail.ch)
>>>>>> >> To: "Gluster Users" [<gluster-users at gluster.org>](mailto:gluster-users at gluster.org)
>>>>>> >> Sent: Monday, August 21, 2017 9:28:24 AM
>>>>>> >> Subject: [Gluster-users] self-heal not working
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster and there is
>>>>>> >> currently one file listed to be healed, as you can see below, but it never gets
>>>>>> >> healed by the self-heal daemon:
>>>>>> >>
>>>>>> >> Brick node1.domain.tld:/data/myvolume/brick
>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> >> Status: Connected
>>>>>> >> Number of entries: 1
>>>>>> >>
>>>>>> >> Brick node2.domain.tld:/data/myvolume/brick
>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> >> Status: Connected
>>>>>> >> Number of entries: 1
>>>>>> >>
>>>>>> >> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> >> Status: Connected
>>>>>> >> Number of entries: 1
>>>>>> >>
>>>>>> >> As once recommended on this mailing list, I have mounted that glusterfs
>>>>>> >> volume temporarily through fuse/glusterfs and ran a "stat" on the file
>>>>>> >> listed above, but nothing happened.
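>>>>>> >>
>>>>>> >> (Roughly what that looked like; server name and mount point as used elsewhere in this thread:)
>>>>>> >>
>>>>>> >> mount -t glusterfs node1.domain.tld:/myvolume /mnt/myvolume
>>>>>> >> stat /mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png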
>>>>>> >>
>>>>>> >> The file itself is available on all 3 nodes/bricks, but on the last node it
>>>>>> >> has a different date. By the way, this file is 0 bytes big. Could that maybe
>>>>>> >> be the reason why the self-heal does not work?
>>>>>> >
>>>>>> > Is the file actually 0 bytes, or is it just 0 bytes on the arbiter (0 bytes
>>>>>> > are expected on the arbiter, since it only stores metadata)? Can you send us
>>>>>> > the following output from all 3 nodes:
>>>>>> >
>>>>>> > $ stat <file on back end brick>
>>>>>> > $ getfattr -d -m - <file on back end brick>
>>>>>> > $ stat <file from gluster mount>
>>>>>> >
>>>>>> > Let's see what things look like on the back end; it should tell us why
>>>>>> > healing is failing.
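>>>>>> >
>>>>>> > For example, filled in with the file from the heal info output above (brick path from node1; the mount point is just an example):
>>>>>> >
>>>>>> > stat /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> > getfattr -d -m - /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> > stat /mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png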
>>>>>> >
>>>>>> > -b
>>>>>> >
>>>>>> >>
>>>>>> >> And how can I now make this file to heal?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Mabi
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> _______________________________________________
>>>>>> >> Gluster-users mailing list
>>>>>> >> Gluster-users at gluster.org
>>>>>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>>
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users

