[Gluster-users] [External] Re: Self Heal Confusion

Brett Holcomb biholcomb at l1049h.com
Mon Dec 31 09:34:00 UTC 2018

That is probably the case as a lot of files were deleted some time ago.

I'm on version 5.2 but was on 3.12 until about a week ago.

Here is the quorum info.  I'm running a distributed replicated volume
in a 2 x 3 = 6 configuration.

cluster.quorum-type                     auto
cluster.quorum-count                    (null)
cluster.server-quorum-type              off
cluster.server-quorum-ratio             0
cluster.quorum-reads                    no
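
For context on the 2 x 3 = 6 layout above, it's the kind of volume
you'd get from a create command along these lines - the server and
brick names here are placeholders, not my actual hosts:

  # 6 bricks at replica 3 -> 2 distribute subvolumes of 3 replicas each
  gluster volume create myvol replica 3 \
      srv1:/bricks/b1 srv2:/bricks/b1 srv3:/bricks/b1 \
      srv4:/bricks/b1 srv5:/bricks/b1 srv6:/bricks/b1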

Where exactly do I remove the gfid entries from - the .glusterfs
directory?  Do I just delete all the directories and files under it?
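
From what I've read, a GFID maps to a hard link under .glusterfs on
each brick, bucketed by its first two pairs of hex characters.  Here
is a sketch of how I'd check one - the GFID and brick path below are
made up:

  # a GFID as reported by "gluster volume heal <vol> info" (made up)
  GFID="6cd217bc-2c9a-4a1b-8d7a-0123456789ab"
  BRICK="/bricks/brick1"
  # the entry lives at .glusterfs/<first 2 chars>/<next 2 chars>/<gfid>
  ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
  # a regular file with link count 1 suggests the real file is already
  # gone; only then would I remove the orphaned entry:
  # rm "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"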

Where do I put the cluster.heal-timeout option - which file?
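
My guess is that it isn't a file at all but a volume option set through
the CLI - something like this, with the volume name as a placeholder:

  # set the self-heal daemon crawl interval (in seconds)
  gluster volume set myvol cluster.heal-timeout 300
  # confirm the value took effect
  gluster volume get myvol cluster.heal-timeout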

I think you've hit on the cause of the issue.  Thinking back, we've had 
some extended power outages, and due to a misconfiguration in the swap 
file device name a couple of the nodes did not come up.  I didn't catch 
it for a while, so maybe the deletes occurred then.

Thank you.

On 12/31/18 2:58 AM, Davide Obbi wrote:
> if the long GFID does not correspond to any file, it could mean the 
> file was deleted by a client mounting the volume. I think this 
> happens when the delete was issued while the number of active bricks 
> did not reach quorum majority, or when a second brick was taken down 
> while another one was still down or had not finished its self-heal; 
> the latter is more likely.
> It would be interesting to see:
> - what version of glusterfs you are running; it happened to me with 3.12
> - volume quorum rules: "gluster volume get vol all | grep quorum"
> To clean it up, if I remember correctly, it should be possible to 
> delete the gfid entries from the brick mounts on the glusterfs server 
> nodes reporting the files to heal.
> As a side note, you might want to consider changing the self-heal 
> timeout to a more aggressive schedule via the cluster.heal-timeout 
> option.
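
For anyone following along, the entries in question are the ones these
commands report per brick (the volume name is a placeholder):

  # list the GFIDs/paths each brick still needs to heal
  gluster volume heal myvol info
  # per-brick summary counts of pending heals
  gluster volume heal myvol statistics heal-count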
