[Gluster-users] [External] Re: Self Heal Confusion
Brett Holcomb
biholcomb at l1049h.com
Tue Jan 1 16:58:31 UTC 2019
Heal timeout set to 120 seconds for now.
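That is, assuming I read Davide's syntax below correctly, something
like:

gluster volume set projects cluster.heal-timeout 120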
Just to make sure I understand: I need to take the output of gluster
volume heal projects info and put it in a file, then try to find each
gfid listed in that file under the .glusterfs directory of each brick
reported as having unhealed files, and delete that file if it exists.
If it doesn't exist, don't worry about it.
So these bricks have unhealed entries listed:
/srv/gfs01/Projects/.glusterfs - 85 files
/srv/gfs05/Projects/.glusterfs - 58854 files
/srv/gfs06/Projects/.glusterfs - 58854 files
Script time!
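Roughly this, I'm thinking (untested, and /tmp/heal-gfids.txt is just a
placeholder name; it assumes the gfids are saved one per line without
the <gfid:...> wrapper, and that .glusterfs stores each entry under its
first two pairs of hex digits, e.g. .glusterfs/ab/cd/abcd1234-...):

#!/bin/bash
# Bricks reported by heal info as having unhealed entries.
BRICKS="/srv/gfs01/Projects /srv/gfs05/Projects /srv/gfs06/Projects"

while read -r gfid; do
    for brick in $BRICKS; do
        # gfid entries live at .glusterfs/<xx>/<yy>/<full-gfid>,
        # where xx and yy are the first two hex-digit pairs of the gfid.
        entry="$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
        # -e misses broken symlinks (directory gfids are symlinks),
        # so check -L as well.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            echo "removing $entry"
            rm -f "$entry"
        fi
    done
done < /tmp/heal-gfids.txt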
On 12/31/18 4:39 AM, Davide Obbi wrote:
> cluster.quorum-type auto
> cluster.quorum-count (null)
> cluster.server-quorum-type off
> cluster.server-quorum-ratio 0
> cluster.quorum-reads no
>
> Where exactly do I remove the gfid entries from - the .glusterfs
> directory? --> yes; I can't remember exactly where, but try a find in
> the brick paths with the gfid - it should return something
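So something like this on each server, I take it, where $gfid is one of
the entries from the heal info output:

find /srv/gfs05/Projects/.glusterfs -name "$gfid"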
>
> Where do I put the cluster.heal-timeout option - which file? -->
> gluster volume set volumename option value
>
> On Mon, Dec 31, 2018 at 10:34 AM Brett Holcomb
> <biholcomb at l1049h.com> wrote:
>
> That is probably the case as a lot of files were deleted some time
> ago.
>
> I'm on version 5.2 but was on 3.12 until about a week ago.
>
> Here is the quorum info. I'm running a distributed replicated volume
> in a 2 x 3 = 6 configuration.
>
> cluster.quorum-type auto
> cluster.quorum-count (null)
> cluster.server-quorum-type off
> cluster.server-quorum-ratio 0
> cluster.quorum-reads no
>
> Where exactly do I remove the gfid entries from - the .glusterfs
> directory? Do I just delete all the directories and files under this
> directory?
>
> Where do I put the cluster.heal-timeout option - which file?
>
> I think you've hit on the cause of the issue. Thinking back, we've had
> some extended power outages, and due to a misconfiguration in the swap
> file device name a couple of the nodes did not come up. I didn't catch
> it for a while, so maybe the deletes occurred then.
>
> Thank you.
>
> On 12/31/18 2:58 AM, Davide Obbi wrote:
> > If the long GFID does not correspond to any file, it could mean the
> > file was deleted by the client mounting the volume. I think this
> > happens when the delete was issued while the number of active bricks
> > did not reach quorum majority, or when a second brick was taken down
> > while another was still down or had not finished the self-heal - the
> > latter is more likely.
> > It would be interesting to see:
> > - what version of glusterfs you're running; it happened to me with 3.12
> > - volume quorum rules: "gluster volume get vol all | grep quorum"
> >
> > To clean it up, if I remember correctly, it should be possible to
> > delete the gfid entries from the brick mounts on the glusterfs
> > server nodes reporting the files to heal.
> >
> > As a side note, you might want to consider changing the self-heal
> > timeout to a more aggressive schedule via the cluster.heal-timeout
> > option.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> Davide Obbi
> System Administrator
>
> Booking.com B.V.
> Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
> Direct +31207031558
> Booking.com <https://www.booking.com/>
> Empowering people to experience the world since 1996
> 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
> million reported listings
> Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)