[Gluster-users] outage post-mortem

Nicolas Ochem nicolas.ochem at gmail.com
Fri Mar 28 21:58:06 UTC 2014


Joe,
Thanks for your reply.
I grep'd the logs for the name of one of the files that had become
unreachable over NFS after the resync (I/O error). It comes up in
<volumename>.log and nfs.log on the node that had stayed online.
The relevant log excerpts are here:
https://gist.github.com/nicolasochem/f9d24a2bf57b0d40bb7d
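
For reference, this is roughly what I ran to find those entries (the file
name below is a placeholder for the real one, and the log paths assume the
default /var/log/glusterfs location):

  # search the volume and NFS server logs on the surviving node
  grep -n "the-affected-file" /var/log/glusterfs/<volumename>.log
  grep -n "the-affected-file" /var/log/glusterfs/nfs.log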

One important piece of information is that the node that was taken offline
had previously filled up its root filesystem, because a memory/southbridge
issue flooded /var/log/messages. When the machine was restored, glusterd
did not come up because one file in /var/lib/glusterd/peers was empty.
The issue is described here:
https://bugzilla.redhat.com/show_bug.cgi?id=858732

I removed the empty peer file, glusterd started, and then I started seeing
the I/O errors described in my original mail.
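
In case someone else hits that bug, this is roughly what I did (the peer
file name is a placeholder; I checked the file really was zero bytes before
removing it):

  # find peer files truncated to zero length by the full disk
  find /var/lib/glusterd/peers -type f -size 0
  # remove the empty one and start glusterd again (CentOS 6)
  rm /var/lib/glusterd/peers/<peer-uuid>
  service glusterd start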

The key log line, IMO, is: "background meta-data data missing-entry
self-heal failed on"

Based on this and the logs, could it be that gluster failed to write to
/var/lib/glusterd because the disk was full, and that this caused the issues?
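
To rule that out on my side I am checking things like this (assuming the
default /var/lib/glusterd state directory):

  # is there free space on the filesystem holding glusterd's state?
  df -h /var/lib/glusterd
  # did anything else in the state directory get truncated to zero bytes?
  find /var/lib/glusterd -type f -size 0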


On Fri, Mar 28, 2014 at 8:13 AM, Joe Julian <joe at julianfamily.org> wrote:

>
>
> On March 27, 2014 11:08:03 PM PDT, Nicolas Ochem <nicolas.ochem at gmail.com>
> wrote:
> >Hi list,
> >I would like to describe an issue I had today with Gluster and ask for
> >opinions:
> >
> >I have a replicated volume with 2 replicas. There is about 1TB of
> >production data in there, in around 100,000 files. They sit on 2x
> >Supermicro X9DR3-LN4F machines with an 18TB RAID array each, 64GB of
> >RAM and 2x Xeon CPUs, as recommended in the Red Hat hardware guidelines
> >for storage servers. They have a 10Gb link between each other. I am
> >running Gluster 3.4.2 on CentOS 6.5.
> >
> >This storage is NFS-mounted on a lot of production servers. Only a
> >very small part of this data is actually in use; the rest is legacy.
> >
> >Due to an unrelated issue with one of the Supermicro servers (faulty
> >memory), I had to take one of the nodes offline for 3 days.
> >
> >When I brought it back up, some files and directories ended up in
> >heal-failed state (but no split-brain). Unfortunately those were the
> >critical files that had been edited in the last 3 days. On the NFS
> >mounts, attempts to read these files resulted in I/O errors.
> >
> >I was able to fix a few of these files by manually removing them from
> >each brick and then copying them to the mounted volume again. But I
> >did not know what to do when entire directories were unreachable
> >because of "heal failed".
> >
> >I later read that healing can take time and that heal-failed may be a
> >transient state (is that correct?
> >http://stackoverflow.com/questions/19257054/is-it-normal-to-get-a-lot-of-heal-failed-entries-in-a-gluster-mount
> >), but at the time I thought the data was beyond recovery, so I
> >proceeded to destroy the Gluster volume. Then, on one of the replicas,
> >I moved the content of the brick to another directory, created another
> >volume with the same name, and copied that content back into the
> >mounted volume. This took around 2 hours. Then I had to reboot all my
> >NFS-mounted machines, which were stuck with "stale NFS file handle"
> >errors.
> >
> >A few questions:
> >- I realize that I cannot expect 1TB of data to heal instantly, but is
> >there any way for me to know whether the system would eventually have
> >recovered, despite being shown as "heal failed"?
> >- If yes, how many files, and what total size, would I have to clean
> >up from my volume to bring this time under 10 minutes?
> >- Would native Gluster mounts instead of NFS have helped here?
> >- Would any other course of action have resulted in a faster recovery
> >time?
> >- Is there a way, in such a situation, to make one replica
> >authoritative about the correct state of the filesystem?
> >
> >Thanks in advance for your replies.
> >
> >
> Although the self-heal daemon can take time to heal all the files,
> accessing a file that needs healing does trigger the heal to be performed
> immediately by the client (the NFS server is the client in this case).
>
> As with pretty much all errors in GlusterFS, you would have had to look
> in the logs to find out why something as vague as "heal failed" happened.
>
>
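
Following up on Joe's point that accessing the file triggers the heal from
the client side, something along these lines should show whether a given
file heals when touched (volume name and mount path are placeholders, and
I believe the heal-failed subcommand is available in 3.4):

  # stat the file through a mount to trigger a client-side self-heal
  stat /mnt/<volumename>/path/to/affected/file
  # then check whether it still shows up as pending or failed
  gluster volume heal <volumename> info
  gluster volume heal <volumename> info heal-failed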

