[Gluster-users] outage post-mortem

Fri Mar 28 06:08:03 UTC 2014

Hi list,
I would like to describe an issue I had today with Gluster and ask for
your opinions:

I have a replicated volume with 2 replicas. It holds about 1 TB of
production data in around 100,000 files. The data sits on 2x Supermicro
X9DR3-LN4F machines, each with an 18 TB RAID array, 64 GB of RAM and 2x Xeon
CPUs, as recommended in the Red Hat hardware guidelines for storage servers.
They have a 10 Gb link between each other. I am running Gluster 3.4.2 on
CentOS 6.5.

This storage is NFS-mounted on a lot of production servers. Only a very
small part of this data is actually in use; the rest is legacy.

Due to an unrelated issue with one of the Supermicro servers (faulty
memory), I had to take one of the nodes offline for 3 days.

When I brought it back up, some files and directories ended up in
heal-failed state (but no split-brain). Unfortunately, these were the
critical files that had been edited during those 3 days. On the NFS mounts,
attempts to read these files resulted in I/O errors.

I was able to fix a few of these files by manually removing them from each
brick and then copying them to the mounted volume again. But I did not know
what to do when entire directories were unreachable because of heal-failed.

I later read that healing can take time and that heal-failed may be a
transient state (is that correct? See
http://stackoverflow.com/questions/19257054/is-it-normal-to-get-a-lot-of-heal-failed-entries-in-a-gluster-mount).
At the time, however, I thought the volume was beyond recovery, so I
destroyed the Gluster volume. Then, on one of the replicas, I moved the
content of the brick to another directory, created a new volume with the
same name, and copied the old brick content to the mounted volume. This took
around 2 hours. Finally, I had to reboot all the machines that had the
volume NFS-mounted, as they were in "stale NFS file handle" state.

A few questions:
- I realize that I cannot expect 1 TB of data to heal instantly, but is
there any way for me to know whether the system would have recovered
eventually despite being shown as heal-failed?
- If yes, how many files and how much data should I clean up from my
volume to bring this time under 10 minutes?
- Would native Gluster mounts instead of NFS have helped here?
- Would any other course of action have resulted in a faster recovery?
- Is there a way, in such a situation, to make one replica authoritative
for the correct state of the filesystem?
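
To make the second question more concrete, here is a rough sketch of the
lower bound I have in mind. It is only back-of-the-envelope arithmetic, not
a measurement: it assumes the 10 Gb link is the only bottleneck, uses
decimal units (1 TB = 10**12 bytes), and ignores per-file overhead and disk
speed entirely. The function names are made up for illustration. I would
expect a real heal to be slower than this, and I assume it only needs to
touch files that changed while the node was down, but I do not know by how
much either effect matters.

```python
# Lower bound on the time needed to copy data between the two replicas.
# Assumptions (mine, not measured): the 10 Gb link is the only bottleneck,
# decimal units are used, and there is no per-file overhead.

LINK_BITS_PER_SECOND = 10e9   # 10 Gb link between the two servers
VOLUME_BYTES = 1e12           # about 1 TB of data in the volume


def min_transfer_seconds(data_bytes, link_bits_per_second):
    """Seconds needed to push data_bytes over the link at full line rate."""
    return data_bytes * 8 / link_bits_per_second


def max_bytes_within(seconds, link_bits_per_second):
    """Bytes that fit through the link in the given number of seconds."""
    return seconds * link_bits_per_second / 8


if __name__ == "__main__":
    full_copy = min_transfer_seconds(VOLUME_BYTES, LINK_BITS_PER_SECOND)
    budget = max_bytes_within(10 * 60, LINK_BITS_PER_SECOND)
    print("full 1 TB copy, best case: %.1f minutes" % (full_copy / 60))
    print("data that fits in 10 minutes, best case: %.0f GB" % (budget / 1e9))
```

So even in the ideal case a full 1 TB copy would take a bit over 13 minutes,
and about 750 GB is the most that could cross the link in 10 minutes. The
number of files probably matters more than the raw size, which is what I am
hoping someone can comment on.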

Thanks in advance for your replies.