[Gluster-users] Heal not working
jdarcy at redhat.com
Mon Nov 26 15:39:37 UTC 2012
On 11/26/2012 05:26 AM, Mario Kadastik wrote:
> I have a volume of 12 bricks with 3x replication (no stripe). We had to take one server down for maintenance (2 bricks per server, but ordered so that the first brick from every server comes first, then the second brick from every server, so no server should appear more than once in any replica group). The server was down for 40 minutes, and after it came back up I saw that gluster volume heal home0 info listed some files. I started healing, but after 3 days it's still the same. Today I enabled quorum enforcement to avoid future split brains; as we have 3 replicas, 2 should make quorum.
> Anyway, the healing information is attached to this e-mail; it is the output of these commands:
> [root@se1 ~]# for i in "" heal-failed split-brain; do gluster volume heal home0 info $i > home-heal-$i.txt 2>&1; done
For some of the files where healing failed, check the extended attributes on
each replica. For example:
    getfattr -d -e hex -m . .../res/out_files_485.tgz
Also, check the logs in /var/log/glusterfs to see if they give any indication
of why self-heal is failing. In my experience, the most common cause of such
failures is GFID mismatches, which are really a form of split brain but are not
recognized or handled as such (which is why they don't show up in the
"heal info split-brain" output). These can occur, e.g., if a file is created
separately on two bricks due to a network partition or two servers being down
at different times.
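
To make the GFID comparison concrete, here is a rough sketch of how the check
could be scripted. It assumes the GFID is stored in the trusted.gfid extended
attribute on each brick-side copy of the file (so getfattr has to run as root,
against the brick path rather than the client mount). The helper names
brick_gfid and gfids_agree, and the file gfids.txt, are made up for
illustration and are not part of any Gluster tooling.

```shell
#!/bin/sh
# Print the trusted.gfid of one brick-side file as a hex string.
# getfattr prints a line of the form "trusted.gfid=0x..." for the attribute;
# sed keeps only the value. Prints nothing if the attribute cannot be read.
brick_gfid() {
    getfattr -n trusted.gfid -e hex "$1" 2>/dev/null |
        sed -n 's/^trusted\.gfid=//p'
}

# Read "label gfid" pairs on stdin (one per replica) and succeed only if
# every replica reports the same GFID.
gfids_agree() {
    distinct=$(awk 'NF >= 2 { print $2 }' | sort -u | wc -l | tr -d ' ')
    [ "$distinct" -le 1 ]
}
```

On each server holding a replica you would run something like
echo "$(hostname) $(brick_gfid <path to the file on the brick>)", collect
the lines into gfids.txt, and then run
gfids_agree < gfids.txt || echo "GFID mismatch". A mismatch reported this way
would match the failure mode described above.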