[Gluster-users] Heal not working

Mon Nov 26 15:39:37 UTC 2012

On 11/26/2012 05:26 AM, Mario Kadastik wrote:
> Hi,
>
> I have a volume made of 12 bricks with 3x replication (no stripe). We had to take one server down for maintenance. Each server holds 2 bricks, but the volume was configured with the first brick from every server followed by the second brick from every server, so no server should appear more than once in any replica group. The server was down for 40 minutes, and after it came back up, gluster volume heal home0 info listed some files. I started healing, but after 3 days the list is still the same. Today I enabled quorum enforcement to avoid split brains in the future; since we have 3 replicas, 2 should make quorum.
>
> Anyway, the healing information is attached to this e-mail. It was produced with this command:
> [root@se1 ~]# for i in "" heal-failed split-brain; do gluster volume heal home0 info $i > home-heal-$i.txt 2>&1; done

For some of the files where healing failed, check the extended attributes on 
each replica.  For example:

	getfattr -d -e hex -m . .../res/out_files_485.tgz

Also, check the logs in /var/log/glusterfs to see if they give any indication 
of why self-heal is failing.  In my experience, the most common cause of such 
failures is GFID mismatches, which are really a form of split brain but not 
recognized or handled as such (which is why they don't get reported there). 
These can occur e.g. if a file is created separately on two bricks due to a 
network partition or two servers being down at different times.
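
To confirm a GFID mismatch, compare the trusted.gfid attribute of the same 
file on each brick of the replica set.  Below is a rough sketch of that 
check, not something shipped with GlusterFS.  The function names (get_gfid, 
all_same, check_gfid) are made up for illustration, and it assumes every 
brick copy is reachable as a local path; in practice you would run get_gfid 
as root on each server and compare the values yourself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of GlusterFS.

# Print the hex trusted.gfid of one brick-side file (empty if not set).
# trusted.* attributes are normally readable by root only.
get_gfid() {
    getfattr -n trusted.gfid -e hex "$1" 2>/dev/null |
        sed -n 's/^trusted\.gfid=//p'
}

# Return 0 if all arguments are identical, 1 otherwise.
all_same() {
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# Print the GFID of each given brick-side path, then a verdict.
check_gfid() {
    gfids=""
    for f in "$@"; do
        g=$(get_gfid "$f")
        g=${g:-none}          # placeholder when the attribute is absent
        echo "$f: $g"
        gfids="$gfids $g"
    done
    # Intentionally unquoted: split the collected values into arguments.
    if all_same $gfids; then
        echo "GFIDs match"
    else
        echo "GFID mismatch"
    fi
}
```

It would be called with the brick-side path of one file on each replica, 
e.g. check_gfid <brick1>/<file> <brick2>/<file> <brick3>/<file>, where the 
brick paths are whatever your volume uses.  A "none" entry means the file 
has no trusted.gfid on that brick, which is also worth looking into.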