[Gluster-users] Heal-failed - what does it really tell us?

John Gardeniers jgardeniers at objectmastery.com
Thu Jul 23 23:45:25 UTC 2015


We have a replica 2 volume, where the second node was freshly added about a 
week ago and as far as I can tell is fully replicated. This is storage 
for a RHEV cluster and the total space currently in use is about 3.5TB.

When I run "gluster v heal gluster-rhev info heal-failed" it currently 
lists 866 files on the original and 1 file on the recently added node. 
What I find most interesting is that the single file listed on the 
second node is a lease file belonging to a VM template.

Some obvious questions come to mind: What is that output supposed to 
mean? Does it in fact even have a useful meaning at all? How can the 
files be in a heal-failed condition and not also be in a split-brain 
condition?

My interpretation of "heal-failed" is that the listed files are not yet 
fully in sync across nodes (and are therefore by definition in a 
split-brain condition) but that doesn't match the output of the command. 
That also can't be how gluster interprets it: how can a template file 
which has received no reads or writes possibly be in a heal-failed 
condition a week after the initial volume heal?


