[Gluster-users] Problem with self-heal

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jul 3 15:54:33 UTC 2014


On 07/03/2014 08:56 PM, Tiziano Müller wrote:
> Hi Pranith
>
> On 03.07.2014 17:16, Pranith Kumar Karampuri wrote:
> [...]
>>> Is there some documentation on the meaning of the
>>> trusted.afr.virtualization-client attribute?
>> https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
> Thanks.
>
>> Is I/O happening on those files? I think yes, because they are VM files. There
>> was a problem of false positives with releases earlier than 3.5.1: those
>> releases did not have the capability to distinguish between ongoing I/O and a
>> genuine need for self-heal. So even if I/O is happening, such files will be
>> shown under files that need self-heal.
> Ok, that explains why some of the files are suddenly listed and then vanish again.
>
> The problem is that when we shut down all VMs (which were using gfapi) last
> week, some images were listed as needing self-heal, although no I/O was
> happening. Also after a gluster vol stop/start and a reboot, the same files
> were listed and nothing changed. After comparing the checksums of the files on
> the two bricks we resumed operation.
It would be helpful if you could provide the getfattr output when such things
happen, so that we can try to see why it is happening that way.
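In case it helps, a minimal way to collect that output (the brick path and file
name below are just placeholders for wherever the affected file lives on each
brick) is to run the following on both nodes:

    getfattr -d -m . -e hex /path/to/brick/<affected-file>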
These are afr changelog smells I have developed over time while working on afr;
they are correct most of the time, but not always (an illustrative example
follows the list). Once I have the getfattr output from both bricks:
1) If the files have equal numbers on both bricks and the files are undergoing
changes, it is most probably just normal I/O and no heal is required.
2) If the files have unequal numbers, with the numbers differing by a lot, and
the files are undergoing changes, then most probably heal is required while I/O
is going on.
3) If the files have unequal numbers and the files are not undergoing changes,
then heal is required.
4) If the files have equal numbers on both bricks and the files are not
undergoing changes, then the mount must have crashed or the volume was stopped
while I/O was in progress.
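As a concrete illustration of these cases (the attribute names assume the usual
trusted.afr.<volname>-client-<index> form with the volume name from your earlier
mail; brick paths, file name and counts are purely made up), each value packs
three 32-bit counters for pending data, metadata and entry operations:

    # on node1
    getfattr -d -m . -e hex /bricks/brick1/images/vm1.qcow2
    trusted.afr.virtualization-client-0=0x000000000000000000000000
    trusted.afr.virtualization-client-1=0x000000050000000000000000

    # on node2
    getfattr -d -m . -e hex /bricks/brick1/images/vm1.qcow2
    trusted.afr.virtualization-client-0=0x000000000000000000000000
    trusted.afr.virtualization-client-1=0x000000000000000000000000

Here node1 records five pending data operations against client-1 (node2's
brick) while node2 shows all zeros; if those counts are not changing, this
matches case 3 and node2's copy needs heal. If instead both bricks showed the
same non-zero, still-changing counts, that would be case 1, i.e. normal ongoing
I/O.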

Again, these are just most-probable guesses, not exact rules.

Pranith
>
> Any ideas?
>
> Best,
> Tiziano
>
>
>> Pranith
>>
>>> Thanks in advance,
>>> Tiziano
>>>
>>>> Pranith
>>>>> Best,
>>>>> Tiziano
>>>>>
>>>>> On 01.07.2014 22:58, Miloš Kozák wrote:
>>>>>> Hi,
>>>>>> I am running some tests on top of v3.5.1 in my two-node configuration with one
>>>>>> disk per node and replica 2 mode.
>>>>>>
>>>>>> I have two servers connected by a cable, over which glusterd communicates. I
>>>>>> start dd to create a relatively large file. In the middle of the write I
>>>>>> disconnect the cable, so when the write is finished I can see all the data on
>>>>>> one server (node1) and only part of the file on the other (node2). No surprise
>>>>>> so far.
>>>>>>
>>>>>> Then I put the cable back. After a while peers are discovered, self-healing
>>>>>> daemons start to communicate, so I can see:
>>>>>>
>>>>>> gluster volume heal vg0 info
>>>>>> Brick node1:/dist1/brick/fs/
>>>>>> /node-middle - Possibly undergoing heal
>>>>>> Number of entries: 1
>>>>>>
>>>>>> Brick node2:/dist1/brick/fs/
>>>>>> /node-middle - Possibly undergoing heal
>>>>>> Number of entries: 1
>>>>>>
>>>>>> But no data is moving over the network, which I verify with df.
>>>>>>
>>>>>> Any help? In my opinion the nodes should get synchronized after a while, but
>>>>>> after 20 minutes of waiting still nothing (the file was 2 GB).
>>>>>>
>>>>>> Thanks Milos
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users



