[Gluster-users] Problem with self-heal

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jul 3 15:16:27 UTC 2014


On 07/03/2014 07:08 PM, Tiziano Müller wrote:
> Hi Pranith
>
> Am 03.07.2014 07:01, schrieb Pranith Kumar Karampuri:
>> On 07/02/2014 06:39 PM, Tiziano Müller wrote:
>>> Hi there
>>>
>>> Not sure whether this is related, but we see the same problem with
>>> glusterfs-3.4(.2). Several files are listed as being healed, but the heal
>>> never finishes even though the checksums are identical.
>>> We had some problems with NTP, meaning that the clocks on the nodes diverged
>>> by a couple of seconds. I suspect this may be the root cause, but I could not
>>> do any further tests and the files are still in the same state (self-healing).
>>>
>>> Interestingly, there are other threads describing this sort of problem, but
>>> nothing has come of them so far.
>> Could you give the output of getfattr -d -m. -e hex
>> <file-that-gives-this-problem-on-backend> on both bricks of the replica pair,
>> so we can see what the problem is?
> Ok, I picked one of the volumes which are permanently listed in the heal info:
>
> node-01 ~ # gluster vol heal virtualization info | grep db98
> /vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
> /vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
>
> node-01 ~ # getfattr -d -m. -e hex
> /var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
>
> getfattr: Removing leading '/' from absolute path names
> # file:
> var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
> trusted.afr.virtualization-client-0=0x000000020000000000000000
> trusted.afr.virtualization-client-1=0x000000020000000000000000
> trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec
>
> node-02 ~ # getfattr -d -m. -e hex
> /var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
>
> getfattr: Removing leading '/' from absolute path names
> # file:
> var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2
> trusted.afr.virtualization-client-0=0x000000020000000000000000
> trusted.afr.virtualization-client-1=0x000000020000000000000000
> trusted.gfid=0xa7d0b8a3cf0d41c0b2775b99ea3cbeec
>
> Is there some documentation on the meaning of the
> trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
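
That document describes the changelog layout. As a quick illustration (a
minimal sketch, assuming the 12-byte data/metadata/entry layout it describes),
the value shown above can be split into its three counters in a shell:

# split the hex value (without the 0x prefix) into three 4-byte counters
val=000000020000000000000000
echo "data pending:     $((16#${val:0:8}))"    # 2 -> pending data operations
echo "metadata pending: $((16#${val:8:8}))"    # 0
echo "entry pending:    $((16#${val:16:8}))"   # 0

Here both bricks show a data-pending count of 2 and no metadata or entry
changes, which is consistent with the ongoing-I/O situation described below.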

Is I/O happening on those files? I assume yes, since they are VM images.
Releases earlier than 3.5.1 had a problem with false positives: they could not
distinguish between ongoing I/O and a genuine need for self-heal, so even files
that merely have I/O in flight are listed among the files that need self-heal.
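
If you want to confirm it is really the ongoing-I/O case, one rough check (an
assumption on my part, not an official procedure) is to watch the changelog
counters on a brick while no I/O is running on the file; they should drop back
to all zeroes once the pending operations are flushed:

watch -n 5 "getfattr -d -m. -e hex /var/data/gluster-volume-01/vm-persistent/0f83f084-8080-413e-b558-b678e504836e/db982933-9a36-44f1-8cc2-21fb6d34023f.qcow2 2>/dev/null | grep trusted.afr"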

Pranith

>
> Thanks in advance,
> Tiziano
>
>> Pranith
>>> Best,
>>> Tiziano
>>>
>>> Am 01.07.2014 22:58, schrieb Miloš Kozák:
>>>> Hi,
>>>> I am running some tests on top of v3.5.1 in my two-node configuration, with
>>>> one disk per node and replica 2.
>>>>
>>>> I have two servers connected by a cable, and glusterd communicates over that
>>>> cable. I start dd to create a relatively large file. In the middle of the
>>>> write I disconnect the cable, so when the write finishes I can see all the
>>>> data on one server (node1) and only part of the file on the other (node2).
>>>> No surprise so far.
>>>>
>>>> Then I put the cable back. After a while the peers are discovered and the
>>>> self-heal daemons start to communicate, so I can see:
>>>>
>>>> gluster volume heal vg0 info
>>>> Brick node1:/dist1/brick/fs/
>>>> /node-middle - Possibly undergoing heal
>>>> Number of entries: 1
>>>>
>>>> Brick node2:/dist1/brick/fs/
>>>> /node-middle - Possibly undergoing heal
>>>> Number of entries: 1
>>>>
>>>> But no data is moving over the network, which I verified with df.
>>>>
>>>> Any help? In my opinion the nodes should get synchronized after a while, but
>>>> after 20 minutes of waiting still nothing (the file was 2 GB).
>>>>
>>>> Thanks Milos
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
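
For anyone who wants to reproduce Milos's test, here is a rough sketch of the
steps (hostnames, brick paths, the volume name vg0 and the file name
node-middle are taken from his mail; the mount point /mnt/vg0 is only an
assumption):

# on node1: create and start a two-brick replica volume
node1 ~ # gluster volume create vg0 replica 2 node1:/dist1/brick/fs node2:/dist1/brick/fs
node1 ~ # gluster volume start vg0

# mount the volume and write a large file; pull the cable mid-write
node1 ~ # mount -t glusterfs node1:/vg0 /mnt/vg0
node1 ~ # dd if=/dev/zero of=/mnt/vg0/node-middle bs=1M count=2048

# after reconnecting the cable, check heal progress on either node
node1 ~ # gluster volume heal vg0 info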



