[Gluster-users] Problem with self-heal

Vijay Bellur vbellur at redhat.com
Wed Jul 2 15:40:07 UTC 2014


On 07/02/2014 06:15 PM, Milos Kozak wrote:
> Hi,
>
> I am going to replicate the problem on a clean gluster configuration
> later today. So far my answers are below.
>
> On 7/2/2014 1:38 AM, Ravishankar N wrote:
>> On 07/02/2014 02:28 AM, Miloš Kozák wrote:
>>> Hi,
>>> I am running some tests on top of v3.5.1 in my two-node configuration
>>> with one disk each and replica 2 mode.
>>>
>>> I have two servers connected by a cable, over which glusterd
>>> communicates. I start dd to create a relatively large file. In the
>>> middle of the writing process I disconnect the cable, so when the
>>> write finishes I can see all of the data on one server (node1) and
>>> only part of the file on the other one (node2)
>>
>> Does this mean your client (mount point) is also on node 1?
>
> Yes, I mounted the volume on both servers as follows:
> localhost:vg0    /mnt
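
For reference, the setup described above should correspond roughly to
the following (brick paths are taken from the heal info output further
down; the exact dd size is only a guess):

gluster volume create vg0 replica 2 node1:/dist1/brick/fs node2:/dist1/brick/fs
gluster volume start vg0
mount -t glusterfs localhost:/vg0 /mnt
# write a ~2 GB file; the cable is pulled partway through this write
dd if=/dev/zero of=/mnt/node-middle bs=1M count=2048
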
>
>>> .. no surprise so far.
>>>
>>> Then I plug the cable back in. After a while the peers are discovered
>>> and the self-heal daemons start to communicate, so I can see:
>>>
>>> gluster volume heal vg0 info
>>> Brick node1:/dist1/brick/fs/
>>> /node-middle - Possibly undergoing heal
>>> Number of entries: 1
>>>
>>> Brick node2:/dist1/brick/fs/
>>> /node-middle - Possibly undergoing heal
>>> Number of entries: 1
>>>
>>> But no data is moving over the network, which I verified with df.
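
To check whether the heal is actually making progress, you could also
look at the AFR changelog xattrs on the brick copy of the file on each
node, and trigger an index heal by hand (brick path taken from your
heal info output; getfattr is part of the attr package):

# on each node, inspect the pending-heal changelog xattrs on the brick copy
getfattr -d -m . -e hex /dist1/brick/fs/node-middle
# and, if needed, kick off an index heal manually
gluster volume heal vg0
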
>>>
>> When you get "Possibly undergoing heal" and no I/O is going on from the
>> client, it means the self-heal daemon is healing the file. Can you check
>> whether there are messages in glustershd.log on node1 about self-heal
>> completion?
>
> There are no such lines in the log, which is what eventually prompted
> me to write this email.
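
For completeness, the self-heal daemon log on a typical install lives
under /var/log/glusterfs (the exact path may differ on your
distribution); grepping it directly is a quick check:

grep -i heal /var/log/glusterfs/glustershd.log | tail -n 20
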
>
>>> Any help? I would expect the nodes to get synchronized after a while,
>>> but after 20 minutes of waiting still nothing has happened (the file
>>> was 2 GB)
>> Does gluster volume status show all processes being online?
>
> All processes are running.
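
The thing to double-check in that output is that the Self-heal Daemon
rows show Online = Y on both nodes, along these lines (illustrative
output only; ports and PIDs will differ):

gluster volume status vg0
# Self-heal Daemon on localhost    N/A    Y    <pid>
# Self-heal Daemon on node2        N/A    Y    <pid>
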
>

Output of strace -f -p <self-heal-daemon pid> from both nodes might also 
help.
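
One way to capture that on each node, assuming the daemon's command
line contains glustershd as it does on a standard install:

# find the self-heal daemon pid and attach strace to it
strace -f -p $(pgrep -f glustershd) -o /tmp/glustershd.strace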

Thanks,
Vijay



