[Gluster-users] Problem with self-heal

Tue Jul 15 14:23:26 UTC 2014

On 07/15/2014 06:39 PM, Milos Kozak wrote:
> I read your answer, but I don't know how to create my RPM files,
> because I don't want to install it directly on the system. Is there a
> manual?
>

http://www.gluster.org/community/documentation/index.php/CompilingRPMS
Compile the release-3.5 branch.

> On 7/15/2014 8:34 AM, Ravishankar N wrote:
>> On 07/15/2014 05:47 PM, Milos Kozak wrote:
>>> Hi,
>>> Yesterday I tried to reproduce the error, but I didn't manage to
>>> do it, so I started to wonder whether it wasn't a bad call.
>>>
>>> I read the following links, so I would like to ask :D Does it mean
>>> that this bug is caused by a very fast recovery of the connection? Or
>>> are there other factors in play? I am running 3.5.1 on
>>> production servers for less important stuff, and one server there went
>>> down this weekend. Afterwards the heal process was totally fine. The
>>> real server takes nearly 5 minutes to boot. Does that mean this
>>> was the reason why I didn't experience this bug?
>>>
>>
>> Yes, it happened when the client quickly reconnected before the server
>> had a chance to discard the stale inode and fd tables. Hope you got a
>> chance to look at my comment in the BZ [1].
>> Thanks,
>> Ravi
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
>>
>>
>>>
>>> When can we expect Gluster 3.5.2 to be released?
>>>
>>> Thanks Milos
>>>
>>>
>>>
>>>
>>> On 7/13/2014 10:23 PM, Ravishankar N wrote:
>>>> On 07/13/2014 09:05 PM, Miloš Kozák wrote:
>>>>> Hi, I would like to ask about the progress. Nothing new has been
>>>>> added to the ticket.
>>>>>
>>>>
>>>>
>>>> I haven't had a chance to look at the logs or reproduce the bug. Will
>>>> get to it in a couple of days.
>>>> Thanks,
>>>> Ravi
>>>>
>>>>
>>>>> Thanks, Milos
>>>>>
>>>>>
>>>>>
>>>>> On 14-07-02 11:37 PM, Miloš Kozák wrote:
>>>>>> Submitted: 1115748
>>>>>>
>>>>>> Milos
>>>>>>
>>>>>> On 14-07-02 11:40 AM, Vijay Bellur wrote:
>>>>>>> On 07/02/2014 06:15 PM, Milos Kozak wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am going to reproduce the problem on a clean gluster configuration
>>>>>>>> later today. So far my answers are below.
>>>>>>>>
>>>>>>>> On 7/2/2014 1:38 AM, Ravishankar N wrote:
>>>>>>>>> On 07/02/2014 02:28 AM, Miloš Kozák wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> I am running some tests on top of v3.5.1 in my 2-node configuration
>>>>>>>>>> with one disk each and replica 2 mode.
>>>>>>>>>>
>>>>>>>>>> I have two servers connected by a cable. Through this cable I let
>>>>>>>>>> glusterd communicate. I start dd to create a relatively large file.
>>>>>>>>>> In the middle of the writing process I disconnect the cable, so on
>>>>>>>>>> one server (node1) I can see all the data and on the other one
>>>>>>>>>> (node2) I can see just a part of the file when writing is finished
>>>>>>>>>
>>>>>>>>> Does this mean your client (mount point) is also on node 1?
>>>>>>>>
>>>>>>>> Yes, I mounted the volume on both servers as follows:
>>>>>>>> localhost:vg0    /mnt
>>>>>>>>
>>>>>>>>>> .. no surprise so far.
>>>>>>>>>>
>>>>>>>>>> Then I put the cable back. After a while peers are discovered,
>>>>>>>>>> self-healing daemons start to communicate, so I can see:
>>>>>>>>>>
>>>>>>>>>> gluster volume heal vg0 info
>>>>>>>>>> Brick node1:/dist1/brick/fs/
>>>>>>>>>> /node-middle - Possibly undergoing heal
>>>>>>>>>> Number of entries: 1
>>>>>>>>>>
>>>>>>>>>> Brick node2:/dist1/brick/fs/
>>>>>>>>>> /node-middle - Possibly undergoing heal
>>>>>>>>>> Number of entries: 1
>>>>>>>>>>
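
The heal status quoted above can also be checked programmatically. The following is a minimal sketch, assuming the output of `gluster volume heal vg0 info` has exactly the shape shown in this thread ("Brick ..." header, one line per entry, "Number of entries: N"). The function names `parse_heal_info` and `pending_counts` are hypothetical helpers, not part of Gluster, and other Gluster versions may print the output differently.

```python
from typing import Dict, List

# Sample text copied from the heal info output quoted in this thread.
SAMPLE = """\
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
"""


def parse_heal_info(text: str) -> Dict[str, List[str]]:
    """Map each brick to the entry lines listed under it.

    Assumes the layout shown in the thread; this is not an official format.
    """
    bricks: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Brick "):
            # Start a new brick section.
            current = line[len("Brick "):]
            bricks[current] = []
        elif line.startswith("Number of entries:"):
            # Summary line; the count is recomputed from the entries.
            continue
        elif current is not None:
            bricks[current].append(line)
    return bricks


def pending_counts(bricks: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the number of listed entries per brick."""
    return {brick: len(entries) for brick, entries in bricks.items()}


if __name__ == "__main__":
    for brick, count in pending_counts(parse_heal_info(SAMPLE)).items():
        print(brick, count)
```

Running the command periodically and feeding its output to `parse_heal_info` would show whether the entry for /node-middle ever disappears from the list.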
>>>>>>>>>> But on the network there is no data moving, which I verify with
>>>>>>>>>> df.
>>>>>>>>>>
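
The df check described above can be turned into a small measurement. This is only a sketch under assumptions: it presumes that a heal in progress shows up as growing disk usage on the lagging brick's filesystem (here node2, brick path /dist1/brick/fs as quoted in the thread), and the helper names `used_bytes`, `growth_rate` and `watch` are hypothetical.

```python
import shutil
import time
from typing import List, Tuple


def used_bytes(path: str) -> int:
    """Used space on the filesystem holding `path`, similar to what df reports."""
    return shutil.disk_usage(path).used


def growth_rate(samples: List[Tuple[float, int]]) -> float:
    """Average growth in bytes per second between the first and last sample.

    Each sample is (timestamp_in_seconds, used_bytes).
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (u1 - u0) / (t1 - t0)


def watch(path: str, interval: float = 5.0, count: int = 3) -> float:
    """Sample used space `count` times, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.monotonic(), used_bytes(path)))
        if i < count - 1:
            time.sleep(interval)
    return growth_rate(samples)


if __name__ == "__main__":
    # Brick path taken from the thread; adjust for the local setup.
    print(watch("/dist1/brick/fs"))
```

A rate that stays at zero on the lagging brick over several minutes would match the observation that no data is moving.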
>>>>>>>>> When you get "Possibly undergoing heal" and no I/O is going on from
>>>>>>>>> the client, it means the self-heal daemon is healing the file. Can
>>>>>>>>> you check if there are messages in glustershd.log of node1 about
>>>>>>>>> self-heal completion?
>>>>>>>>
>>>>>>>> There are no such lines in the log, which is why I ended up writing
>>>>>>>> this email.
>>>>>>>>
>>>>>>>>>> Any help? In my opinion, after a while I should get my nodes
>>>>>>>>>> synchronized, but after 20 minutes of waiting still nothing (the
>>>>>>>>>> file was 2G big)
>>>>>>>>> Does gluster volume status show all processes being online?
>>>>>>>>
>>>>>>>> All processes are running.
>>>>>>>>
>>>>>>>
>>>>>>> Output of strace -f -p <self-heal-daemon pid> from both nodes might
>>>>>>> also help.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vijay
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>