[Gluster-users] Problem with self-heal

Sun Jul 20 18:33:08 UTC 2014

Hi, I tried my best, but I could not reproduce the error, not even on 3.5.1.

Sorry, I cannot test it. It is kind of weird :D
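
For reference, the reproduction attempt and the checks discussed in the quoted messages below can be sketched roughly as the following script. The volume name (vg0), mount point (/mnt) and file name (node-middle) are taken from the thread. The interface name, the log path, the sleep durations and the use of "ip link" as a stand-in for unplugging the cable are assumptions for illustration only. The script prints the commands by default; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Rough sketch of the reproduction attempt and the checks from this thread.
# vg0, /mnt and node-middle come from the thread; everything marked
# "assumption" is illustrative and may not match a real setup.

VOLUME=vg0
TEST_FILE=/mnt/node-middle
IFACE=${IFACE:-eth1}            # assumption: interface of the direct link
SHD_LOG=${SHD_LOG:-/var/log/glusterfs/glustershd.log}  # assumption: default log path
WRITE_SECS=${WRITE_SECS:-5}     # assumption: how long to let dd write first
DOWN_SECS=${DOWN_SECS:-10}      # assumption: how long the link stays down
DRY_RUN=${DRY_RUN:-1}           # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Start writing a 2G file through the mount point, in the background.
run dd if=/dev/zero of="$TEST_FILE" bs=1M count=2048 &

# 2. Take the link down in the middle of the write (stand-in for pulling
#    the cable), then bring it back.
run sleep "$WRITE_SECS"
run ip link set dev "$IFACE" down
run sleep "$DOWN_SECS"
run ip link set dev "$IFACE" up
wait    # let dd finish

# 3. The checks asked for in the thread.
run gluster volume heal "$VOLUME" info
run gluster volume status "$VOLUME"
run grep -i heal "$SHD_LOG"
```

According to the explanation quoted below, the problem showed up when the client reconnected quickly, so DOWN_SECS is the value to experiment with; the default here is only a guess.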

On 14-07-15 10:23 AM, Ravishankar N wrote:
> On 07/15/2014 06:39 PM, Milos Kozak wrote:
>> I read your answer, but I don't know how to build my own RPM files,
>> because I don't want to install it directly on the system. Is there a
>> manual?
>>
>
> http://www.gluster.org/community/documentation/index.php/CompilingRPMS
> Compile the release-3.5 branch.
>
>> On 7/15/2014 8:34 AM, Ravishankar N wrote:
>>> On 07/15/2014 05:47 PM, Milos Kozak wrote:
>>>> Hi,
>>>> Yesterday I tried to reproduce the error, but I did not manage to
>>>> do it, so I started to wonder whether it was a bad call.
>>>>
>>>> I read the following links, so I would like to ask :D Does it mean
>>>> that this bug is caused by a very fast recovery of the connection? Or
>>>> do other things come into play? I am running 3.5.1 on
>>>> production servers for less important stuff, and one server there went
>>>> down this weekend. Afterwards the heal process was totally fine. The
>>>> real server takes nearly 5 minutes to boot. Does that mean this
>>>> was the reason why I did not experience the bug?
>>>>
>>>
>>> Yes, it happened when the client quickly reconnected before the server
>>> had a chance to discard the stale inode and fd tables. Hope you got a
>>> chance to look at my comment in the BZ [1].
>>> Thanks,
>>> Ravi
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1115748#c16
>>>
>>>
>>>>
>>>> When can we expect Gluster 3.5.2 to be released?
>>>>
>>>> Thanks Milos
>>>>
>>>>
>>>>
>>>>
>>>> On 7/13/2014 10:23 PM, Ravishankar N wrote:
>>>>> On 07/13/2014 09:05 PM, Miloš Kozák wrote:
>>>>>> Hi, I would like to ask about the progress. Nothing new has been
>>>>>> added to the ticket.
>>>>>>
>>>>>
>>>>>
>>>>> I haven't had a chance to look at the logs or reproduce the bug.
>>>>> Will get to it in a couple of days.
>>>>> Thanks,
>>>>> Ravi
>>>>>
>>>>>
>>>>>> Thanks, Milos
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 14-07-02 11:37 PM, Miloš Kozák wrote:
>>>>>>> Submitted: 1115748
>>>>>>>
>>>>>>> Milos
>>>>>>>
>>>>>>> On 14-07-02 11:40 AM, Vijay Bellur wrote:
>>>>>>>> On 07/02/2014 06:15 PM, Milos Kozak wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am going to reproduce the problem on a clean gluster configuration
>>>>>>>>> later today. So far my answers are below.
>>>>>>>>>
>>>>>>>>> On 7/2/2014 1:38 AM, Ravishankar N wrote:
>>>>>>>>>> On 07/02/2014 02:28 AM, Miloš Kozák wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> I am running some tests on top of v3.5.1 in my 2-node configuration
>>>>>>>>>>> with one disk each and replica 2 mode.
>>>>>>>>>>>
>>>>>>>>>>> I have two servers connected by a cable, and I let glusterd
>>>>>>>>>>> communicate over it. I start dd to create a relatively large file.
>>>>>>>>>>> In the middle of the write I disconnect the cable, so when the
>>>>>>>>>>> writing has finished I can see all the data on one server (node1)
>>>>>>>>>>> and just a part of the file on the other one (node2)
>>>>>>>>>>
>>>>>>>>>> Does this mean your client (mount point) is also on node 1?
>>>>>>>>>
>>>>>>>>> Yes, I mounted the volume on both servers as follows:
>>>>>>>>> localhost:vg0    /mnt
>>>>>>>>>
>>>>>>>>>>> .. no surprise so far.
>>>>>>>>>>>
>>>>>>>>>>> Then I put the cable back. After a while peers are discovered,
>>>>>>>>>>> self-healing daemons start to communicate, so I can see:
>>>>>>>>>>>
>>>>>>>>>>> gluster volume heal vg0 info
>>>>>>>>>>> Brick node1:/dist1/brick/fs/
>>>>>>>>>>> /node-middle - Possibly undergoing heal
>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>
>>>>>>>>>>> Brick node2:/dist1/brick/fs/
>>>>>>>>>>> /node-middle - Possibly undergoing heal
>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>
>>>>>>>>>>> But there is no data moving over the network, which I verify with
>>>>>>>>>>> df.
>>>>>>>>>>>
>>>>>>>>>> When you get "Possibly undergoing heal" and no I/O is going on from
>>>>>>>>>> the client, it means the self-heal daemon is healing the file. Can
>>>>>>>>>> you check if there are messages in glustershd.log of node1 about
>>>>>>>>>> self-heal completion?
>>>>>>>>>
>>>>>>>>> There are no such lines in the log, which is why I ended up writing
>>>>>>>>> this email.
>>>>>>>>>
>>>>>>>>>>> Any help? In my opinion my nodes should get synchronized after a
>>>>>>>>>>> while, but after 20 minutes of waiting still nothing (the file
>>>>>>>>>>> was 2G big)
>>>>>>>>>> Does gluster volume status show all processes being online?
>>>>>>>>>
>>>>>>>>> All processes are running.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Output of strace -f -p <self-heal-daemon pid> from both nodes might
>>>>>>>> also help.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vijay
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users