[Gluster-users] Problem with self-heal

Milos Kozak milos.kozak at lejmr.com
Tue Jul 15 12:17:47 UTC 2014


Hi,
Yesterday I tried to replicate the error, but I did not manage to do
it, so I started to wonder whether it was a false alarm.

I read the links, so I would like to ask: does this mean that this bug
is caused by a very fast recovery of the connection? Or do other
factors come into play? I am running 3.5.1 on production servers for
less important stuff, and one of those servers went down this weekend.
Afterwards the heal process was totally fine. The real server takes
nearly 5 minutes to boot. Does that mean this was the reason why I did
not hit the bug?
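
To make "the heal was fine" easier to check than by eyeballing the output, here is a minimal sketch that counts the pending entries per brick. It assumes the output format of "gluster volume heal vg0 info" as quoted later in this thread (3.5.1); other versions may print it differently. The names parse_heal_info, pending_total and SAMPLE are made up for illustration, and SAMPLE is simply the output from my original report.

```python
def parse_heal_info(text):
    """Map each brick to its listed entries.

    Assumes the layout seen in this thread: a "Brick <host>:<path>" line,
    then one line per entry, then a "Number of entries: N" line.
    """
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Brick "):
            # Start collecting entries for a new brick.
            current = line[len("Brick "):]
            bricks[current] = []
        elif line.startswith("Number of entries:"):
            # End of this brick's section.
            current = None
        elif current is not None:
            bricks[current].append(line)
    return bricks


def pending_total(bricks):
    """Total number of entries listed across all bricks."""
    return sum(len(entries) for entries in bricks.values())


# Sample copied from the heal info output in the original report.
SAMPLE = """\
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
"""

if __name__ == "__main__":
    parsed = parse_heal_info(SAMPLE)
    for brick, entries in parsed.items():
        print(brick, len(entries))
    print("total pending:", pending_total(parsed))
```

On a live system the text would come from running the heal info command itself rather than from SAMPLE. A total that stays above zero for a long time with no disk or network activity would match what I saw in my test setup.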


When can we expect Gluster 3.5.2 to be released?

Thanks, Milos




On 7/13/2014 10:23 PM, Ravishankar N wrote:
> On 07/13/2014 09:05 PM, Miloš Kozák wrote:
>> Hi, I would like to ask about the progress. Nothing new has been
>> added to the ticket.
>>
>
>
> I haven't had a chance to look at the logs or reproduce the bug. Will
> get to it in a couple of days.
> Thanks,
> Ravi
>
>
>> Thanks, Milos
>>
>>
>>
>> On 14-07-02 11:37 PM, Miloš Kozák wrote:
>>> Submitted: 1115748
>>>
>>> Milos
>>>
>>> On 14-07-02 11:40 AM, Vijay Bellur wrote:
>>>> On 07/02/2014 06:15 PM, Milos Kozak wrote:
>>>>> Hi,
>>>>>
>>>>> I am going to replicate the problem on a clean Gluster configuration
>>>>> later today. So far my answers are below.
>>>>>
>>>>> On 7/2/2014 1:38 AM, Ravishankar N wrote:
>>>>>> On 07/02/2014 02:28 AM, Miloš Kozák wrote:
>>>>>>> Hi,
>>>>>>> I am running some tests on top of v3.5.1 in my 2-node configuration
>>>>>>> with one disk each and replica 2 mode.
>>>>>>>
>>>>>>> I have two servers connected by a cable. Through this cable I let
>>>>>>> glusterd communicate. I start dd to create a relatively large file.
>>>>>>> In the middle of the writing process I disconnect the cable, so on
>>>>>>> one server (node1) I can see all the data and on the other one
>>>>>>> (node2) I can see just a part of the file when writing is finished
>>>>>>
>>>>>> Does this mean your client (mount point) is also on node 1?
>>>>>
>>>>> Yes, I mounted the volume on both servers as follows:
>>>>> localhost:vg0    /mnt
>>>>>
>>>>>>> .. no surprise so far.
>>>>>>>
>>>>>>> Then I put the cable back. After a while peers are discovered,
>>>>>>> self-healing daemons start to communicate, so I can see:
>>>>>>>
>>>>>>> gluster volume heal vg0 info
>>>>>>> Brick node1:/dist1/brick/fs/
>>>>>>> /node-middle - Possibly undergoing heal
>>>>>>> Number of entries: 1
>>>>>>>
>>>>>>> Brick node2:/dist1/brick/fs/
>>>>>>> /node-middle - Possibly undergoing heal
>>>>>>> Number of entries: 1
>>>>>>>
>>>>>>> But no data is moving over the network, which I verify with df.
>>>>>>>
>>>>>> When you get "Possibly undergoing heal" and no I/O is going on from
>>>>>> the client, it means the self-heal daemon is healing the file. Can
>>>>>> you check if there are messages in glustershd.log of node1 about
>>>>>> self-heal completion?
>>>>>
>>>>> There are no such lines in the log, which is why I wrote this email
>>>>> in the first place.
>>>>>
>>>>>>> Any help? In my opinion after a while I should get my nodes
>>>>>>> synchronized, but after 20 minutes of waiting there is still
>>>>>>> nothing (the file was 2 GB)
>>>>>> Does gluster volume status show all processes being online?
>>>>>
>>>>> All processes are running.
>>>>>
>>>>
>>>> Output of strace -f -p <self-heal-daemon pid> from both nodes might
>>>> also help.
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>


