[Gluster-users] Very poor heal behaviour in 3.7.9

Mon Mar 28 01:08:12 UTC 2016

On 27/03/2016 12:33 AM, Lindsay Mathieson wrote:
> On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
>>> Is that the same issue I posted earlier re "gluster volume heal 
>>> info" appearing to block I/O?
>>>
>> I don't think it is heal info that is blocking I/O. I think it is the 
>> client triggering heal and blocking the fop until heal completes that 
>> results in this pattern. Disabling data-heal should get you out 
>> of this problem. 
>
>
> I tried it earlier and it didn't seem to help.
>
> Does anything need to be restarted after cluster.data-self-heal is set 
> off?

Tried again this morning, and I can replicate 100% of the behaviour I noted earlier:

> After testing the heal process by killing glusterfsd on a node I 
> noticed the following.
>
> - I/O continued at normal speed while glusterfsd was down.
>
> - After restarting glusterfsd, I/O still continued as normal.
>
> - performing a "gluster volume heal datastore2 info" would show some 
> info, then hang.
>
> - I/O on the cluster would cease, e.g. in a VM where I was running a 
> command line build of a large project, the build just stopped. The VM 
> itself was mostly responsive but anything that involved accessing the 
> disk hung.
>
> - if I killed the "gluster volume heal datastore2 info" command, then 
> I/O in the VMs resumed at a normal pace.
>
> - if I then reissued the "gluster volume heal datastore2 info" command 
> I/O would continue for a short while (seconds - minutes) before 
> hanging again.
>
> - killing the heal info command would resume I/O again.

iowait and cpu are under 4% on all three nodes.

Even after I shut down all VMs on datastore2, "gluster volume heal 
datastore2 info" hung indefinitely with no output.

I had to stop and start datastore2 before heal info would work; it then 
returned very quickly with:

    Brick vnb.proxmox.softlog:/tank/vmdata/datastore2
    Number of entries: 0

    Brick vng.proxmox.softlog:/tank/vmdata/datastore2
    /.shard - Possibly undergoing heal

    Number of entries: 1

    Brick vna.proxmox.softlog:/tank/vmdata/datastore2
    /.shard - Possibly undergoing heal

    Number of entries: 1
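For anyone tracking this over repeated runs, the heal info output above can be summarised per brick. This is a minimal sketch in Python, assuming the plain-text layout shown above (a `Brick` line, zero or more entry lines, then `Number of entries: N`); the function name `parse_heal_info` is hypothetical and not part of any Gluster tooling, and it would be fed the captured text of `gluster volume heal datastore2 info`.

```python
import re

# Sample heal info output, copied verbatim from the run above.
SAMPLE = """\
Brick vnb.proxmox.softlog:/tank/vmdata/datastore2
Number of entries: 0

Brick vng.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal

Number of entries: 1

Brick vna.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal

Number of entries: 1
"""

COUNT_RE = re.compile(r"Number of entries:\s*(\d+)$")


def parse_heal_info(text):
    """Hypothetical helper: map each brick to its entry count and entry lines.

    Assumes the layout shown above; lines before the first "Brick" are ignored.
    """
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no information
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = {"count": None, "entries": []}
        elif current is None:
            continue  # nothing to attach this line to yet
        else:
            match = COUNT_RE.match(line)
            if match:
                bricks[current]["count"] = int(match.group(1))
            else:
                # Anything else is a path listed as needing or undergoing heal.
                bricks[current]["entries"].append(line)
    return bricks


if __name__ == "__main__":
    for brick, info in parse_heal_info(SAMPLE).items():
        print(brick, info["count"], info["entries"])
```

Running it repeatedly against fresh captures would show whether the `/.shard` entries on vng and vna ever clear, which is the symptom reported below.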

Unfortunately, it has stayed that way for 10 minutes now.

I'd like to recheck this behaviour under 3.7.7. Can I just revert to 
that (Debian packages) without recreating the datastore?

thanks,

-- 
Lindsay Mathieson
