[Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

Pranith Kumar Karampuri pkarampu at redhat.com
Fri Apr 21 09:41:24 UTC 2017


On Thu, Apr 20, 2017 at 1:24 PM, Amudhan P <amudhan83 at gmail.com> wrote:

> Hi Pranith,
>
> > 1) At the moment heals happen in parallel only for files, not
> > directories, i.e. the same shd process doesn't heal 2 directories at a
> > time. But it can do as many file heals as the shd-max-threads option
> > allows. That could be the reason why Amudhan saw better performance
> > after a while, but it is a bit difficult to confirm without data.
>
>        yes, you're right. The disk has about 56153 files, each under its
> own subdirectory, so an equal or higher number of folders will be there.
>
> I have a doubt: when the heal process creates a folder on the disk, does
> it also check with the rest of the bricks in the same disperse set and
> update the xattrs for the folders and files being healed?
>

Yes, in general most of the heal process involves contacting the other
bricks, not just for creating the directory but for other things as well,
such as setting inode attributes/xattrs, data, etc.
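If you want to see what heal is tracking on disk, you can dump the
extended attributes directly on a brick. A minimal sketch, assuming
/data/brick1 is a brick root of the disperse volume and the path is a
placeholder (the exact trusted.ec.* attribute names can vary by version):

    # run as root on the brick server; -e hex avoids mangling binary values
    getfattr -d -m . -e hex /data/brick1/path/to/dir-being-healed
    # typically shows entries such as trusted.gfid and, on disperse
    # volumes, trusted.ec.version/trusted.ec.dirty that heal updates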


>
> > 2) When a file is undergoing I/O, both shd and the mount will contend
> > for locks to do I/O on the bricks; this is probably the reason for the
> > slowness in I/O. It will last only until the file is healed in parallel
> > with the I/O from users.
>
>        I suggest there should be a mechanism for the above case that
> pauses the heal process, fulfills the read request first, and later
> continues with the heal, so the user doesn't feel any difference in read
> speed.
>

But the read request can come at any point. If the READ request comes
after the heal process takes locks, the logic needed to give priority to
I/O becomes very convoluted. I think a better way for your case would be
to disable I/O from triggering heals. This doesn't really fix the problem,
but it would reduce the probability of seeing this issue.
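As a rough sketch of what that could look like on the CLI (the option
names below are assumptions that depend on volume type and version, so
please verify them with "gluster volume set help" before applying):

    # replicate volumes: stop the client/mount from triggering heals,
    # leaving healing to the self-heal daemon only
    gluster volume set <volname> cluster.data-self-heal off
    gluster volume set <volname> cluster.metadata-self-heal off
    gluster volume set <volname> cluster.entry-self-heal off

    # disperse volumes: background-heals controls heals launched from the
    # mount; please verify what a value of 0 means on your version
    gluster volume set <volname> disperse.background-heals 0

    # separately, shd can heal more files in parallel if CPU allows
    gluster volume set <volname> cluster.shd-max-threads 8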


>
> > 3) Serkan, Amudhan, it would be nice to have feedback about what you
> > feel the bottlenecks are so that we can come up with the next set of
> > performance improvements. One of the newer enhancements Sunil is working
> > on is the ability to heal larger chunks in one go rather than ~128KB
> > chunks. It will be configurable up to 128MB I think; this will improve
> > throughput. The next set of enhancements would concentrate on reducing
> > network round trips in doing heal and on doing parallel heals of
> > directories.
>
>         I don't see any bottlenecks other than what we discussed in this
> thread. Heal should be faster when we have sufficient hardware power for
> it. I hope the newer enhancements will deliver that.
>
>
> Coming to the original thread:
>
> I think the heal process is completed, but there is still a size
> difference of 14GB between the healed disk and the other good disks in
> the same set. So I compared the files between the healed disk and a good
> disk: 3 files are missing, but they are KB-sized files that were deleted
> in 3.7 and are still present on the bricks.
>

Oh, you have 3 files missing but no xattrs to indicate this? Could you
tell us more about the parent directory xattrs on all the bricks where the
file is missing?
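Something like the following (with placeholder paths) would collect what
I'm asking for; run it on every brick of the disperse set that should hold
the file:

    # hex-encoded xattrs of the missing file's parent directory
    getfattr -d -m . -e hex /data/brickN/path/to/parent-dir
    # and a listing per brick, to see which bricks actually have the file
    ls -la /data/brickN/path/to/parent-dir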


>
> Why is there this size difference?
>

Could you find which files/directories correspond to the size difference?
Please also include .glusterfs in your commands.
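For example, a per-file size dump on the healed brick and on a good brick
of the same set, diffed against each other, should narrow it down (brick
paths below are placeholders):

    # on the healed brick server (du -a includes .glusterfs and all files)
    cd /data/healed-brick && du -ak . | sort -k2 > /tmp/healed.du
    # on a good brick server of the same set
    cd /data/good-brick && du -ak . | sort -k2 > /tmp/good.du
    # copy one file over, then compare the two listings
    diff /tmp/healed.du /tmp/good.du | head -50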


>
> regards
> Amudhan P
>
>
>
> On Wed, Apr 19, 2017 at 4:05 PM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>> Some thoughts based on this mail thread:
>> 1) At the moment heals happen in parallel only for files, not
>> directories, i.e. the same shd process doesn't heal 2 directories at a
>> time. But it can do as many file heals as the shd-max-threads option
>> allows. That could be the reason why Amudhan saw better performance after
>> a while, but it is a bit difficult to confirm without data.
>>
>> 2) When a file is undergoing I/O, both shd and the mount will contend
>> for locks to do I/O on the bricks; this is probably the reason for the
>> slowness in I/O. It will last only until the file is healed in parallel
>> with the I/O from users.
>>
>> 3) Serkan, Amudhan, it would be nice to have feedback about what you
>> feel the bottlenecks are so that we can come up with the next set of
>> performance improvements. One of the newer enhancements Sunil is working
>> on is the ability to heal larger chunks in one go rather than ~128KB
>> chunks. It will be configurable up to 128MB I think; this will improve
>> throughput. The next set of enhancements would concentrate on reducing
>> network round trips in doing heal and on doing parallel heals of
>> directories.
>>
>>
>> On Tue, Apr 18, 2017 at 6:22 PM, Serkan Çoban <cobanserkan at gmail.com>
>> wrote:
>>
>>> >Is this by design? Is it tunable? 10MB/s/brick is too low for us.
>>> >We will use 10-gigabit Ethernet; healing at 10MB/s/brick would be a
>>> >bottleneck.
>>>
>>> That is the maximum if you are using EC volumes; I don't know about
>>> other volume configurations.
>>> With 3.9.0, parallel self-heal of EC volumes should be faster, though.
>>>
>>>
>>>
>>> On Tue, Apr 18, 2017 at 1:38 PM, Gandalf Corvotempesta
>>> <gandalf.corvotempesta at gmail.com> wrote:
>>> > 2017-04-18 9:36 GMT+02:00 Serkan Çoban <cobanserkan at gmail.com>:
>>> >> Nope, healing speed is 10MB/sec/brick; each brick heals at this
>>> >> speed, so one brick or one whole server will each heal in one week...
>>> >
>>> > Is this by design? Is it tunable? 10MB/s/brick is too low for us.
>>> > We will use 10-gigabit Ethernet; healing at 10MB/s/brick would be a
>>> > bottleneck.
>>>
>>
>>
>>
>> --
>> Pranith
>>
>>
>
>


-- 
Pranith