[Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

Thu Apr 20 07:54:29 UTC 2017

Hi Pranith,

> 1) At the moment heals happen in parallel only for files not directories.
> i.e. same shd process doesn't heal 2 directories at a time. But it can do
> as many file heals as shd-max-threads option. That could be the reason why
> Amudhan faced better performance after a while, but it is a bit difficult
> to confirm without data.

Yes, you're right. The disk has about 56153 files, each under its own
subdirectory, so there will be an equal or higher number of folders.
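
Since directories are healed one at a time per shd process, the ratio of
directories to files on a brick is worth measuring rather than estimating.
A minimal sketch, assuming direct access to the brick's backend path; the
function name `count_entries` is made up for illustration, and skipping
`.glusterfs` (Gluster's internal metadata directory on each brick) is a
choice made here so that only user-visible entries are counted:

```python
import os


def count_entries(brick_path, skip_dirs=(".glusterfs",)):
    """Return (n_files, n_dirs) found below brick_path.

    brick_path itself is not counted as a directory.
    """
    n_files = 0
    n_dirs = 0
    for root, dirs, files in os.walk(brick_path):
        # Prune skipped directories in place so os.walk does not descend.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        n_dirs += len(dirs)
        n_files += len(files)
    return n_files, n_dirs
```

Calling `count_entries("/path/to/brick")` on a good brick gives a rough idea
of how much of the heal is directory work and how much is file work.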

I have a question: when the heal process creates a folder on the disk, does
it also check with the rest of the bricks in the same disperse set, and
update the xattrs of the folders and files as they are healed?

> 2) When a file is undergoing I/O both shd and mount will contend for locks
> to do I/O from bricks this probably is the reason for the slowness in I/O.
> it will last only until the file is healed in parallel with the I/O from
> users.

I suggest there should be a mechanism for this case that pauses the heal
process, fulfills the read request first, and then continues healing, so
that the user doesn't see any difference in read speed.

> 3) Serkan, Amudhan, it would be nice to have feedback about what do you
> feel are the bottlenecks so that we can come up with next set of
> performance improvements. One of the newer enhancements Sunil is working on
> is to be able to heal larger chunks in one go rather than ~128KB chunks. It
> will be configurable upto 128MB I think, this will improve throughput. Next
> set of enhancements would concentrate on reducing network round trips in
> doing heal and doing parallel heals of directories.

I don't see any bottlenecks other than the ones we discussed in this
thread. Heal should be faster when we have sufficient hardware power for it.
I hope the newer enhancements will deliver that.
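
To put rough numbers on the chunk-size change mentioned in point 3, here is a
back-of-the-envelope sketch. It assumes exact sizes of 128 KiB and 128 MiB
(the thread only says "~128KB" and "upto 128MB"), and `heal_chunks` is a name
made up for illustration. It only counts chunks; how many network round trips
each chunk really costs is not stated in this thread.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def heal_chunks(file_size, chunk_size):
    """Number of chunks needed to cover file_size bytes (ceiling division)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-file_size // chunk_size)


# A 1 GiB file at the two chunk sizes assumed above:
small = heal_chunks(GIB, 128 * KIB)  # 8192 chunks
large = heal_chunks(GIB, 128 * MIB)  # 8 chunks
print(small, large, small // large)  # prints "8192 8 1024"
```

Under these assumptions the larger chunk size means about a thousand times
fewer per-chunk operations for the same file, which is where a throughput
gain would be expected to come from.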

Coming to the original thread:

I think the heal process has completed, but there is still a size difference
of 14GB between the healed disk and the other good disks in the same set.
I compared the files on the healed disk against a good disk: 3 files are
missing from the healed disk, but they are only KB-sized. They were deleted
back in 3.7 but are still present on the bricks.

What explains this size difference?
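
For anyone wanting to repeat the comparison described above, a minimal sketch
follows. It assumes both brick backends are reachable as local paths and that
`os.stat` reports `st_blocks` (POSIX systems). The names `scan_brick`,
`compare_bricks` and `allocated_total` are hypothetical, and skipping
`.glusterfs` is a choice made here so that only user-visible files are
compared. It is not a Gluster tool, just a directory diff.

```python
import os


def scan_brick(brick_path, skip_dirs=(".glusterfs",)):
    """Map relative file path -> (apparent_bytes, allocated_bytes)."""
    entries = {}
    for root, dirs, files in os.walk(brick_path):
        # Prune skipped directories in place so os.walk does not descend.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            full = os.path.join(root, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, brick_path)
            # st_blocks counts 512-byte units on POSIX systems.
            entries[rel] = (st.st_size, st.st_blocks * 512)
    return entries


def allocated_total(entries):
    """Sum of on-disk (allocated) bytes over all scanned files."""
    return sum(allocated for _, allocated in entries.values())


def compare_bricks(good_path, healed_path):
    """Return (only_on_good, only_on_healed, apparent_size_differs)."""
    good = scan_brick(good_path)
    healed = scan_brick(healed_path)
    only_good = sorted(set(good) - set(healed))
    only_healed = sorted(set(healed) - set(good))
    differing = sorted(
        path for path in set(good) & set(healed)
        if good[path][0] != healed[path][0]
    )
    return only_good, only_healed, differing
```

If all three lists come back nearly empty, comparing
`allocated_total(scan_brick(...))` for the two bricks shows whether the gap
is in the allocated size of the compared files or lies outside the compared
tree (for example under the skipped `.glusterfs` directory).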

Regards,
Amudhan P

On Wed, Apr 19, 2017 at 4:05 PM, Pranith Kumar Karampuri <
pkarampu at redhat.com> wrote:

> Some thoughts based on this mail thread:
> 1) At the moment heals happen in parallel only for files not directories.
> i.e. same shd process doesn't heal 2 directories at a time. But it can do
> as many file heals as shd-max-threads option. That could be the reason why
> Amudhan faced better performance after a while, but it is a bit difficult
> to confirm without data.
>
> 2) When a file is undergoing I/O both shd and mount will contend for locks
> to do I/O from bricks this probably is the reason for the slowness in I/O.
> it will last only until the file is healed in parallel with the I/O from
> users.
>
> 3) Serkan, Amudhan, it would be nice to have feedback about what do you
> feel are the bottlenecks so that we can come up with next set of
> performance improvements. One of the newer enhancements Sunil is working on
> is to be able to heal larger chunks in one go rather than ~128KB chunks. It
> will be configurable upto 128MB I think, this will improve throughput. Next
> set of enhancements would concentrate on reducing network round trips in
> doing heal and doing parallel heals of directories.
>
>
> On Tue, Apr 18, 2017 at 6:22 PM, Serkan Çoban <cobanserkan at gmail.com>
> wrote:
>
>> >Is this by design ? Is it tuneable ? 10MB/s/brick is too low for us.
>> >We will use 10GB ethernet, healing 10MB/s/brick would be a bottleneck.
>>
>> That is the maximum if you are using EC volumes, I don't know about
>> other volume configurations.
>> With 3.9.0 parallel self heal of EC volumes should be faster though.
>>
>>
>>
>> On Tue, Apr 18, 2017 at 1:38 PM, Gandalf Corvotempesta
>> <gandalf.corvotempesta at gmail.com> wrote:
>> > 2017-04-18 9:36 GMT+02:00 Serkan Çoban <cobanserkan at gmail.com>:
>> >> Nope, healing speed is 10MB/sec/brick, each brick heals with this
>> >> speed, so one brick or one server each will heal in one week...
>> >
>> > Is this by design ? Is it tuneable ? 10MB/s/brick is too low for us.
>> > We will use 10GB ethernet, healing 10MB/s/brick would be a bottleneck.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Pranith