[Gluster-devel] Fwd: Feature: Rebalance completion time estimation

Nithya Balachandran nbalacha at redhat.com
Thu Nov 17 14:49:56 UTC 2016

On 14 November 2016 at 05:10, Shyam <srangana at redhat.com> wrote:

> On 11/11/2016 05:46 AM, Susant Palai wrote:
>> Hello All,
>>    We have been receiving many requests from users to give a "Rebalance
>> completion time estimation". This email is to gather ideas and feedback
>> from the community for the same. We have one proposal, but nothing is
>> concrete. Please feel free to give your input for this problem.
>> A brief overview of the rebalance operation:
>> - The rebalance process redistributes data across the cluster, most often
>> after an add-brick or remove-brick. A rebalance process is spawned on each
>> node. Its job is to read each directory, fix the directory's layout to
>> include the newly added brick, read the directory's child files (only
>> those residing on local bricks), and migrate them if required by the new
>> layout.
>> Here is one solution pitched by Manoj Pillai.
>> Assumptions for this idea:
>>  - files are of similar size.
>>  - Max 40% of the total files will be migrated
>> 1- Do a statfs on the local bricks. Say the total size is St.
> Why not use f_files from statfs, which gives the inode count, along with
> f_ffree, to determine how many inodes there are, and then use the crawl to
> figure out how many we have visited and how many are pending, to determine
> rebalance progress.
> I am not sure if the local FS (XFS, say) fills in this data for us, but if
> it does, then it may provide a better estimate.
Thanks Shyam, that is a good idea.
I tried out a very rough version of this. The statfs call does return the
inode info (available and used) on my XFS brick. However, those numbers are
thrown way off by the entries in the .glusterfs directory. In my very
limited, file-only dataset, there were almost twice as many inodes in use as
there were files in the volume. I am yet to try out the results with a
directory-heavy data set.
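To make the inode probe concrete, here is a minimal sketch of reading the
statfs/statvfs inode counters from Python. This is purely illustrative (the
actual rebalance code is C inside glusterd); the path "/" is a stand-in so
the snippet runs anywhere, where a real brick mount would be something like
/bricks/brick1. Note the caveat above: on a brick, the used-inode count also
includes the .glusterfs gfid entries, so it overestimates the file count.

```python
import os

# Stand-in path so the sketch runs anywhere; substitute the
# brick mount point (e.g. "/bricks/brick1") on a real node.
brick_path = "/"

st = os.statvfs(brick_path)

# f_files  = total inodes on the filesystem
# f_ffree  = free (unallocated) inodes
used_inodes = st.f_files - st.f_ffree

# Caveat: on a Gluster brick this count includes the .glusterfs
# metadata entries, so it can be roughly double the number of
# user-visible files in a file-only data set.
print(f"total inodes: {st.f_files}, used: {used_inodes}")
```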

High level algorithm:

1. When rebalance starts up, get the estimated number of files on the brick
using the statfs inode count.
2. As rebalance proceeds, calculate the rate at which files are being
looked up. This is based on the assumption that a rebalance cannot complete
until the filesystem crawl is complete. Actual file migration operations do
not seem to contribute greatly to this time but that still needs to be
validated with more realistic data sets.
3. Using the calculated rate and the estimated number of files, calculate
the time it would take to process all the files on the brick.  That would
be our estimate for how long rebalance would take to complete on that node.
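The arithmetic behind steps 2 and 3 can be sketched as follows. This is an
illustrative model, not the actual rebalance code: the function name and
parameters are hypothetical, and it assumes a roughly constant crawl rate.

```python
def estimate_remaining(total_files, files_scanned, elapsed_seconds):
    """Estimate seconds left for rebalance on this brick.

    total_files     -- inode-based estimate of files on the brick (step 1)
    files_scanned   -- files looked up by the crawl so far (step 2)
    elapsed_seconds -- wall-clock time since rebalance started
    """
    if files_scanned == 0 or elapsed_seconds <= 0:
        return None  # not enough data yet to compute a rate

    rate = files_scanned / elapsed_seconds     # files looked up per second
    remaining = max(total_files - files_scanned, 0)
    return remaining / rate                    # step 3: remaining / rate

# Example: 100,000 estimated files, 20,000 scanned in 600 seconds
# -> ~33.3 files/s, so roughly 2400 seconds (40 minutes) remain.
print(estimate_remaining(100_000, 20_000, 600))
```

Because the rate is recomputed from the latest counters on every status
query, the estimate self-corrects as the crawl speeds up or slows down.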

Things to be considered/assumptions:

1. A single filesystem partition must contain a single brick for the statfs
info to be valid.
2. My test was run on a single-brick volume to which I added another brick
and started rebalance. More nodes and bricks in the cluster would mean that
the total number of files might change more frequently, as files are not
just moved off the brick but onto it as well.

That being said, the initial results are encouraging. The estimates
generated were fairly close to the times actually taken. The estimate is
regenerated every time the

gluster v rebalance <vol> status

command is run, and the values self-correct to take the latest data into
consideration. However, mine was a limited setup and most rebalance runs
took around 10 minutes or so. It would be interesting to see the numbers for
larger data sets where rebalance takes days or weeks.


>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
