[Gluster-devel] readdir() scalability (was Re: [RFC] dictionary optimizations)

Brian Foster bfoster at redhat.com
Thu Sep 12 11:17:50 UTC 2013


On 09/12/2013 06:08 AM, Xavier Hernandez wrote:
> On 09/09/13 17:25, Vijay Bellur wrote:
>> On 09/09/2013 02:18 PM, Xavier Hernandez wrote:
>>> On 06/09/13 20:43, Anand Avati wrote:
>>>>
>>>> On Fri, Sep 6, 2013 at 1:46 AM, Xavier Hernandez
>>>> <xhernandez at datalab.es> wrote:
>>>>
>>>>     On 04/09/13 18:10, Anand Avati wrote:
>>>>>     On Wed, Sep 4, 2013 at 6:37 AM, Xavier Hernandez
>>>>>     <xhernandez at datalab.es> wrote:
>>>>>
>>>>>         On 04/09/13 14:05, Jeff Darcy wrote:
>>>>>
>>>>>             On 09/04/2013 04:27 AM, Xavier Hernandez wrote:
>>>>>
...
>>
>> Have you tried turning on "cluster.readdir-optimize"? This could help
>> improve readdir performance for the directory hierarchy that you
>> describe.
>>
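
For reference, this is an ordinary volume option; on a hypothetical volume
named "testvol" it would be toggled roughly like this:

    gluster volume set testvol cluster.readdir-optimize on
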
> I repeated the tests with this option enabled, and it really improved
> readdir performance; however, it still shows a linear slowdown as the
> number of bricks increases. Will the readdir-ahead translator be able to
> hide this linear effect when the number of bricks is very high?
> 

I don't know that it will change the overall effect, but perhaps it
could smooth things out (or if not, we can see about further
improvements). Could you try it out and let us know? :)
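
In case it helps, readdir-ahead is also meant to be a per-volume toggle; on
a hypothetical volume named "testvol" that would look roughly like this
(whether the plain volume-set key is available depends on the build you are
running, so treat this as a sketch):

    gluster volume set testvol performance.readdir-ahead on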

Brian

> Results of the tests with cluster.readdir-optimize enabled:
> 
> 1 brick: 11.8 seconds
> 2 bricks: 15.4 seconds
> 3 bricks: 17.9 seconds
> 4 bricks: 20.6 seconds
> 5 bricks: 22.9 seconds
> 6 bricks: 25.4 seconds
> 12 bricks: 41.8 seconds
> 
> Rebalance times also improved:
> 
> From 1 to 2 bricks: 77 seconds
> From 2 to 3 bricks: 78 seconds
> From 3 to 4 bricks: 81 seconds
> From 4 to 5 bricks: 84 seconds
> From 5 to 6 bricks: 87 seconds
> From 6 to 12 bricks: 144 seconds
> 
> Xavi
> 
>> -Vijay
>>
>>
>>>
>>> After each test, I added a new brick and started a rebalance. Once the
>>> rebalance had completed, I unmounted and stopped the volume and then
>>> started it again.
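
For reference, the add-brick / rebalance cycle described above corresponds
to ordinary CLI steps; the volume, brick and mount names here are
hypothetical:

    gluster volume add-brick testvol server2:/bricks/brick2
    gluster volume rebalance testvol start
    gluster volume rebalance testvol status   # repeat until it reports completed
    umount /mnt/testvol
    gluster volume stop testvol
    gluster volume start testvol
    mount -t glusterfs server1:/testvol /mnt/testvol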
>>>
>>> The test consisted of 4 runs of 'time ls -lR /<testdir> | wc -l'. The
>>> first result was discarded; the figures shown below are the mean of the
>>> other 3 runs.
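
In shell terms, the timing procedure described above is roughly the
following; the test directory path is hypothetical:

    # 4 runs: the first warms the caches and is discarded,
    # the reported figure is the mean of the remaining 3
    for i in 1 2 3 4; do
        time ls -lR /mnt/testvol/testdir | wc -l
    done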
>>>
>>> 1 brick: 11.8 seconds
>>> 2 bricks: 19.0 seconds
>>> 3 bricks: 23.8 seconds
>>> 4 bricks: 29.8 seconds
>>> 5 bricks: 34.6 seconds
>>> 6 bricks: 41.0 seconds
>>> 12 bricks (2 bricks on each server): 78.5 seconds
>>>
>>> The rebalancing time also grew considerably (these times are from a
>>> single rebalance each, so they might not be very accurate):
>>>
>>>  From 1 to 2 bricks: 91 seconds
>>>  From 2 to 3 bricks: 102 seconds
>>>  From 3 to 4 bricks: 119 seconds
>>>  From 4 to 5 bricks: 138 seconds
>>>  From 5 to 6 bricks: 151 seconds
>>>  From 6 to 12 bricks: 259 seconds
>>>
>>> The number of disk IOPS never exceeded 40 on any server. Network
>>> bandwidth didn't go beyond 6 Mbit/s between any pair of servers, and
>>> none of the servers reached 100% usage on any core.
>>>
>>> Xavi
>>>
>>>> Avati




