[Gluster-devel] readdir() scalability (was Re: [RFC ] dictionary optimizations)
Xavier Hernandez
xhernandez at datalab.es
Fri Sep 13 08:05:46 UTC 2013
On 12/09/13 13:17, Brian Foster wrote:
> On 09/12/2013 06:08 AM, Xavier Hernandez wrote:
>> On 09/09/13 17:25, Vijay Bellur wrote:
>>> On 09/09/2013 02:18 PM, Xavier Hernandez wrote:
>>>> On 06/09/13 20:43, Anand Avati wrote:
>>>>> On Fri, Sep 6, 2013 at 1:46 AM, Xavier Hernandez
>>>>> <xhernandez at datalab.es> wrote:
>>>>>
>>>>> On 04/09/13 18:10, Anand Avati wrote:
>>>>>> On Wed, Sep 4, 2013 at 6:37 AM, Xavier Hernandez
>>>>>> <xhernandez at datalab.es> wrote:
>>>>>>
>>>>>> On 04/09/13 14:05, Jeff Darcy wrote:
>>>>>>
>>>>>> On 09/04/2013 04:27 AM, Xavier Hernandez wrote:
>>>>>>
> ...
>>> Have you tried turning on "cluster.readdir-optimize"? This could help
>>> improve readdir performance for the directory hierarchy that you
>>> describe.
>>>
>> I repeated the tests with this option enabled and it really improved
>> readdir performance; however, there is still a linear slowdown as the
>> number of bricks increases. Will the readdir-ahead translator be able to
>> hide this linear effect when the number of bricks is very high?
>>
> I don't know that it will change the overall effect, but perhaps it
> could smooth things out (or if not, we can see about further
> improvements). Could you try it out and let us know? :)
I've repeated the tests using the master branch (commit 643533c7), combining
cluster.readdir-optimize and performance.readdir-ahead. These are the
results:
Configurations
Test1: cluster.readdir-optimize=off and performance.readdir-ahead=off
Test2: cluster.readdir-optimize=on and performance.readdir-ahead=off
Test3: cluster.readdir-optimize=off and performance.readdir-ahead=on
Test4: cluster.readdir-optimize=on and performance.readdir-ahead=on
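For reference, switching between these four combinations is just a matter of
toggling the two volume options with the standard gluster CLI, along these
lines (<volname> is a placeholder for the volume name):

    gluster volume set <volname> cluster.readdir-optimize on
    gluster volume set <volname> performance.readdir-ahead on

(with 'off' for the combinations where an option is disabled).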
ls: average time, in seconds, of 3 runs of 'ls -lR <mount root> | wc -l'
    (a previous run is made first to warm the caches)
rb: rebalance time, in seconds (not averaged; measured only once)
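For clarity, each 'ls' value is obtained with a procedure along these lines
(<mount root> again being a placeholder for the actual mount point):

    # warm-up pass to fill the caches (not measured)
    ls -lR <mount root> > /dev/null
    # three measured passes; the 'ls' column is their average
    for i in 1 2 3; do time ls -lR <mount root> | wc -l; done

The 'rb' value is, roughly, the time from issuing 'gluster volume rebalance
<volname> start' until 'gluster volume rebalance <volname> status' reports
completion.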
Bricks |    Test1    |    Test2    |    Test3    |    Test4
       |   ls   rb   |   ls   rb   |   ls   rb   |   ls   rb
     1 | 10.7   --   | 10.6   --   |  9.8   --   |  9.8   --
     2 | 18.7   82   | 14.1   84   | 17.1   83   | 13.5   82
     3 | 24.6   83   | 16.8   84   | 23.1   84   | 16.4   85
     4 | 30.2   87   | 19.7   86   | 29.0   88   | 19.2   87
     5 | 36.0   92   | 22.5   90   | 34.8   91   | 21.7   91
     6 | 42.2   97   | 25.1   96   | 40.9   95   | 24.1   96
    12 | 80.4  161   | 42.1  160   | 81.3  162   | 41.5  162
It seems that the additional benefit from readdir-ahead is minimal for this
workload, which only traverses the directory structure.
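To put rough numbers on it, taking (time at 12 bricks - time at 1 brick) / 11
as an approximate per-brick cost from the table above:

    Test1: (80.4 - 10.7) / 11 ≈ 6.3 s/brick    Test3: (81.3 - 9.8) / 11 ≈ 6.5 s/brick
    Test2: (42.1 - 10.6) / 11 ≈ 2.9 s/brick    Test4: (41.5 - 9.8) / 11 ≈ 2.9 s/brick

readdir-optimize roughly halves the per-brick cost, but readdir-ahead leaves
the slope essentially unchanged in both cases.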
Xavi
> Brian
>
>> Results of the tests with cluster.readdir-optimize active:
>>
>> 1 brick: 11.8 seconds
>> 2 bricks: 15.4 seconds
>> 3 bricks: 17.9 seconds
>> 4 bricks: 20.6 seconds
>> 5 bricks: 22.9 seconds
>> 6 bricks: 25.4 seconds
>> 12 bricks: 41.8 seconds
>>
>> Rebalance also improved:
>>
>> From 1 to 2 bricks: 77 seconds
>> From 2 to 3 bricks: 78 seconds
>> From 3 to 4 bricks: 81 seconds
>> From 4 to 5 bricks: 84 seconds
>> From 5 to 6 bricks: 87 seconds
>> From 6 to 12 bricks: 144 seconds
>>
>> Xavi
>>
>>> -Vijay
>>>
>>>
>>>> After each test, I added a new brick and started a rebalance. Once the
>>>> rebalance had completed, I unmounted and stopped the volume and then
>>>> restarted it.
>>>>
>>>> The test consisted of 4 runs of 'time ls -lR /<testdir> | wc -l'. The
>>>> first result was discarded; the values shown below are the mean of the
>>>> remaining 3 runs.
>>>>
>>>> 1 brick: 11.8 seconds
>>>> 2 bricks: 19.0 seconds
>>>> 3 bricks: 23.8 seconds
>>>> 4 bricks: 29.8 seconds
>>>> 5 bricks: 34.6 seconds
>>>> 6 bricks: 41.0 seconds
>>>> 12 bricks (2 bricks on each server): 78.5 seconds
>>>>
>>>> The rebalance time also grew considerably (each of these times comes
>>>> from a single rebalance, so they might not be very accurate):
>>>>
>>>> From 1 to 2 bricks: 91 seconds
>>>> From 2 to 3 bricks: 102 seconds
>>>> From 3 to 4 bricks: 119 seconds
>>>> From 4 to 5 bricks: 138 seconds
>>>> From 5 to 6 bricks: 151 seconds
>>>> From 6 to 12 bricks: 259 seconds
>>>>
>>>> Disk IOPS never exceeded 40 on any server in any of the tests. Network
>>>> bandwidth didn't go beyond 6 Mbit/s between any pair of servers, and
>>>> none of the servers reached 100% usage on any core.
>>>>
>>>> Xavi
>>>>
>>>>> Avati