[Gluster-devel] Query regarding dictionary logic

Vijay Bellur vbellur at redhat.com
Fri May 3 05:44:39 UTC 2019


Hi Mohit,

Thank you for the update. More inline.

On Wed, May 1, 2019 at 11:45 PM Mohit Agrawal <moagrawa at redhat.com> wrote:

> Hi Vijay,
>
> I have tried running the smallfile tool on a volume (12x3), and I have not
> found any significant performance improvement for smallfile operations. I
> configured 4 clients and 8 threads to run the operations.
>

When measuring performance, did you record both the time taken and the CPU
consumed? O(n) computations are normally CPU expensive, and we might see
better results with a hash table when a large number of objects (a few
thousand) are present in a single dictionary. If you haven't gathered CPU
statistics, please gather those as well for comparison.
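
For example, a minimal sketch of the distinction (purely illustrative; the
workload placeholder below is hypothetical, and for the brick/fuse daemons
themselves per-process CPU from top or pidstat over the run serves the same
purpose):

#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Report wall-clock time and CPU time separately, so a change that trades
 * one for the other does not go unnoticed. */
static double wall_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static double cpu_now(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) +
           (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

int main(void)
{
    double w0 = wall_now(), c0 = cpu_now();

    /* Stand-in workload; in practice this would be the operation under
     * test, e.g. repeated dict lookups. */
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        x += i;

    printf("wall=%.3fs cpu=%.3fs\n", wall_now() - w0, cpu_now() - c0);
    return 0;
}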


> I have generated a statedump and found the below data for dictionaries
> specific to the gluster processes:
>
> brick
> max-pairs-per-dict=50
> total-pairs-used=192212171
> total-dicts-used=24794349
> average-pairs-per-dict=7
>
>
> glusterd
> max-pairs-per-dict=301
> total-pairs-used=156677
> total-dicts-used=30719
> average-pairs-per-dict=5
>
>
> fuse process
> [dict]
> max-pairs-per-dict=50
> total-pairs-used=88669561
> total-dicts-used=12360543
> average-pairs-per-dict=7
>
> It seems the dictionary max-pairs is highest in the case of glusterd, and
> when the number of volumes is high that number can grow further.
> I think there is no performance regression in the case of brick and fuse. I
> used a hash_size of 20 for the dictionary.
> Let me know if you can suggest some other test to validate the same.
>

A few more items to try out:

1. Vary the number of buckets and test (a rough sketch of such a comparison
is included below).
2. Create about 10000 volumes and measure the time for a volume info
<volname> operation on some random volume.
3. Check the related patch from Facebook and see if we can incorporate any
ideas from their patch.
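
To make (1) concrete, here is a rough standalone sketch (not dict.c itself;
the names are made up) that counts how many chain nodes a lookup touches for
different bucket counts. With one bucket it degenerates into the list-like
scan the current dict does:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pair { char *key; struct pair *next; };

/* Any reasonable string hash works for the comparison; this one is DJB-style. */
static unsigned int toy_hash(const char *key)
{
    unsigned int h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Insert nkeys keys, then look each one up and count the chain nodes
 * touched.  Memory cleanup is omitted to keep the sketch short. */
static long avg_steps(int nbuckets, int nkeys)
{
    struct pair **buckets = calloc(nbuckets, sizeof(*buckets));
    char key[32];
    long steps = 0;

    for (int i = 0; i < nkeys; i++) {
        snprintf(key, sizeof(key), "key-%d", i);
        unsigned int b = toy_hash(key) % nbuckets;
        struct pair *p = calloc(1, sizeof(*p));
        p->key = strdup(key);
        p->next = buckets[b];
        buckets[b] = p;
    }

    for (int i = 0; i < nkeys; i++) {
        snprintf(key, sizeof(key), "key-%d", i);
        unsigned int b = toy_hash(key) % nbuckets;
        for (struct pair *p = buckets[b]; p; p = p->next) {
            steps++;
            if (strcmp(p->key, key) == 0)
                break;
        }
    }
    return steps / nkeys;
}

int main(void)
{
    int sizes[] = { 1, 20, 64 };

    for (int s = 0; s < 3; s++)
        printf("buckets=%2d  avg chain steps per lookup=%ld\n",
               sizes[s], avg_steps(sizes[s], 5000));
    return 0;
}

Plugging the real dict hash function and a key distribution taken from a
statedump into something like this should show where adding buckets stops
paying off.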

Thanks,
Vijay



> Thanks,
> Mohit Agrawal
>
> On Tue, Apr 30, 2019 at 2:29 PM Mohit Agrawal <moagrawa at redhat.com> wrote:
>
>> Thanks, Amar, for sharing the patch. I will test and share the result.
>>
>> On Tue, Apr 30, 2019 at 2:23 PM Amar Tumballi Suryanarayan <
>> atumball at redhat.com> wrote:
>>
>>> Shreyas/Kevin tried to address it some time back using
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1428049 (
>>> https://review.gluster.org/16830)
>>>
>>> I vaguely remember that the decision to keep the hash size at 1 was made
>>> back when the dictionary itself was sent as part of the on-wire protocol,
>>> and in most other places the number of entries in a dictionary was 3 on
>>> average. So we felt that saving a bit of memory was the better trade-off
>>> at that time.
>>>
>>> -Amar
>>>
>>> On Tue, Apr 30, 2019 at 12:02 PM Mohit Agrawal <moagrawa at redhat.com>
>>> wrote:
>>>
>>>> Sure, Vijay. I will try and update.
>>>>
>>>> Regards,
>>>> Mohit Agrawal
>>>>
>>>> On Tue, Apr 30, 2019 at 11:44 AM Vijay Bellur <vbellur at redhat.com>
>>>> wrote:
>>>>
>>>>> Hi Mohit,
>>>>>
>>>>> On Mon, Apr 29, 2019 at 7:15 AM Mohit Agrawal <moagrawa at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>>   I was just looking at the dict code and I have a query about the
>>>>>> current dictionary logic.
>>>>>>   I am not able to understand why we use a hash_size of 1 for a
>>>>>> dictionary. IMO, with a hash_size of 1 the dictionary always works
>>>>>> like a list, not a hash, so every lookup in the dictionary has O(n)
>>>>>> complexity.
>>>>>>
>>>>>>   Before optimizing the code, I just want to know: what was the exact
>>>>>> reason for defining hash_size as 1?
>>>>>>
>>>>>
>>>>> This is a good question. I looked up the source in gluster's historic
>>>>> repo [1] and hash_size is 1 even there. So, this could have been the case
>>>>> since the first version of the dictionary code.
>>>>>
>>>>> Would you be able to run some tests with a larger hash_size and share
>>>>> your observations?
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>>
>>>>> [1]
>>>>> https://github.com/gluster/historic/blob/master/libglusterfs/src/dict.c
>>>>>
>>>>>
>>>>>
>>>>