[Gluster-devel] [RFC ] dictionary optimizations

Tue Sep 3 07:33:08 UTC 2013

On Mon, Sep 2, 2013 at 7:24 AM, Xavier Hernandez <xhernandez at datalab.es>wrote:

> Hi,
>
> dict_t structures are widely used in glusterfs. I've some ideas that could
> improve its performance.
>
> * On delete operations, return the current value if it exists.
>
> This is very useful when we want to get a value and remove it from the
> dictionary. This way it can be done accessing and locking the dict_t only
> once (and it is atomic).
>

Makes sense.

> * On add operations, return the previous value if it existed.
>
> This avoids to use a lookup and a conditional add (and it is atomic).
>

Do you mean dict_set()? If so, how do you propose we differentiate between
"failure" and "previous value did not exist"? Do you propose setting the
previous value into a pointer to pointer, and retain the return value as is
today?

> * Always return the data_pair_t structure instead of data_t or the data
> itself.
>
> This can be useful to avoid future lookups or other operations on the same
> element. Macros can be created to simplify writing code to access the
> actual value.
>

The use case is not clear. A more concrete example will help..

> * Use a trie instead of a hash.
>
> A trie structure is a bit more complex than a hash, but only processes the
> key once and does not need to compute the hash. A test implementation I
> made with a trie shows a significant improvement in dictionary operations.
>

There is already an implementation of trie in libglusterfs/src/trie.c.
Though it does not compact (collapse) single-child nodes upwards into the
parent. In any case, let's avoid having two implementations of tries.

> * Implement dict_foreach() as a macro (similar to kernel's
> list_for_each()).
>
> This gives more control and avoids the need of helper functions.
>

This makes sense too, but there are quite a few users of dict_foreach in
the existing style. Moving them all over might be a pain.

> Additionally, I think it's possible to redefine structures to reduce the
> number of allocations and pointers used for each element (actual data,
> data_t, data_pair_t and key).
>

This is highly desirable. There was some effort from Amar in the past (
http://review.gluster.org/3910) but it has been in need of attention for
some time. It would be intersting to know if you were thinking along
similar lines?

Avati

What do you think ?
>
> Best regards,
>
> Xavi
>
> ______________________________**_________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/**mailman/listinfo/gluster-devel<https://lists.nongnu.org/mailman/listinfo/gluster-devel>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20130903/78ec10bf/attachment-0001.html>