[Gluster-devel] [RFC ] dictionary optimizations

Wed Sep 4 00:55:06 UTC 2013

On Tue, Sep 3, 2013 at 1:42 AM, Xavier Hernandez <xhernandez at datalab.es>wrote:

>  Al 03/09/13 09:33, En/na Anand Avati ha escrit:
>
> On Mon, Sep 2, 2013 at 7:24 AM, Xavier Hernandez <xhernandez at datalab.es>wrote:
>
>> Hi,
>>
>> dict_t structures are widely used in glusterfs. I've some ideas that
>> could improve its performance.
>>
>> * On delete operations, return the current value if it exists.
>>
>> This is very useful when we want to get a value and remove it from the
>> dictionary. This way it can be done accessing and locking the dict_t only
>> once (and it is atomic).
>>
>
>  Makes sense.
>
>
>
>> * On add operations, return the previous value if it existed.
>>
>> This avoids to use a lookup and a conditional add (and it is atomic).
>>
>
>  Do you mean dict_set()? If so, how do you propose we differentiate
> between "failure" and "previous value did not exist"? Do you propose
> setting the previous value into a pointer to pointer, and retain the return
> value as is today?
>
> Yes, I'm thinking to something similar to dict_set() (by the way, I would
> remove the dict_add() function).
>

dict_add() is used in unserialization routines where dict_set() for a big
set of keys guaranteed not to repeat is very expensive (unserializing would
otherwise have a quadratic function as its asymptote). What is the reason
you intend to remove it?

> What you propose would be the simplest solution right now. However I think
> it would be interesting to change the return value to an error code (this
> would supply more detailed information in case of failure and we could use
> EEXIST to know if the value already existed. In fact I think it would be
> interesting to progressively change the -1 return code of many functions by
> an error code). The pointer to pointer argument could be NULL if the
> previous value is not needed.
>
> Of course this would change the function signature, breaking a lot of
> existing code. Another possibility could be to create a dict_replace()
> function, and possibly make it to fail if the value didn't exist.
>

It is best we do not change the meaning of existing APIs, and just add new
APIs instead. The new API can be:

int dict_replace (dict_t *dict, const char *key, data_t *newval, data_t
**oldval);

.. and leave dict_set() as is.

>
>
>
>> * Always return the data_pair_t structure instead of data_t or the data
>> itself.
>>
>> This can be useful to avoid future lookups or other operations on the
>> same element. Macros can be created to simplify writing code to access the
>> actual value.
>>
>
>  The use case is not clear. A more concrete example will help..
>
>    Having a data_pair_t could help to navigate from an existing element
> (getting next or previous. This is really interesting if dict where
> implemented using a sorted structure like a trie since it would allow to
> process a set of similar entries very fast, like the trusted.afr.<brick>
> values for example) or removing or replacing it without needing another
> lookup (a more detailed analysis would be needed to see how to handle race
> conditions).
>
> By the way, is really the dict_t structure used concurrently ? I haven't
> analyzed all the code deeply, but it seems to me that every dict_t is only
> accessed from a single place at once.
>

There have been instances of dict_t getting used concurrently, when used as
xdata and in xattrop (by AFR). There have been bugs in the past with
concurrent dict access.

>
>
>> * Use a trie instead of a hash.
>>
>> A trie structure is a bit more complex than a hash, but only processes
>> the key once and does not need to compute the hash. A test implementation I
>> made with a trie shows a significant improvement in dictionary operations.
>>
>
>  There is already an implementation of trie in libglusterfs/src/trie.c.
> Though it does not compact (collapse) single-child nodes upwards into the
> parent. In any case, let's avoid having two implementations of tries.
>
> I know. The current implementation wastes a lot of memory because it uses
> an array of 256 pointers, and in some places it needs to traverse the
> array. Not a b¡g deal, but if it is made many times it could be noticeable.
> In my test I used a trie with 4 child pointers (with collapsing
> single-child nodes) that runs a bit faster than the 256 implementation and
> uses much less memory. I tried with 2, 4, 16 and 256 childs per node, and 4
> seems to be the best (at least for dictionary structures) though there are
> very little difference between 4 and 16 in terms of speed.
>

The 256 child pointers give you constant time lookup for the next level
child with just an offset indirection. With smaller fan-out, do you search
through the list? Can you show an example of this? Collapsing single child
node upwards is badly needed though.

> I agree that it is not good to maintain two implementations of the same
> thing. Maybe we could change the trie implementation. It should be
> transparent.
>

Yes, I believe the current API can accommodate such internal changes.

>
>
>    * Implement dict_foreach() as a macro (similar to kernel's
>> list_for_each()).
>>
>> This gives more control and avoids the need of helper functions.
>>
>
>  This makes sense too, but there are quite a few users of dict_foreach in
> the existing style. Moving them all over might be a pain.
>
> Maybe we could create a differently named macro to implement this feature
> and allow the developers to slowly change it. The old implementation could
> be flagged as deprecated and use the new one for new code. Old code will
> have enough time to change it until eventually the old implementation is
> removed.
>
> If we make important changes to the dict_t structure, we could replace
> current functions by macros that use the new implementation but simulates
> the old behavior.
>

Sounds OK.

>
>
>
>> Additionally, I think it's possible to redefine structures to reduce the
>> number of allocations and pointers used for each element (actual data,
>> data_t, data_pair_t and key).
>>
>
>  This is highly desirable. There was some effort from Amar in the past (
> http://review.gluster.org/3910) but it has been in need of attention for
> some time. It would be intersting to know if you were thinking along
> similar lines?
>
>    Yes, it is quite similar though I should analyze it more deeply. I
> would also try to remove some unused/unneeded fields that are used in very
> few places, add complexity and can be replaced easily, like extra_free and
> extra_stdfree in dict_t for example.
>
>
Thanks,
Avati
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20130903/d4249b57/attachment-0001.html>