[Gluster-devel] [RFC ] dictionary optimizations

Tue Sep 3 08:42:38 UTC 2013

Al 03/09/13 09:33, En/na Anand Avati ha escrit:
> On Mon, Sep 2, 2013 at 7:24 AM, Xavier Hernandez 
> <xhernandez at datalab.es <mailto:xhernandez at datalab.es>> wrote:
>
>     Hi,
>
>     dict_t structures are widely used in glusterfs. I've some ideas
>     that could improve its performance.
>
>     * On delete operations, return the current value if it exists.
>
>     This is very useful when we want to get a value and remove it from
>     the dictionary. This way it can be done accessing and locking the
>     dict_t only once (and it is atomic).
>
>
> Makes sense.
>
>     * On add operations, return the previous value if it existed.
>
>     This avoids to use a lookup and a conditional add (and it is atomic).
>
>
> Do you mean dict_set()? If so, how do you propose we differentiate 
> between "failure" and "previous value did not exist"? Do you propose 
> setting the previous value into a pointer to pointer, and retain the 
> return value as is today?
Yes, I'm thinking to something similar to dict_set() (by the way, I 
would remove the dict_add() function). What you propose would be the 
simplest solution right now. However I think it would be interesting to 
change the return value to an error code (this would supply more 
detailed information in case of failure and we could use EEXIST to know 
if the value already existed. In fact I think it would be interesting to 
progressively change the -1 return code of many functions by an error 
code). The pointer to pointer argument could be NULL if the previous 
value is not needed.

Of course this would change the function signature, breaking a lot of 
existing code. Another possibility could be to create a dict_replace() 
function, and possibly make it to fail if the value didn't exist.

>     * Always return the data_pair_t structure instead of data_t or the
>     data itself.
>
>     This can be useful to avoid future lookups or other operations on
>     the same element. Macros can be created to simplify writing code
>     to access the actual value.
>
>
> The use case is not clear. A more concrete example will help..
>
Having a data_pair_t could help to navigate from an existing element 
(getting next or previous. This is really interesting if dict where 
implemented using a sorted structure like a trie since it would allow to 
process a set of similar entries very fast, like the trusted.afr.<brick> 
values for example) or removing or replacing it without needing another 
lookup (a more detailed analysis would be needed to see how to handle 
race conditions).

By the way, is really the dict_t structure used concurrently ? I haven't 
analyzed all the code deeply, but it seems to me that every dict_t is 
only accessed from a single place at once.

>     * Use a trie instead of a hash.
>
>     A trie structure is a bit more complex than a hash, but only
>     processes the key once and does not need to compute the hash. A
>     test implementation I made with a trie shows a significant
>     improvement in dictionary operations.
>
>
> There is already an implementation of trie in libglusterfs/src/trie.c. 
> Though it does not compact (collapse) single-child nodes upwards into 
> the parent. In any case, let's avoid having two implementations of tries.
I know. The current implementation wastes a lot of memory because it 
uses an array of 256 pointers, and in some places it needs to traverse 
the array. Not a b¡g deal, but if it is made many times it could be 
noticeable. In my test I used a trie with 4 child pointers (with 
collapsing single-child nodes) that runs a bit faster than the 256 
implementation and uses much less memory. I tried with 2, 4, 16 and 256 
childs per node, and 4 seems to be the best (at least for dictionary 
structures) though there are very little difference between 4 and 16 in 
terms of speed.

I agree that it is not good to maintain two implementations of the same 
thing. Maybe we could change the trie implementation. It should be 
transparent.

>     * Implement dict_foreach() as a macro (similar to kernel's
>     list_for_each()).
>
>     This gives more control and avoids the need of helper functions.
>
>
> This makes sense too, but there are quite a few users of dict_foreach 
> in the existing style. Moving them all over might be a pain.
Maybe we could create a differently named macro to implement this 
feature and allow the developers to slowly change it. The old 
implementation could be flagged as deprecated and use the new one for 
new code. Old code will have enough time to change it until eventually 
the old implementation is removed.

If we make important changes to the dict_t structure, we could replace 
current functions by macros that use the new implementation but 
simulates the old behavior.

>     Additionally, I think it's possible to redefine structures to
>     reduce the number of allocations and pointers used for each
>     element (actual data, data_t, data_pair_t and key).
>
>
> This is highly desirable. There was some effort from Amar in the past 
> (http://review.gluster.org/3910) but it has been in need of attention 
> for some time. It would be intersting to know if you were thinking 
> along similar lines?
>
Yes, it is quite similar though I should analyze it more deeply. I would 
also try to remove some unused/unneeded fields that are used in very few 
places, add complexity and can be replaced easily, like extra_free and 
extra_stdfree in dict_t for example.

>
> Avati
>
>     What do you think ?
>
>     Best regards,
>
>     Xavi
>
>     _______________________________________________
>     Gluster-devel mailing list
>     Gluster-devel at nongnu.org <mailto:Gluster-devel at nongnu.org>
>     https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20130903/9c3eaeb8/attachment-0001.html>