[Gluster-devel] [RFC ] dictionary optimizations

Mon Sep 2 14:24:46 UTC 2013

Hi,

dict_t structures are widely used in glusterfs. I've some ideas that 
could improve its performance.

* On delete operations, return the current value if it exists.

This is very useful when we want to get a value and remove it from the 
dictionary. This way it can be done accessing and locking the dict_t 
only once (and it is atomic).

* On add operations, return the previous value if it existed.

This avoids to use a lookup and a conditional add (and it is atomic).

* Always return the data_pair_t structure instead of data_t or the data 
itself.

This can be useful to avoid future lookups or other operations on the 
same element. Macros can be created to simplify writing code to access 
the actual value.

* Use a trie instead of a hash.

A trie structure is a bit more complex than a hash, but only processes 
the key once and does not need to compute the hash. A test 
implementation I made with a trie shows a significant improvement in 
dictionary operations.

* Implement dict_foreach() as a macro (similar to kernel's list_for_each()).

This gives more control and avoids the need of helper functions.

Additionally, I think it's possible to redefine structures to reduce the 
number of allocations and pointers used for each element (actual data, 
data_t, data_pair_t and key).

What do you think ?

Best regards,

Xavi