[Gluster-devel] [RFC ] dictionary optimizations

Wed Sep 4 16:10:58 UTC 2013

On Wed, Sep 4, 2013 at 6:37 AM, Xavier Hernandez <xhernandez at datalab.es>wrote:

> Al 04/09/13 14:05, En/na Jeff Darcy ha escrit:
>
>  On 09/04/2013 04:27 AM, Xavier Hernandez wrote:
>>
>>> I would also like to note that each node can store multiple elements.
>>> Current implementation creates a node for each byte in the key. In my
>>> implementation I only create a node if there is a prefix coincidence
>>> between
>>> 2 or more keys. This reduces the number of nodes and the number of
>>> indirections.
>>>
>>
>> Whatever we do, we should try to make sure that the changes are profiled
>> against real usage.  When I was making my own dict optimizations back in
>> March
>> of last year, I started by looking at how they're actually used. At that
>> time,
>> a significant majority of dictionaries contained just one item. That's
>> why I
>> only implemented a simple mechanism to pre-allocate the first data_pair
>> instead
>> of doing something more ambitious.  Even then, the difference in actual
>> performance or CPU usage was barely measurable.  Dict usage has certainly
>> changed since then, but I think you'd still be hard pressed to find a case
>> where a single dict contains more than a handful of entries, and
>> approaches
>> that are optimized for dozens to hundreds might well perform worse than
>> simple
>> ones (e.g. because of cache aliasing or branch misprediction).
>>
>> If you're looking for other optimization opportunities that might provide
>> even
>> bigger "bang for the buck" then I suggest that stack-frame or frame->local
>> allocations are a good place to start.  Or string copying in places like
>> loc_copy.  Or the entire fd_ctx/inode_ctx subsystem.  Let me know and
>> I'll come
>> up with a few more.  To put a bit of a positive spin on things, the
>> GlusterFS
>> code offers many opportunities for improvement in terms of CPU and memory
>> efficiency (though it's surprisingly still way better than Ceph in that
>> regard).
>>
>>  Yes. The optimizations on dictionary structures are not a big
> improvement in the overall performance of GlusterFS. I tried it on a real
> situation and the benefit was only marginal. However I didn't test new
> features like an atomic lookup and remove if found (because I would have
> had to review all the code). I think this kind of functionalities could
> improve a bit more the results I obtained.
>
> However this is not the only reason to do these changes. While I've been
> writing code I've found that it's tedious to do some things just because
> there isn't such functions in dict_t. Some actions require multiple calls,
> having to check multiple errors and adding complexity and limiting
> readability of the code. Many of these situations could be solved using
> functions similar to what I proposed.
>
> On the other side, if dict_t must be truly considered a concurrent
> structure, there are a lot of race conditions that might appear when doing
> some operations. It would require a great effort to take care of all these
> possibilities everywhere. It would be better to pack most of these
> situations into functions inside the dict_t itself where it is easier to
> combine some operations.
>
> By the way, I've made some tests with multiple bricks and it seems that
> there is a clear speed loss on directory listings as the number of bricks
> increases. Since bricks should be independent and they can work in
> parallel, I didn't expected such a big performance degradation.

The likely reason is that, even though bricks are parallel for IO, readdir
is essentially a sequential operation and DHT has a limitation that a
readdir reply batch does not cross server boundaries. So if you have 10
files and 1 server, all 10 entries are returned in one call to the
app/libc. If you have 10 files and 10 servers evenly distributed, the
app/libc has to perform 10 calls and keeps getting one file at a time. This
problem goes away when each server has enough files to fill up a readdir
batch. It's only when you have too few files and too many servers that this
"dilution" problem shows up. However, this is just a theory and your
problem may be something else too..

Note that Brian Foster's readdir-ahead patch should address this problem to
a large extent. When loaded on top of DHT, the prefiller effectively
collapses the smaller chunks returned by DHT into a larger chunk requested
by the app/libc.

Avati

> However the tests have not been exhaustive nor made in best conditions so
> they might be misleading. Anyway it seems to me that there might be a
> problem with some mutexes that force too much serialization of requests
> (though I have no real proves it's only a feeling). Maybe some more
> "asynchronousity" on calls between translators could help.
>
> Only some thoughts...
>
> Best regards,
>
> Xavi
>
>
>
>> ______________________________**_________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> https://lists.nongnu.org/**mailman/listinfo/gluster-devel<https://lists.nongnu.org/mailman/listinfo/gluster-devel>
>>
>
>
> ______________________________**_________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/**mailman/listinfo/gluster-devel<https://lists.nongnu.org/mailman/listinfo/gluster-devel>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20130904/61e6a7ca/attachment-0001.html>