[Gluster-devel] [RFC ] dictionary optimizations
avati at gluster.org
Wed Sep 4 16:10:58 UTC 2013
On Wed, Sep 4, 2013 at 6:37 AM, Xavier Hernandez <xhernandez at datalab.es>wrote:
> Al 04/09/13 14:05, En/na Jeff Darcy ha escrit:
> On 09/04/2013 04:27 AM, Xavier Hernandez wrote:
>>> I would also like to note that each node can store multiple elements.
>>> Current implementation creates a node for each byte in the key. In my
>>> implementation I only create a node if there is a prefix coincidence
>>> 2 or more keys. This reduces the number of nodes and the number of
>> Whatever we do, we should try to make sure that the changes are profiled
>> against real usage. When I was making my own dict optimizations back in
>> of last year, I started by looking at how they're actually used. At that
>> a significant majority of dictionaries contained just one item. That's
>> why I
>> only implemented a simple mechanism to pre-allocate the first data_pair
>> of doing something more ambitious. Even then, the difference in actual
>> performance or CPU usage was barely measurable. Dict usage has certainly
>> changed since then, but I think you'd still be hard pressed to find a case
>> where a single dict contains more than a handful of entries, and
>> that are optimized for dozens to hundreds might well perform worse than
>> ones (e.g. because of cache aliasing or branch misprediction).
>> If you're looking for other optimization opportunities that might provide
>> bigger "bang for the buck" then I suggest that stack-frame or frame->local
>> allocations are a good place to start. Or string copying in places like
>> loc_copy. Or the entire fd_ctx/inode_ctx subsystem. Let me know and
>> I'll come
>> up with a few more. To put a bit of a positive spin on things, the
>> code offers many opportunities for improvement in terms of CPU and memory
>> efficiency (though it's surprisingly still way better than Ceph in that
>> Yes. The optimizations on dictionary structures are not a big
> improvement in the overall performance of GlusterFS. I tried it on a real
> situation and the benefit was only marginal. However I didn't test new
> features like an atomic lookup and remove if found (because I would have
> had to review all the code). I think this kind of functionalities could
> improve a bit more the results I obtained.
> However this is not the only reason to do these changes. While I've been
> writing code I've found that it's tedious to do some things just because
> there isn't such functions in dict_t. Some actions require multiple calls,
> having to check multiple errors and adding complexity and limiting
> readability of the code. Many of these situations could be solved using
> functions similar to what I proposed.
> On the other side, if dict_t must be truly considered a concurrent
> structure, there are a lot of race conditions that might appear when doing
> some operations. It would require a great effort to take care of all these
> possibilities everywhere. It would be better to pack most of these
> situations into functions inside the dict_t itself where it is easier to
> combine some operations.
> By the way, I've made some tests with multiple bricks and it seems that
> there is a clear speed loss on directory listings as the number of bricks
> increases. Since bricks should be independent and they can work in
> parallel, I didn't expected such a big performance degradation.
The likely reason is that, even though bricks are parallel for IO, readdir
is essentially a sequential operation and DHT has a limitation that a
readdir reply batch does not cross server boundaries. So if you have 10
files and 1 server, all 10 entries are returned in one call to the
app/libc. If you have 10 files and 10 servers evenly distributed, the
app/libc has to perform 10 calls and keeps getting one file at a time. This
problem goes away when each server has enough files to fill up a readdir
batch. It's only when you have too few files and too many servers that this
"dilution" problem shows up. However, this is just a theory and your
problem may be something else too..
Note that Brian Foster's readdir-ahead patch should address this problem to
a large extent. When loaded on top of DHT, the prefiller effectively
collapses the smaller chunks returned by DHT into a larger chunk requested
by the app/libc.
> However the tests have not been exhaustive nor made in best conditions so
> they might be misleading. Anyway it seems to me that there might be a
> problem with some mutexes that force too much serialization of requests
> (though I have no real proves it's only a feeling). Maybe some more
> "asynchronousity" on calls between translators could help.
> Only some thoughts...
> Best regards,
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Gluster-devel