[Gluster-devel] [DHT] serialized readdir(p) across subvols and effect on performance

Mon Dec 31 15:09:06 UTC 2018

All,

As many of us are aware, readdir(p) calls are serialized across DHT subvols.
An intuitive first reaction to this algorithm is that readdir(p) is going to
be slow.

However, this is only partly true. Reading the contents of a directory is
normally split into multiple readdir(p) calls. Most of the time, the
dentries and inode data on each subvol are larger than a typical readdir(p)
buffer (128KB when readdir-ahead is enabled and 4KB on fuse when
readdir-ahead is disabled). In that case a single readdir(p) request is
served from a single subvolume (or two subvolumes in the worst case), and
hence a single readdir(p) is not serialized across all subvolumes.

Having said that, there are definitely cases where a single readdir(p)
request is serialized across many subvolumes. The best example is a
readdir(p) request on an empty directory. Other relevant examples are
directories that don't have enough dentries on each subvolume of DHT to
fill a single readdir(p) buffer. This is where performance.parallel-readdir
helps. Also, note that this is the same reason why making cache-size for
each readdir-ahead (loaded as a parent of each DHT subvolume) much bigger
than a single readdir(p) buffer size won't really improve performance in
proportion to cache-size when performance.parallel-readdir is enabled.
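
To make the two cases above concrete, here is a rough toy model (not
GlusterFS code). It assumes every dentry takes the same space, so a buffer
holds a fixed number of entries, and that a request moves on to the next
subvol only when the current one is exhausted. The function name and the
numbers are made up for illustration.

```python
from typing import List


def subvols_touched_per_request(entries_per_subvol: List[int],
                                entries_per_buffer: int) -> List[int]:
    """Toy model of serialized readdir(p) across DHT subvols.

    Each request fills one buffer by reading subvols in order and moves to
    the next subvol only when the current one has no entries left.
    Returns, for every request, how many subvols it had to visit.
    """
    remaining = list(entries_per_subvol)
    touched = []
    idx = 0  # subvol the next read starts from
    while idx < len(remaining):
        space = entries_per_buffer
        visited = 0
        while idx < len(remaining) and space > 0:
            visited += 1
            take = min(space, remaining[idx])
            remaining[idx] -= take
            space -= take
            if remaining[idx] == 0:
                idx += 1  # subvol exhausted, continue on the next one
        touched.append(visited)
    return touched


# Large directory: each subvol holds far more than one buffer's worth.
large = subvols_touched_per_request([1000, 1000, 1000, 1000], 300)
print(len(large), max(large))  # 14 requests, at most 2 subvols per request

# Empty directory: the single request walks every subvol in sequence.
print(subvols_touched_per_request([0, 0, 0, 0], 300))  # [4]

# Small directory: same effect, one request is serialized over all subvols.
print(subvols_touched_per_request([10, 10, 10, 10], 300))  # [4]
```

In this model the large directory never needs more than two subvols per
request, while the empty and small directories pay one serialized hop per
subvol for a single request.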

This is not a new observation [1] (I stumbled upon [1] after realizing the
above independently while working on performance.parallel-readdir). Still,
I feel it is a common misconception (I ran into a similar argument recently
while explaining the DHT architecture to someone new to GlusterFS), and
hence thought of writing a mail to clarify it.

Wish you a happy new year 2019 :).

[1] https://lists.gnu.org/archive/html/gluster-devel/2013-09/msg00034.html

regards,
Raghavendra