[Bugs] [Bug 1543072] Listing a directory that contains many directories takes to much time

Fri Feb 9 07:56:27 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1543072

Raghavendra G <rgowdapp at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rgowdapp at redhat.com
          Component|core                        |distribute

--- Comment #1 from Raghavendra G <rgowdapp at redhat.com> ---
According to the email Marian sent, the configuration is a single-brick DHT.
The problem was first observed in a 192x2 distribute-replicate setup, but can
also be seen on a DHT with a single subvolume.

I suspect that the handling of dentries pointing to directories in
dht_readdirp_cbk is the cause of this bug. Note that ls is usually aliased with
some options, so a listing is usually accompanied by a stat on each dentry.

I think the following are the reasons for the increased latency:

* readdirplus won't provide nodeid/gfid and stat information for dentries
pointing to a directory. So, if the application is interested in stat/inode
information (as is most likely the case here), it has to do a lookup.
* In configurations where DHT has a large number of subvolumes, the latency of
this lookup is that of the slowest of the lookups sent to the subvolumes. Note
that the latency is NOT the sum of all the lookup latencies but is equal to
the LARGEST one.
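
The second point can be illustrated with a small sketch. It assumes the lookup
is sent to all subvolumes in parallel and that the caller waits for every
reply. The function names and the latency numbers are made up for illustration
and are not measurements from this bug.

```python
# Hypothetical per-subvolume lookup latencies in microseconds.
# These are illustrative numbers, not measurements.
SUBVOL_LATENCIES_US = [1200, 900, 4500, 1100]


def fanout_lookup_latency(latencies_us):
    """Latency seen by the caller when lookups go out in parallel
    and the caller waits for all replies: the slowest one wins."""
    return max(latencies_us)


def serial_lookup_latency(latencies_us):
    """Latency if the lookups were issued one after another
    (shown only for contrast)."""
    return sum(latencies_us)


print("parallel fan-out:", fanout_lookup_latency(SUBVOL_LATENCIES_US), "us")
print("serial (contrast):", serial_lookup_latency(SUBVOL_LATENCIES_US), "us")
```

Under this model, adding more subvolumes does not add their latencies together,
but it does raise the chance that at least one slow reply delays every
directory lookup.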

We discussed this problem with the performance engineering group at Red Hat.
In a realistic workload this problem might not be too common, as it occurs
only in datasets where directories constitute a significantly large
percentage of the data. Nevertheless, we would be interested in knowing of
real-world use cases where users have a large number of directories.

Note that everything I've said above is just a hypothesis, and we need to do
some experiments to confirm it. Some of the experiments I can think of are:
* Making sure that ls issues only readdir without requiring stat. This can be
done by unaliasing ls before issuing the command, so that any options are
stripped out, and then issuing plain ls without any options. If directories and
files take the same time to list in this test, we can be sure that the extra
latency seen for directories is due to lookup/stat.
* Disabling readdirplus in the entire stack [1] and running ls. Note that it is
not necessary to unalias ls in this test, as there'll be a lookup for files too
(since readdirplus is disabled for them as well). Note that this test might
yield slightly different results for large volumes, because lookup latency
differs between directories and files. But in a single-subvolume test they
should be the same.
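
The first experiment can also be approximated without ls. The sketch below
times a names-only listing against a listing followed by lstat on every entry.
The function names are hypothetical, and it assumes that os.listdir maps to
plain directory reads while os.lstat forces the per-entry stat that an aliased
ls would trigger.

```python
import os
import time


def time_plain_listing(path):
    """Time a names-only listing (roughly what plain `ls` needs).
    Returns (entry_count, elapsed_seconds)."""
    start = time.perf_counter()
    names = os.listdir(path)
    return len(names), time.perf_counter() - start


def time_listing_with_stat(path):
    """Time a listing followed by lstat on every entry
    (roughly what `ls -l` or a colorizing alias needs).
    Returns (entry_count, elapsed_seconds)."""
    start = time.perf_counter()
    names = os.listdir(path)
    for name in names:
        os.lstat(os.path.join(path, name))
    return len(names), time.perf_counter() - start


def compare(path):
    """Run both timings on `path` and return them in a dict."""
    count, plain = time_plain_listing(path)
    _, with_stat = time_listing_with_stat(path)
    return {"entries": count, "plain_s": plain, "with_stat_s": with_stat}
```

It would be run once on a directory of files and once on a directory of
directories on the mount, for example compare("/mnt/glusterfs/many-dirs")
(a made-up path). Since the second timing can benefit from caches populated by
the first, remounting or dropping caches between runs should give cleaner
numbers.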

Based on the above reasoning, I am changing the component to distribute. It
can be changed again if the experiments show different results.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
