[Gluster-devel] [Gluster-users] A question of GlusterFS dentries!
raghavendra at gluster.com
Mon Nov 7 10:19:11 UTC 2016
On Wed, Nov 2, 2016 at 9:54 AM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> +1 for "no-rewinddir-support" option in DHT.
> We are seeing very slow directory listing, especially with a 1500+ brick
> volume; 'ls' takes 20+ seconds with 1000+ files.
If it's not clear, I would like to point out that serialized readdir is not
the sole issue that's causing the slowness. If directories are _HUGE_ then I
don't expect much benefit from parallelizing. Also, as others have
been pointing out (in various in-person discussions), there are other
scalability limits, like the number of messages and the memory consumed, to
winding calls in parallel. I'll probably do a rough POC in the next couple of
months to see whether this idea has any substance and post the results.
> On Wed, Nov 2, 2016 at 7:08 AM, Raghavendra Gowdappa
> <rgowdapp at redhat.com> wrote:
> > ----- Original Message -----
> >> From: "Keiviw" <keiviw at 163.com>
> >> To: gluster-devel at gluster.org
> >> Sent: Tuesday, November 1, 2016 12:41:02 PM
> >> Subject: [Gluster-devel] A question of GlusterFS dentries!
> >> Hi,
> >> In GlusterFS distributed volumes, listing a non-empty directory was slow.
> >> Then I read the DHT code and found the reason. But I was confused that
> >> GlusterFS DHT traverses all the bricks (in the volume) sequentially. Why
> >> not use multiple threads to read dentries from multiple bricks simultaneously?
> >> That's a question that has always puzzled me. Could you please tell me
> >> something about this?
> > readdir across subvols is sequential mostly because we have to support
> > rewinddir(3). We need to maintain the mapping of offset and dentry across
> > multiple invocations of readdir. In other words, if someone did a rewinddir
> > to an offset corresponding to an earlier dentry, subsequent readdirs should
> > return the same set of dentries that the earlier invocation of readdir
> > returned. For example, in a hypothetical scenario, readdir returned the
> > following dentries:
> > 1. a, off=10
> > 2. b, off=2
> > 3. c, off=5
> > 4. d, off=15
> > 5. e, off=17
> > 6. f, off=13
> > Now if we did a rewinddir to off 5 and issued readdir again, we should get
> > the following dentries:
> > (c, off=5), (d, off=15), (e, off=17), (f, off=13)
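
To make the offset example concrete, here is a minimal Python sketch of a single subvol's directory stream. It is only a model: the `SUBVOL1` list and the `readdir_from` helper are hypothetical names, not GlusterFS code. It treats each offset as an opaque cookie, which is also how POSIX treats the values exchanged by telldir(3) and seekdir(3), so the offsets need not be sorted.

```python
# Hypothetical model of one subvol's directory stream as (name, offset) pairs,
# in the order the backend filesystem returns them.
SUBVOL1 = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]


def readdir_from(listing, offset=None):
    """Return the entries starting at the dentry that carries `offset`.

    With offset=None the whole listing is returned. Offsets are looked up
    as opaque cookies; their numeric order is never used.
    """
    if offset is None:
        return list(listing)
    offsets = [off for _, off in listing]
    start = offsets.index(offset)  # raises ValueError for an unknown offset
    return listing[start:]


resumed = readdir_from(SUBVOL1, 5)
print(resumed)
```

Because the stream order is fixed, resuming from the same offset always yields the same tail, which is the guarantee being described.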
> > Within a subvol, the backend filesystem provides the rewinddir guarantee for
> > the dentries present on that subvol. However, across subvols it is the
> > responsibility of DHT to provide the above guarantee. This means we
> > need some well-defined order in which we send readdir calls (note that the
> > order is not well defined if we do a parallel readdir across all subvols).
> > So, DHT has sequential readdir, which is a well-defined order of reading
> > dentries.
> > To give an example, if we have another subvol - subvol2 - (in addition
> > to the subvol above - say subvol1) with the following listing:
> > 1. g, off=16
> > 2. h, off=20
> > 3. i, off=3
> > 4. j, off=19
> > With parallel readdir we can have many orderings, like (a, b, g, h, i,
> > c, d, e, f, j), (g, h, a, b, c, i, j, d, e, f) etc. Now if we do (with
> > readdir done in parallel):
> > 1. A complete listing of the directory (which can be any one of 10C4 =
> > 210 interleavings, assuming each subvol's own order is preserved).
> > 2. Do rewinddir (20)
> > We cannot predict which set of dentries comes _after_ offset
> > 20. However, if we do a readdir sequentially across subvols there is only
> > one directory listing, i.e., (a, b, c, d, e, f, g, h, i, j). So, it's easier
> > to support rewinddir.
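
The difference between the two strategies can be sketched in Python. This is an illustrative model, not DHT code: `sequential_listing`, `interleavings`, and `entries_after` are hypothetical helpers, and it assumes a parallel readdir may interleave the two subvols arbitrarily while keeping each subvol's own order. Under that assumption there are C(10, 4) = 210 possible complete listings, and the dentries that follow `h` (the entry with offset 20) differ from one listing to another.

```python
from itertools import combinations
from math import comb

SUBVOL1 = ["a", "b", "c", "d", "e", "f"]
SUBVOL2 = ["g", "h", "i", "j"]


def sequential_listing(*subvols):
    """Read subvols one after another: exactly one possible order."""
    return [name for subvol in subvols for name in subvol]


def interleavings(first, second):
    """Yield every merge of two lists that preserves each list's own order."""
    total = len(first) + len(second)
    for slots in combinations(range(total), len(second)):
        slot_set = set(slots)  # positions taken by entries of `second`
        it1, it2 = iter(first), iter(second)
        yield [next(it2) if i in slot_set else next(it1) for i in range(total)]


def entries_after(listing, name):
    """Dentries that follow `name` in a given complete listing."""
    return listing[listing.index(name) + 1:]


all_orders = list(interleavings(SUBVOL1, SUBVOL2))
tails = {tuple(entries_after(order, "h")) for order in all_orders}

print(len(all_orders) == comb(10, 4))   # one listing per choice of 4 slots
print(len(tails) > 1)                   # the tail after "h" is ambiguous
print(entries_after(sequential_listing(SUBVOL1, SUBVOL2), "h"))
```

With the sequential order the tail after `h` is always `i, j`; with interleaving it depends on which listing the earlier readdir happened to produce.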
> > If there were no POSIX requirement for rewinddir support, I think a
> > parallel readdir could easily be implemented (which improves performance
> > too). But unfortunately rewinddir is still a POSIX requirement. This also
> > opens up another possibility: a "no-rewinddir-support" option in DHT,
> > which, if enabled, results in parallel readdirs across subvols. What I am not
> > sure of is how many users still use rewinddir. If there is a critical mass
> > that wants performance with a tradeoff of no rewinddir support, this can be
> > a good feature.
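
As a rough illustration of what such an option would trade away, the following Python sketch reads two subvols concurrently with a thread pool. Everything here is hypothetical (`SUBVOLS`, `read_subvol`, `parallel_listing`); it only models the idea and says nothing about how DHT would actually wind the calls. The complete set of dentries is always the same, but the order in which they arrive depends on which subvol answers first.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory stand-ins for the dentries held by each subvol.
SUBVOLS = {
    "subvol1": ["a", "b", "c", "d", "e", "f"],
    "subvol2": ["g", "h", "i", "j"],
}


def read_subvol(name):
    """Pretend to readdir one subvol; returns its dentries in its own order."""
    return list(SUBVOLS[name])


def parallel_listing(subvols):
    """Issue readdir to all subvols at once and append results as they arrive."""
    entries = []
    with ThreadPoolExecutor(max_workers=len(subvols)) as pool:
        futures = [pool.submit(read_subvol, name) for name in subvols]
        for future in as_completed(futures):  # completion order is not fixed
            entries.extend(future.result())
    return entries


listing = parallel_listing(SUBVOLS)
print(sorted(listing))
```

Only the sorted result is stable across runs; the raw `listing` order is not something a later seek to a saved offset could rely on.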
> > +gluster-users to get an opinion on this.
> > regards,
> > Raghavendra
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users