[Gluster-users] [Gluster-devel] A question of GlusterFS dentries!

Raghavendra G raghavendra at gluster.com
Thu Nov 3 07:25:17 UTC 2016


On Thu, Nov 3, 2016 at 11:34 AM, Keiviw <keiviw at 163.com> wrote:

> If GlusterFS does not support POSIX seekdir, what problems will users or
> GlusterFS have?
>

GlusterFS won't have any problem if we don't support seekdir. I am also not
sure whether applications have a real use case for seekdir. However, it is
a POSIX requirement.



>
> Sent from NetEase Mail Master
> On 11/03/2016 12:52, Raghavendra G <raghavendra at gluster.com> wrote:
>
>
>
> On Wed, Nov 2, 2016 at 9:38 AM, Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>>
>>
>> ----- Original Message -----
>> > From: "Keiviw" <keiviw at 163.com>
>> > To: gluster-devel at gluster.org
>> > Sent: Tuesday, November 1, 2016 12:41:02 PM
>> > Subject: [Gluster-devel] A question of GlusterFS dentries!
>> >
>> > Hi,
>> > In GlusterFS distributed volumes, listing a non-empty directory was slow.
>> > Then I read the DHT code and found the reason. But I was confused that
>> > GlusterFS DHT traverses all the bricks in the volume sequentially. Why
>> > not use multiple threads to read dentries from multiple bricks
>> > simultaneously? That question has always puzzled me. Could you please
>> > tell me something about this?
>>
>> readdir across subvols is sequential mostly because we have to support
>> rewinddir(3).
>
>
> Sorry, seekdir(3) is the more relevant function here. Since rewinddir
> resets the dir stream to the beginning, it is not much of a difficulty to
> support rewinddir with parallel readdirs across subvols.
>
>
>> We need to maintain the mapping of offset and dentry across multiple
>> invocations of readdir. In other words, if someone did a rewinddir to an
>> offset corresponding to an earlier dentry, subsequent readdirs should return
>> the same set of dentries that the earlier invocation of readdir returned.
>> For example, in a hypothetical scenario, readdir returned the following
>> dentries:
>>
>> 1. a, off=10
>> 2. b, off=2
>> 3. c, off=5
>> 4. d, off=15
>> 5. e, off=17
>> 6. f, off=13
>>
>> Now if we did a rewinddir to off 5 and issued readdir again, we should get
>> the following dentries:
>> (c, off=5), (d, off=15), (e, off=17), (f, off=13)
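
The resume rule in this example can be sketched in Python. This is only an illustrative model, not GlusterFS code: `SUBVOL1` and `readdir_from` are hypothetical names, and the listing is the one from the example above. It follows the convention used in this message, where seeking to an offset resumes at the dentry that carries that offset.

```python
# Hypothetical model of one subvol's directory stream: (name, offset) pairs
# in the order the backend filesystem returned them in the example.
SUBVOL1 = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]


def readdir_from(listing, offset=None):
    """Return the dentries from the one carrying `offset` to the end.

    offset=None models a fresh directory stream (read from the beginning).
    """
    if offset is None:
        return list(listing)
    for index, (_name, off) in enumerate(listing):
        if off == offset:
            return list(listing[index:])
    raise ValueError(f"unknown offset: {offset}")


# Resuming at off=5 must reproduce the tail of the earlier listing.
resumed = readdir_from(SUBVOL1, 5)
print(resumed)
```

Note that the offsets are opaque cookies, not sorted positions, so the only way to honour the guarantee is to remember the order in which the dentries were originally returned.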
>>
>> Within a subvol, the backend filesystem provides the rewinddir guarantee for
>> the dentries present on that subvol. However, across subvols it is the
>> responsibility of DHT to provide the above guarantee, which means we need
>> some well-defined order in which we send readdir calls (note that the
>> order is not well defined if we do a parallel readdir across all subvols).
>> So DHT uses sequential readdir, which gives a well-defined order of reading
>> dentries.
>>
>> To give an example, suppose we have another subvol - subvol2 - (in addition
>> to the subvol above - say subvol1) with the following listing:
>> 1. g, off=16
>> 2. h, off=20
>> 3. i, off=3
>> 4. j, off=19
>>
>> With parallel readdir we can have many orderings, like (a, b, g, h, i, c,
>> d, e, f, j), (g, h, a, b, c, i, j, d, e, f) etc. Now suppose (with readdir
>> done in parallel) we do:
>>
>> 1. A complete listing of the directory (which can be any one of C(10,4) =
>> 210 interleavings, assuming each subvol returns its own dentries in a fixed
>> order).
>> 2. Do rewinddir (20)
>>
>> We cannot predict which dentries come _after_ offset
>> 20. However, if we do a readdir sequentially across subvols there is only
>> one directory listing, i.e., (a, b, c, d, e, f, g, h, i, j). So it's easier
>> to support rewinddir.
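
A small Python sketch can make the ambiguity concrete. It is a simplified model under stated assumptions, not DHT code: it assumes each subvol returns its own dentries in a fixed order and that a parallel readdir may interleave the two streams at any dentry boundary. `interleavings` and `entries_after` are hypothetical helper names. Under those assumptions there are C(10,4) = 210 possible merged listings, and the set of dentries that follows `h` (the dentry with off=20 in the example) differs between them.

```python
from itertools import combinations
from math import comb

# Per-subvol listings from the example, in the order each subvol returns them.
SUBVOL1 = ["a", "b", "c", "d", "e", "f"]
SUBVOL2 = ["g", "h", "i", "j"]


def interleavings(first, second):
    """Yield every merge of the two lists that keeps each list's own order."""
    total = len(first) + len(second)
    # Choose which positions of the merged listing hold dentries of `second`.
    for slots in combinations(range(total), len(second)):
        slot_set = set(slots)
        it_first, it_second = iter(first), iter(second)
        yield [
            next(it_second) if position in slot_set else next(it_first)
            for position in range(total)
        ]


def entries_after(listing, name):
    """Dentries a reader would still get after resuming just past `name`."""
    return listing[listing.index(name) + 1:]


all_orders = list(interleavings(SUBVOL1, SUBVOL2))
# Distinct "rest of the directory" results after h across all parallel orders.
after_h = {tuple(entries_after(order, "h")) for order in all_orders}
# Sequential readdir: all of subvol1, then all of subvol2.
sequential = SUBVOL1 + SUBVOL2

print(len(all_orders), comb(10, 4))
print(len(after_h))
print(entries_after(sequential, "h"))
```

With the sequential order there is exactly one answer to "what comes after `h`", which is what makes resuming from an offset tractable.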
>>
>> If there were no POSIX requirement for rewinddir support, I think a
>> parallel readdir could easily be implemented (which would improve
>> performance too). But unfortunately rewinddir is still a POSIX requirement.
>> This also opens up the possibility of a "no-rewinddir-support" option in
>> DHT, which, if enabled, results in parallel readdirs across subvols. What I
>> am not sure about is how many users still use rewinddir. If there is a
>> critical mass that wants performance with a tradeoff of no rewinddir
>> support, this can be a good feature.
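
As a rough illustration of the tradeoff such an option would expose, here is a minimal Python sketch. Everything in it is hypothetical: `SUBVOLS`, `read_subvol`, and the `parallel` flag stand in for real subvol readdir calls and for the proposed option, and threads are used only to model concurrent requests.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory stand-ins for two subvols and their dentries.
SUBVOLS = {
    "subvol1": ["a", "b", "c", "d", "e", "f"],
    "subvol2": ["g", "h", "i", "j"],
}


def read_subvol(name):
    """Stand-in for a readdir on one subvol; returns its dentries in order."""
    return list(SUBVOLS[name])


def list_directory(parallel):
    """List dentries across all subvols, sequentially or in parallel."""
    if not parallel:
        # Sequential: one well-defined order, so resuming from an offset works.
        return [dentry for name in SUBVOLS for dentry in read_subvol(name)]
    merged = []
    with ThreadPoolExecutor(max_workers=len(SUBVOLS)) as pool:
        futures = [pool.submit(read_subvol, name) for name in SUBVOLS]
        # Results are appended in completion order, which is not deterministic.
        for future in as_completed(futures):
            merged.extend(future.result())
    return merged


print(list_directory(parallel=False))
print(sorted(list_directory(parallel=True)))
```

Both modes return the same set of dentries; only the sequential mode guarantees the same order on every call, which is the property that resuming from an offset relies on.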
>>
>> +gluster-users to get an opinion on this.
>>
>> regards,
>> Raghavendra
>>
>> > _______________________________________________
>> > Gluster-devel mailing list
>> > Gluster-devel at gluster.org
>> > http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Raghavendra G
>
>



-- 
Raghavendra G