[Gluster-devel] Readdir d_off encoding

Thu Jan 29 21:17:43 UTC 2015

On 01/07/2015 04:16 PM, J. Bruce Fields wrote:
> On Mon, Dec 22, 2014 at 02:04:37PM -0500, J. Bruce Fields wrote:
>> It'd also be nice to see any proposals for a completely correct
>> solution, even if it's something that will take a while.  All I can
>> think of is protocol extensions, but that's just what I know.
>
> I tried to think a little about this over the holidays: say we could
> scrap NFS and start from scratch, what would we do?:
>
> - larger NFS readdir cookies: if NFS cookies were 128 bits, then gluster
>    could stick the filesystem's offset in the lower 64 bits and its own
>    data in the upper 64 bits.

Dan mentioned the other day the idea of _negotiating_ the cookie size 
between client and server and then fixing it for the session, if this 
were being designed from scratch. I thought it worth raising here, as 
it would be a good move.
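
To make the layout concrete, here is a rough Python sketch of a cookie
whose width is negotiated first, and which then carries the backend
filesystem's 64-bit offset in the low bits and gluster's own data in
whatever high bits remain. The function names and the "take the smaller
width" negotiation rule are assumptions for illustration only; nothing
like this exists in NFS or gluster today.

```python
BACKEND_BITS = 64  # width of the backend filesystem's d_off


def negotiate_cookie_bits(client_max_bits, server_max_bits):
    """Hypothetical negotiation: both sides settle on the smaller width."""
    return min(client_max_bits, server_max_bits)


def pack_cookie(backend_off, layer_data, cookie_bits=128):
    """Put backend_off in the low 64 bits and layer_data in the bits above."""
    layer_bits = cookie_bits - BACKEND_BITS
    if layer_bits < 0:
        raise ValueError("cookie is narrower than the backend offset")
    if not 0 <= backend_off < (1 << BACKEND_BITS):
        raise ValueError("backend offset does not fit in 64 bits")
    if not 0 <= layer_data < (1 << layer_bits):
        raise ValueError("no room left in the cookie for the layer's data")
    return (layer_data << BACKEND_BITS) | backend_off


def unpack_cookie(cookie):
    """Return (layer_data, backend_off) from a packed cookie."""
    return cookie >> BACKEND_BITS, cookie & ((1 << BACKEND_BITS) - 1)


if __name__ == "__main__":
    bits = negotiate_cookie_bits(client_max_bits=128, server_max_bits=128)
    cookie = pack_cookie(backend_off=0x1234, layer_data=3, cookie_bits=bits)
    print(unpack_cookie(cookie))
```

If the negotiated width comes out at 64 bits, pack_cookie() refuses any
non-zero layer data, which is essentially the squeeze we have today.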

>
>    This doesn't work if anyone else does this, though: if we change to
>    128 bits here then people may eventually want to do the same thing to
>    filesystem and system call interfaces too and then we're back at square
>    one.  If people want to be able to stack arbitrary readdir
>    implementations then we can't really choose a fixed size limit any
>    more.
>
> - stateful readdir: make clients open the directory, read through it
>    from start to finish, then close it.  That's all clients really want
>    to do anyway--they don't need to seek back to offsets returned
>    arbitrarily long ago.  However, they do need to be able to resend the
>    last readdir request in case the reply was lost, and they do need to
>    be able to resume reading a directory after a server reboot.
>
>    So I think that would still leave gluster needing to keep a
>    (persistent, on-disk) cache mapping the NFS cookies it hands out to
>    the offsets in the backend directories.  The difference is just that
>    it would only have to cache the small number of entries that are in
>    use by current readdirs in progress instead of potentially having to
>    keep them all forever.  I don't know, does that help much?
>
> --b.
>
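
For the stateful variant, the cache described above could look roughly
like the following Python sketch. It only tracks directories that
currently have a readdir in progress, keeps the last couple of cookies
per open directory (so the previous request can be resent if a reply is
lost), and can be serialized so the state survives a server restart.
Everything here is an assumption for illustration: the class and method
names are made up, handles are assumed to be strings, each reply is
simplified to a single cookie, and dump() just returns a JSON string
that a real server would have to write to disk safely.

```python
import json
from collections import OrderedDict


class ReaddirCookieCache:
    """Maps cookies handed to clients to (subvolume, backend offset)."""

    def __init__(self, keep_per_handle=2):
        self.keep_per_handle = keep_per_handle
        self.next_cookie = 1  # 0 is left to mean "start of directory"
        self.handles = {}     # handle -> OrderedDict(cookie -> [subvol, off])

    def open_dir(self, handle):
        self.handles[handle] = OrderedDict()

    def issue(self, handle, subvol, backend_off):
        """Hand out a new cookie and forget the oldest ones for this handle."""
        entries = self.handles[handle]
        cookie = self.next_cookie
        self.next_cookie += 1
        entries[cookie] = [subvol, backend_off]
        while len(entries) > self.keep_per_handle:
            entries.popitem(last=False)  # drop the oldest cookie
        return cookie

    def resolve(self, handle, cookie):
        """Translate a client cookie back to (subvol, backend_off)."""
        return tuple(self.handles[handle][cookie])

    def close_dir(self, handle):
        self.handles.pop(handle, None)

    def dump(self):
        """Serialize the state; a real server would persist this string."""
        return json.dumps({
            "next": self.next_cookie,
            "handles": {h: list(e.items()) for h, e in self.handles.items()},
        })

    @classmethod
    def load(cls, text, keep_per_handle=2):
        """Rebuild the cache after a restart from a dump() string."""
        state = json.loads(text)
        cache = cls(keep_per_handle)
        cache.next_cookie = state["next"]
        for handle, items in state["handles"].items():
            cache.handles[handle] = OrderedDict((c, v) for c, v in items)
        return cache
```

The point of the sketch is the size of the state: it grows with the
number of readdirs in progress, not with the number of directory
entries ever returned.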