[Gluster-devel] Readdir d_off encoding
Shyam
srangana at redhat.com
Thu Jan 29 21:17:43 UTC 2015
On 01/07/2015 04:16 PM, J. Bruce Fields wrote:
> On Mon, Dec 22, 2014 at 02:04:37PM -0500, J. Bruce Fields wrote:
>> It'd also be nice to see any proposals for a completely correct
>> solution, even if it's something that will take a while. All I can
>> think of is protocol extensions, but that's just what I know.
>
> I tried to think a little about this over the holidays: say we could
> scrap NFS and start from scratch, what would we do?
>
> - larger NFS readdir cookies: if NFS cookies were 128 bits, then gluster
> could stick the filesystem's offset in the lower 64 bits and its own
> data in the upper 64 bits.
Dan mentioned the other day the idea of _negotiating_ the cookie size
and then setting it, if this were being done from scratch. I thought it
worth mentioning here, as it would be a good move.
>
> This doesn't work if anyone else does this, though: if we change to
> 128 bits here then people may eventually want to do the same thing to
> filesystem and system call interfaces too and then we're back at square
> one. If people want to be able to stack arbitrary readdir
> implementations then we can't really choose a fixed size limit any
> more.
>
> - stateful readdir: make clients open the directory, read through it
> from start to finish, then close it. That's all clients really want
> to do anyway--they don't need to seek back to offsets returned
> arbitrarily long ago. However, they do need to be able to resend the
> last readdir request in case the reply was lost, and they do need to
> be able to resume reading a directory after a server reboot.
>
> So I think that would still leave gluster needing to keep a
> (persistent, on-disk) cache mapping the NFS cookies it hands out to
> the offsets in the backend directories. The difference is just that
> it would only have to cache the small number of entries that are in
> use by current readdirs in progress instead of potentially having to
> keep them all forever. I don't know, does that help much?
>
> --b.
>
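For the stateful readdir idea, the cache described above could look
roughly like the sketch below. It keeps cookie-to-offset mappings only
for directory streams that are currently open, and drops older cookies
once the client has moved past them, while keeping the cookie of the
last request so that request can be resent. Everything here
(ReaddirCookieCache, stream_id, the method names) is hypothetical, and
the map is kept in memory only; the persistent, on-disk part needed to
survive a server reboot is deliberately left out.

```python
import itertools


class ReaddirCookieCache:
    """Maps opaque cookies to (brick, backend offset), per open stream."""

    def __init__(self):
        # Cookie 0 is left unused so it can mean "start of directory".
        self._next = itertools.count(1)
        self._streams = {}  # stream_id -> {cookie: (brick, backend_off)}

    def open(self, stream_id):
        self._streams[stream_id] = {}

    def issue(self, stream_id, brick, backend_off):
        """Hand out a new cookie for an entry returned to the client."""
        cookie = next(self._next)
        self._streams[stream_id][cookie] = (brick, backend_off)
        return cookie

    def resume(self, stream_id, cookie):
        """Look up where to continue reading for a client-supplied cookie."""
        return self._streams[stream_id][cookie]

    def advance(self, stream_id, cookie):
        """Client sent a request at `cookie`: forget everything older,
        but keep `cookie` itself in case that request is resent."""
        entries = self._streams[stream_id]
        for old in [c for c in entries if c < cookie]:
            del entries[old]

    def close(self, stream_id):
        """Directory closed: all of its cookies can be dropped."""
        self._streams.pop(stream_id, None)
```

The size of the map is then bounded by the number of readdirs in
progress rather than by every cookie ever handed out.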