[Gluster-devel] regressions due to 64-bit ext4 directory cookies

Dave Chinner david at fromorbit.com
Fri Feb 15 02:27:38 UTC 2013

On Thu, Feb 14, 2013 at 05:01:10PM -0500, J. Bruce Fields wrote:
> On Thu, Feb 14, 2013 at 05:10:02PM +1100, Dave Chinner wrote:
> > On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> > > Telldir() and seekdir() are basically implementation horrors for any
> > > file system that is using anything other than a simple array of
> > > directory entries ala the V7 Unix file system or the BSD FFS.  For any
> > > file system which is using a more advanced data structure, like
> > > b-trees hash trees, etc, there **can't** possibly be a "offset" into a
> > > readdir stream. 
> > 
> > I'll just point you to this:
> > 
> > http://marc.info/?l=linux-ext4&m=136081996316453&w=2
> > 
> > so you can see that XFS implements what you say can't possibly be
> > done. ;)
> > 
> > FWIW, that post only talked about the data segment. I didn't mention
> > that XFS has 2 other segments in the directory file (both beyond
> > EOF) for the directory data indexes. One contains the name-hash btree
> > index used for name based lookups and the other contains a freespace
> > index for tracking free space in the data segment.
> OK, so in some sense that reduces the problem to that of implementing
> readdir cookies for directories that are stored in a simple linear
> array.


> Which I should know how to do but I don't: I guess all you need is a
> provision for making holes on remove (so that you aren't required move
> existing entries, messing up offsets for concurrent readers)?


The data segment is a virtual mapping that is maintained by the
extent tree, so we can simply punch holes in it for directory blocks
that are empty and no longer referenced. i.e. the data segement
really is just a sparse file.

The result of doing block mapping this way is that the freespace
tracking segment actually only needs to track space in partially
used blocks. Hence we only need to allocate new blocks when the
freespace map empties, And we work out where to allocate the new
block in the virtual map by doing an extent tree lookup to find the
first hole....

> Purely out of curiosity: is there a more detailed writeup of XFS's
> directory format?  (Or a pointer to a piece of the code a person could
> understand without losing a month to it?)

Not really. There's documentation of the on-disk structures, but
it's a massive leap from there to understanding the structure and
how it all ties together.  I've been spending the past couple of
months deep in the depths of the XFS directory code so how it all
works is front-and-center in my brain right now...

That said, the thought had crossed my mind that there's a a couple
of LWN articles/conference talks I could put together as a brain
dump. ;)


Dave Chinner
david at fromorbit.com

More information about the Gluster-devel mailing list