[Gluster-devel] Progress on adding support for SEEK_DATA and SEEK_HOLE

Niels de Vos ndevos at redhat.com
Sun Jul 5 23:15:17 UTC 2015


On Wed, Jul 01, 2015 at 09:41:19PM +0200, Niels de Vos wrote:
> On Wed, Jul 01, 2015 at 07:15:12PM +0200, Xavier Hernandez wrote:
> > On 07/01/2015 08:53 AM, Niels de Vos wrote:
> > >On Tue, Jun 30, 2015 at 11:48:20PM +0530, Ravishankar N wrote:
> > >>
> > >>
> > >>On 06/22/2015 03:22 PM, Ravishankar N wrote:
> > >>>
> > >>>
> > >>>On 06/22/2015 01:41 PM, Miklos Szeredi wrote:
> > >>>>On Sun, Jun 21, 2015 at 6:20 PM, Niels de Vos <ndevos at redhat.com> wrote:
> > >>>>>Hi,
> > >>>>>
> > >>>>>it seems that there could be a reasonable benefit for virtual machine
> > >>>>>images on a FUSE mountpoint when SEEK_DATA and SEEK_HOLE would be
> > >>>>>available. At the moment, FUSE does not pass lseek() on to the
> > >>>>>userspace
> > >>>>>process that handles the I/O.
> > >>>>>
> > >>>>>Other filesystems that do not (need to) track the position in the
> > >>>>>file-descriptor are starting to support SEEK_DATA/HOLE. One example is
> > >>>>>NFS:
> > >>>>>
> > >>>>>https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-38#section-15.11
> > >>>>>
> > >>>>>I would like to add this feature to Gluster, and am wondering if there
> > >>>>>are any reasons why it should/could not be added to FUSE.
> > >>>>I don't see any reason why it couldn't be added.  Please go ahead.
> > >>>
> > >>>Thanks for bouncing the mail to me Niels, I would be happy to work on
> > >>>this. I'll submit a patch by Monday next.
> > >>>
> > >>
> > >>
> > >>Sent a patch @
> > >>http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/14752
> > >>I've tested it with some skeleton code in gluster-fuse to handle lseek().
> > >
> > >Ravi also sent his patch for glusterfs-fuse:
> > >
> > >   http://review.gluster.org/11474
> > >
> > >I have posted my COMPLETELY UNTESTED patches to their own Gerrit topic
> > >so that we can easily track the progress:
> > >
> > >   http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:wip/SEEK_HOLE
> > >
> > >My preference goes to share things early and make everyone able to
> > >follow progress (know where to find the latest patches). Assistance in
> > >testing, reviewing and improving is welcome! There are some outstanding
> > >things like seek() for ec and sharding, and probably more.
> > >
> > >This all was done as a suggestion from Christopher (kripper) Pereira,
> > >for improving the handling of sparse files (like most VM images).
> > 
> > I've posted the patch for ec in the same Gerrit topic:
> > 
> >     http://review.gluster.org/11494/
> 
> Thanks!
> 
> > It has not been tested and some discussion about if it's really needed to
> > send the request to all subvolumes will be needed.
> > 
> > The lock and the xattrop are absolutely needed. Even if we send the request
> > to only one subvolume, we need to know which ones are healthy (to avoid
> > sending the request to a brick that could have invalid hole information).
> > This could have been done in open, but since NFS does not issue open calls,
> > we cannot rely on that.
> 
> Ok, yes, that makes sense. We will likely have SEEK as an operation in
> NFS-Ganesha at one point, and that will use the handle-based gfapi
> functions.
> 
> > Once we know which bricks are healthy we could opt for sending the request
> > only to one of them. In this case we need to be aware that even healthy
> > bricks could have different hole locations.
> 
> I'm not sure I understand what you mean, but that is likely because
> I don't know much about ec. I'll try to think it through later this
> week.

The only thing that needs to be guaranteed is that the returned offset
of the hole/data is safe, meaning that real data is never reported as
part of a hole. The whole purpose is to improve the handling of sparse
files; the result does not need to be perfect. The holes themselves are
not important, but the non-holes are.

When a sparse file (think VM image) is copied, the goal is to avoid
reading the holes, which would only return NUL bytes. If the calculated
start or end of a hole is not exact, that is not a fatal issue. Reading
and backing up a series of NUL bytes before/after the hole should be
acceptable.
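
To make the copy use-case concrete, here is a minimal sketch of how a
tool could copy a sparse file with SEEK_DATA/SEEK_HOLE. It is only an
illustration, not code from any of the patches: the function name
sparse_copy and the chunk size are made up, and it assumes a platform
where Python exposes os.SEEK_DATA and os.SEEK_HOLE (Linux, for example).

```python
import errno
import os


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path, reading only the data extents.

    Hypothetical helper for illustration. Holes are never read; they are
    recreated in the destination by simply not writing to those offsets.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(src_fd).st_size
            offset = 0
            while offset < size:
                try:
                    # Next offset >= offset that contains data.
                    data_start = os.lseek(src_fd, offset, os.SEEK_DATA)
                except OSError as exc:
                    if exc.errno == errno.ENXIO:
                        break  # only a hole is left up to end-of-file
                    raise
                # Next hole after the data (end-of-file counts as a hole).
                data_end = os.lseek(src_fd, data_start, os.SEEK_HOLE)
                pos = data_start
                while pos < data_end:
                    buf = os.pread(src_fd, min(chunk, data_end - pos), pos)
                    if not buf:
                        break  # file shrank while copying
                    written = 0
                    while written < len(buf):
                        written += os.pwrite(dst_fd, buf[written:], pos + written)
                    pos += len(buf)
                offset = data_end
            # Restore the full size so that a trailing hole is preserved.
            os.ftruncate(dst_fd, size)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

If the filesystem reports a hole a bit later than where it really
starts, or data a bit earlier, this loop just copies some extra NUL
bytes. The result is still a correct copy.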

A drawing can probably explain things a little better.


                        lseek(SEEK_HOLE)
                          |       |
                  perfect |       | acceptable
                    match |       | match
                          |       |
     .....................|.......|.....................
     :file                |       |                    :
     : .----------------. v       v           .------. :
     : | DATA DATA DATA | NUL NUL NUL NUL NUL | DATA | :
     : '----------------'                 ^   '------' :
     :                                    |   ^        :
     .....................................|...|.........
                                          |   |
                               acceptable |   | perfect
                                    match |   | match
                                          |   |
                                        lseek(SEEK_DATA)
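

The same idea as the drawing, written down as a small check. This is
only a sketch of the reasoning, limited to a single hole between two
data extents; the function names and the "perfect"/"acceptable"/"unsafe"
labels are made up for this example and are not part of any API.

```python
def classify_seek_hole(hole_start, hole_end, reported):
    """Judge a SEEK_HOLE answer for a seek that started in the data
    before the hole [hole_start, hole_end)."""
    if reported == hole_start:
        return "perfect"
    if hole_start < reported <= hole_end:
        # Hole reported too late: some NUL bytes are read as data.
        return "acceptable"
    # Reported before the hole starts (real data would be skipped), or
    # beyond this single-hole model; treated as unsafe in this sketch.
    return "unsafe"


def classify_seek_data(hole_start, hole_end, reported):
    """Judge a SEEK_DATA answer for a seek that started inside the hole
    [hole_start, hole_end); the next data begins at hole_end."""
    if reported == hole_end:
        return "perfect"
    if hole_start <= reported < hole_end:
        # Data reported too early: some NUL bytes are read as data.
        return "acceptable"
    # Reported after the data really starts: real data would be skipped.
    return "unsafe"
```

For a hole covering bytes 4096 up to 8192, a SEEK_HOLE answer of 6000 is
acceptable, while an answer of 4000 would cut off real data.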


I have no idea how ec can figure out the offset of holes/data; that
would be interesting to know. Is it described in a design document
somewhere?

My inclination is to give the seek() FOP the same consistency as
read(): the same locking and health-checks would apply. Does that help?

Thanks,
Niels

