[Gluster-devel] Finalizing interfaces for snapshot and clone creation in BD xlator

M. Mohan Kumar mohankumar.m at gmail.com
Fri Sep 27 17:29:16 UTC 2013


On Fri, Sep 27, 2013 at 7:47 PM, Brian Foster <bfoster at redhat.com> wrote:

> On 09/26/2013 01:27 PM, M. Mohan Kumar wrote:
> > In the BD xlator, cloning refers to a full copy of the file (after the
> > copy there is no relationship between the two files), and snapshot
> > refers to a COW copy of the file. I guess these terminologies need to be
> > generalized. I can choose "copy" for the full-copy functionality and
> > either "clone" or "snapshot" for the COW functionality.
> >
>
> That sounds reasonable to me. I'm wondering if there are ways to
> multiplex the commands in a more general way (i.e., clone means one
> thing on a bd device, another on a regular file), but for now it might
> be better to just get all of the pieces in place.
>

OK, the qcow2 patches use the trusted.glusterfs.block-format xattr to create
a clone/snapshot. IIUC, normal users can't operate on trusted-namespace
xattrs (that requires CAP_SYS_ADMIN), so this needs a change.


> > Sorry, I could not go through the complete qemu-block xlator code. If I
> > understand correctly, your patch (http://review.gluster.com/#/c/5967/)
> > also expects the destination file to exist before cloning? I.e., in
> > 'setfattr -n trusted.glusterfs.block-format -v "qcow2:10GB:<bimg>"
> > ./newimage', the file newimage exists before this operation is called,
> > doesn't it?
> >
>
> Yes, that's correct. BTW, my understanding of the bd case is that the
> destination file has to exist in the same manner, correct?
>

Correct
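For illustration, a minimal C sketch of how a translator might split a value such as "qcow2:10GB:<bimg>" into its parts before acting on the (already existing) destination file. The field meanings (format, size, optional backing image), the binary interpretation of "GB", and the names struct block_format_req, parse_size() and parse_block_format() are assumptions made for this sketch; they are not taken from the qemu-block patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a "format:size[:backing]" xattr value. */
struct block_format_req {
    char format[16];      /* e.g. "qcow2" */
    uint64_t size_bytes;  /* assumption: binary multiples, 1GB = 1024^3 */
    char backing[256];    /* opaque backing-image reference, may be empty */
};

/* Convert "10GB", "512M" or "4096" into bytes. Returns 0 or -errno. */
static int parse_size(const char *s, uint64_t *out)
{
    if (!isdigit((unsigned char)*s))
        return -EINVAL;               /* rejects "", "-5", " 10" */
    char *end;
    errno = 0;
    unsigned long long n = strtoull(s, &end, 10);
    if (errno != 0)
        return -ERANGE;
    uint64_t mult = 1;
    switch (toupper((unsigned char)*end)) {
    case '\0': break;
    case 'K': mult = 1ULL << 10; end++; break;
    case 'M': mult = 1ULL << 20; end++; break;
    case 'G': mult = 1ULL << 30; end++; break;
    case 'T': mult = 1ULL << 40; end++; break;
    default: return -EINVAL;
    }
    if (mult != 1 && toupper((unsigned char)*end) == 'B')
        end++;                        /* accept both "10G" and "10GB" */
    if (*end != '\0')
        return -EINVAL;
    if (n > UINT64_MAX / mult)
        return -ERANGE;
    *out = (uint64_t)n * mult;
    return 0;
}

/* Split "format:size[:backing]"; the backing part may itself contain ':'. */
static int parse_block_format(const char *value, struct block_format_req *req)
{
    char buf[512];
    if (strlen(value) >= sizeof(buf))
        return -ENAMETOOLONG;
    strcpy(buf, value);

    char *size = strchr(buf, ':');
    if (!size)
        return -EINVAL;               /* size is mandatory in this sketch */
    *size++ = '\0';
    char *backing = strchr(size, ':');
    if (backing)
        *backing++ = '\0';

    if (buf[0] == '\0' || strlen(buf) >= sizeof(req->format))
        return -EINVAL;
    if (parse_size(size, &req->size_bytes) != 0)
        return -EINVAL;
    if (backing && strlen(backing) >= sizeof(req->backing))
        return -EINVAL;

    strcpy(req->format, buf);
    strcpy(req->backing, backing ? backing : "");
    return 0;
}
```

A stricter parser might also check the format name against a list of supported formats; that is left out here to keep the sketch small.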

>
> > The BD patches let users specify the full path of the destination file
> > as part of the xattr value, because that's how management tools pass the
> > path when creating a snapshot or a complete copy of a KVM image. The
> > current interface for the offloaded functionality in BD is:
> > # setfattr -n [clone|snapshot] -v "/path/to/destination-file" path-to-source-file
> > So IIUC, in both approaches we need to validate both the source and
> > destination files, or am I missing something here?
> >
>
> Yeah, I'm not disputing that the resolving code is necessary. I'm
> suggesting it could be independently useful to bd snapshots or
> qemu-block (file) snapshots. Why make each snapshot-capable translator
> solve the same problem?
>
> My thought was generally:
>
> - "snapman" xlator sits high up in the graph and implements the setxattr
> interface
> - independent lower level cow/snap mechanism translators work on a lower
> level (possibly common) interface, such as ioctl
>
> Of course, ioctl support doesn't exist at the moment and it's not
> necessarily given that it's the right approach (perhaps a fop, or
> something else?). So perhaps the right thing for now is to emulate this
> kind of thing with another internal setxattr that encapsulates all the
> work the interface translator has done (i.e., pass down the gfid's
> instead of paths, etc.).
>
>
Looks like a long way to go. Maybe I will post my current set of patches to
Gerrit for review; the offload part can wait for this standardization
process.
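A rough C sketch of the layering Brian describes, to make the idea concrete: a hypothetical "snapman" front end accepts the source-oriented request used by the BD interface (setfattr issued on the source file, destination path in the value), resolves both paths, and hands a gfid-based, target-oriented request down to the lower translators. The enum, structs, function names and the lookup-table resolver are all hypothetical stand-ins for real inode/gfid resolution.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations an interface translator would forward. */
enum snap_op { SNAP_OP_COPY, SNAP_OP_SNAPSHOT };

/* Internal gfid-based request handed to the lower translators. */
struct snap_req {
    enum snap_op op;
    const char *target_gfid;   /* the new file the operation acts on */
    const char *source_gfid;   /* the origin, passed as a parameter */
};

/* Stand-in for real path -> gfid resolution (illustration only). */
struct path_entry { const char *path; const char *gfid; };

static const char *resolve(const struct path_entry *tab, size_t n,
                           const char *path)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].path, path) == 0)
            return tab[i].gfid;
    return NULL;
}

/*
 * Rewrite a user-facing, source-oriented request
 *     setfattr -n snapshot -v /dst /src
 * into a target-oriented request on /dst carrying /src's gfid, in the
 * same direction as a clone ioctl issued on the destination fd.
 */
static int snapman_translate(enum snap_op op, const char *src_path,
                             const char *dst_path,
                             const struct path_entry *tab, size_t n,
                             struct snap_req *out)
{
    const char *src = resolve(tab, n, src_path);
    const char *dst = resolve(tab, n, dst_path);
    if (!src || !dst)
        return -ENOENT;        /* both files must already exist */
    if (strcmp(src, dst) == 0)
        return -EINVAL;        /* refuse to snapshot a file onto itself */
    out->op = op;
    out->target_gfid = dst;
    out->source_gfid = src;
    return 0;
}
```

The point of the split is that only snapman_translate() knows about paths; whatever sits below (BD or qemu-block) would only ever see gfids.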


> The longer term idea is that an application can choose to use that lower
> level interface. Think 'cp --reflink,' for example. In that case, an
> ioctl is a requirement, but the broader point is that by virtue of the
> interface the application has already been required to resolve the paths
> (to open the files) and the higher level interface is not necessary.
> Thoughts?
>

Ideally, cp --reflink should be used for creating COW files (I even
documented this in my bd_map xlator presentation). I guess cp --reflink
needs a patch to work with FUSE (and GlusterFS). GlusterFS also needs a new
reflink FOP. But the upstream effort to add a reflink syscall is still
pending.
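For reference, on Linux the ioctl that GNU cp --reflink uses is FICLONE (Linux 4.5+, same request number as the older BTRFS_IOC_CLONE). It is issued on the destination fd with the source fd as its argument, which is the target-oriented direction discussed earlier in the thread. A minimal sketch of reflink-with-fallback, roughly cp --reflink=auto behaviour; reflink_or_copy() and copy_bytes() are illustrative names, and which errno values count as "not supported here" is a judgment call.

```c
#define _GNU_SOURCE              /* exposes mkstemp() etc. under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)   /* value from <linux/fs.h> */
#endif

/* Plain read/write copy, used when extents cannot be shared. */
static int copy_bytes(int src_fd, int dst_fd)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof(buf));
        if (n == 0)
            return 0;                        /* EOF: done */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        for (char *p = buf; n > 0;) {        /* handle short writes */
            ssize_t w = write(dst_fd, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            p += w;
            n -= w;
        }
    }
}

/*
 * Returns 1 if the destination now shares extents with the source (COW),
 * 0 if a full byte copy was made instead, or a negative errno.
 */
static int reflink_or_copy(const char *src, const char *dst)
{
    int ret;
    int sfd = open(src, O_RDONLY);
    if (sfd < 0)
        return -errno;
    int dfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dfd < 0) {
        ret = -errno;
        close(sfd);
        return ret;
    }
    if (ioctl(dfd, FICLONE, sfd) == 0)       /* ioctl on the *target* fd */
        ret = 1;
    else if (errno == EOPNOTSUPP || errno == EXDEV ||
             errno == EINVAL || errno == ENOTTY)
        ret = copy_bytes(sfd, dfd);          /* filesystem can't reflink */
    else
        ret = -errno;
    close(sfd);
    if (close(dfd) < 0 && ret >= 0)
        ret = -errno;
    return ret;
}
```

On a GlusterFS FUSE mount this would only share extents once FUSE and GlusterFS pass such a request through, which is exactly the missing piece described above.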

Also, there is a recent effort to provide copy functionality through the
splice interface; once it stabilizes, FUSE/GlusterFS need to support this
FOP too. Copy functionality is commonly used in VM environments, and
offloaded copy will be really useful there.
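An in-kernel copy interface along these lines later appeared in Linux 4.5 as copy_file_range(2). A minimal sketch of a whole-file copy through it, falling back to read/write when the kernel or filesystem cannot handle the pair of files; offload_copy() is an illustrative name, and the sketch assumes the source fd is positioned at offset 0.

```c
#define _GNU_SOURCE              /* copy_file_range() is declared under _GNU_SOURCE */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Copy all of src_fd into dst_fd. NULL offset pointers make the kernel use
 * and advance both file offsets, so the fallback loop resumes exactly where
 * the offloaded part stopped.
 */
static int offload_copy(int src_fd, int dst_fd)
{
    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return -errno;

    off_t left = st.st_size;
    while (left > 0) {
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, (size_t)left, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ENOSYS || errno == EXDEV ||
                errno == EOPNOTSUPP || errno == EINVAL)
                break;           /* not offloadable here: copy in user space */
            return -errno;
        }
        if (n == 0)
            return 0;            /* hit EOF early (file shrank) */
        left -= n;
    }
    if (left <= 0)
        return 0;

    char buf[65536];             /* user-space fallback */
    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof(buf));
        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        for (char *p = buf; n > 0;) {
            ssize_t w = write(dst_fd, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            p += w;
            n -= w;
        }
    }
}
```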


> > Also, what about listing the snapshots/origin for a given file, and the
> > ability to merge an external snapshot file into the origin? The BD xlator
> > can list the origin (i.e., gfid) for a given snapshot, but listing the
> > snapshots of a given file is not implemented.
> >
>
> I started thinking about this a bit for the qemu-block case, but tbh I'm
> not sure how generic we can make this. Perhaps the interface translator
> can be responsible for this just using a generic set of xattrs?
>
> IMO, I don't think everything has to be completely worked out and
> genericized up front. I just think if we move in that general direction
> as far as how the code is broken up, it will be easier to line things up
> cleanly as they come in (as long as there is an understanding that
> interfaces might change and whatnot).
>
>
I have merge functionality in the BD xlator, and also listing of the origin
for a given snapshot, but nothing 'standardized'.
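A toy C model of why the two listing directions differ in cost, assuming each snapshot records its origin's gfid: finding the origin of a snapshot is a single field read, while listing the snapshots of an origin means scanning every file unless a reverse index is also maintained. struct file_rec, get_origin() and list_snapshots() are hypothetical and do not mirror the BD xlator's actual metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-file record: a snapshot stores its origin's gfid. */
struct file_rec {
    const char *gfid;
    const char *origin_gfid;   /* NULL for an origin or an ordinary file */
};

/* Forward direction: read the snapshot's own back-reference. */
static const char *get_origin(const struct file_rec *snap)
{
    return snap->origin_gfid;
}

/*
 * Reverse direction: with no stored forward list, every record must be
 * scanned. Writes up to max matching gfids into out; returns the total
 * number of matches (which may exceed max).
 */
static size_t list_snapshots(const struct file_rec *recs, size_t n,
                             const char *origin, const char **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].origin_gfid && strcmp(recs[i].origin_gfid, origin) == 0) {
            if (found < max)
                out[found] = recs[i].gfid;
            found++;
        }
    }
    return found;
}
```

A generic xattr-based interface could avoid the scan by also storing a list of snapshot gfids on the origin, at the cost of keeping that list consistent on snapshot create, delete and merge.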


>
> > Anand, shall I post the patches to Gerrit? Maybe in the next round we can
> > decide on the interfaces for offload operations.
> >
> >
> >
> > On Wed, Sep 25, 2013 at 6:20 PM, Brian Foster <bfoster at redhat.com> wrote:
> >
> >     On 09/25/2013 12:27 AM, M. Mohan Kumar wrote:
> >     > Here is the right link: http://review.gluster.org/#/c/5626/
> >     >
> >
> >     Thanks guys. I haven't taken a deep look at the code, but some
> >     initial high-level comments...
> >
> >     The first thing I notice is that we take the opposite approach in the
> >     associated qemu-block command. The target of the clone command is the
> >     new file (referencing the source) rather than the original file
> >     passing in a name of the target. Personally, I find the former more
> >     natural as a core interface. The error handling is more
> >     straightforward (i.e., ENOENT) and it matches more closely with native
> >     primitives that provide this kind of functionality (i.e., correct me
> >     if wrong, but I think we observed that btrfs clone works via ioctl on
> >     the target fd, providing the source fd as a parameter).
> >
> >     That said, I'm not sure if that is considered more user-friendly or
> >     not. If that's a concern, could we change the low level interface to
> >     work as described (i.e., user issues command on source file, high
> >     level code converts into command on target file)? IOW, I think a nice
> >     goal going forward would be to have the low level mechanisms
> >     standardize on some kind of ioctl, and the higher level code become
> >     convenience commands that simply exercise the ioctl (and what
> >     actually happens after that depends on the type of file, what
> >     translators are loaded, etc.). I guess that's hand wavy at the
> >     moment, but the idea is that all of this path resolving and whatnot
> >     becomes generic and independent rather than specific to and
> >     duplicated across each snapshot/clone mechanism we provide.
> >
> >     Secondarily, but somewhat related... does the path resolving code
> >     that is there now have to be buried in fuse-bridge? Avati and I have
> >     briefly discussed this idea of separating the management here into an
> >     independent translator, and I think this falls in as a perfect
> >     candidate for something like that. The resolving code is non-trivial,
> >     however, so I'm not sure if there are serious technical hurdles for
> >     that kind of approach. For example, is it possible/reasonable to push
> >     this into a new translator beneath fuse (or perhaps library code?)
> >     and just skip linking the inode into the parent table until/unless
> >     that happens naturally? Thoughts?
> >
> >     Brian
> >
> >     >
> >     > On Wed, Sep 25, 2013 at 6:53 AM, M. Mohan Kumar <mohankumar.m at gmail.com> wrote:
> >     >
> >     >>
> >     >>
> >     >>
> >     >> http://review.gluster.org/#/q/owner:%22M.+Mohan+Kumar+%253Cmohan%2540in.ibm.com%253E%22,n,z
> >     >>
> >     >> I also replied to your other comments.
> >     >>
> >     >>
> >     >>
> >     >>
> >     >>
> >     >> On Wednesday, September 25, 2013, Anand Avati <avati at gluster.org> wrote:
> >     >>> Adding Brian Foster (and gluster-devel) for the discussion of
> >     >>> unified UI for snapshotting.
> >     >>> Mohan, I must have missed your comment. Can you please point to
> >     >>> the specific patch where you posted your comment?
> >     >>> Avati
> >     >>>
> >     >>> On Tue, Sep 24, 2013 at 9:29 AM, M. Mohan Kumar <mohankumar.m at gmail.com> wrote:
> >     >>>>
> >     >>>> Hi Avati,
> >     >>>> I am ready with V5 of the BD xlator patches (I consolidated them
> >     >>>> into 5). Before posting them I wanted your opinion on the
> >     >>>> interfaces I use for creating clones and snapshots. I posted them
> >     >>>> on Gerrit a few days back. Could you please respond to that?
> >     >>>>
> >     >>>> --
> >     >>>> Regards,
> >     >>>> Mohan.
> >     >>>
> >     >>
> >     >> --
> >     >> Regards,
> >     >> Mohan.
> >     >>
> >     >
> >     >
> >     >
> >
> >
> >
> >
> > --
> > Regards,
> > Mohan.
>
>


-- 
Regards,
Mohan.