[Gluster-devel] Finalizing interfaces for snapshot and clone creation in BD xlator

Fri Sep 27 14:17:26 UTC 2013

On 09/26/2013 01:27 PM, M. Mohan Kumar wrote:
> In the BD xlator, cloning refers to a full copy of the file (after the
> copy there is no relationship between the two files). Snapshot refers
> to a COW copy of the file. I guess these terminologies need to be
> generalized. I can choose "copy" for the full copy functionality and
> either "clone" or "snapshot" for the COW functionality.
> 

That sounds reasonable to me. I'm wondering if there are ways to
multiplex the commands in a more general way (i.e., clone means one
thing on a bd device, another on a regular file), but for now it might
be better to just get all of the pieces in place.

> Sorry, I could not go through the complete qemu-block xlator code. If I
> understand correctly, your patch
> (http://review.gluster.com/#/c/5967/) also expects the destination file
> to exist before cloning? I.e., in 'setfattr -n
> trusted.glusterfs.block-format -v "qcow2:10GB:<bimg>" ./newimage', the
> file newimage exists before calling this operation, doesn't it?
> 

Yes, that's correct. BTW, my understanding of the bd case is that the
destination file has to exist in the same manner, correct?

> The BD patches allow users to specify the full path of the destination
> file as part of the xattr value, because that's how management tools
> pass the path when creating a snapshot or a complete copy of a KVM
> image. The current interface for the offloaded functionality in BD is:
> # setfattr -n [clone|snapshot] -v "/path/to/destination-file" path-to-source-file
> So IIUC, in both approaches we need to validate both the source and the
> destination files, or am I missing something here?
> 

Yeah, I'm not disputing that the resolving code is necessary. I'm
suggesting it could be independently useful to both bd snapshots and
qemu-block (file) snapshots. Why make each snapshot-capable mechanism
translator solve the same problem?
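
A minimal sketch contrasting the two conventions discussed so far. It only builds the setxattr request each style would issue and does not talk to GlusterFS. The `trusted.glusterfs.block-format` name and the `qcow2:10GB:<bimg>` value shape come from the qemu-block example above; reading the fields as format:size:backing-image is an assumption. The BD names `clone`/`snapshot` are placeholders taken from the `setfattr -n [clone|snapshot]` shorthand, not the real xattr names. `XattrRequest`, `qemu_block_request`, `bd_request` and `paths_to_resolve` are hypothetical helpers.

```python
from typing import List, NamedTuple


class XattrRequest(NamedTuple):
    path: str     # file the setxattr is issued on
    name: str     # xattr name
    value: bytes  # xattr value


def qemu_block_request(new_image: str, fmt: str, size: str, backing: str) -> XattrRequest:
    # qemu-block style: the command targets the NEW image (which must already
    # exist) and names the backing image in the value.
    value = f"{fmt}:{size}:{backing}".encode()
    return XattrRequest(new_image, "trusted.glusterfs.block-format", value)


def bd_request(source: str, op: str, destination: str) -> XattrRequest:
    # BD style: the command targets the SOURCE and carries the full
    # destination path in the value. 'clone'/'snapshot' are placeholder names.
    if op not in ("clone", "snapshot"):
        raise ValueError(f"unsupported operation: {op}")
    return XattrRequest(source, op, destination.encode())


def paths_to_resolve(req: XattrRequest) -> List[str]:
    # Either way two files must be resolved and validated: the file the
    # xattr is set on, plus the one named inside the value.
    if req.name == "trusted.glusterfs.block-format":
        return [req.path, req.value.decode().split(":", 2)[2]]
    return [req.path, req.value.decode()]
```

On a real mount, issuing one of these would be a single `os.setxattr(req.path, req.name, req.value)` call (Linux only); the point is that `paths_to_resolve` has the same shape for both styles, which is the work a shared layer could own.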

My thought was generally:

- "snapman" xlator sits high up in the graph and implements the setxattr
interface
- independent lower level cow/snap mechanism translators work on a lower
level (possibly common) interface, such as ioctl
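
A minimal model of that split, assuming nothing about real translator APIs: a single front-end ("snapman") owns the user-facing command and hands the work to whichever mechanism is registered for the file's backend type. This is also one way to get the multiplexing mentioned earlier (clone meaning one thing on a bd device and another on a regular file). `Mechanism`, `BDMechanism`, `QemuBlockMechanism`, `SnapMan` and the backend strings are illustrative names only, and the returned strings merely describe what each mechanism would do.

```python
from typing import Dict, Protocol


class Mechanism(Protocol):
    # Lower-level interface every cow/snap mechanism implements
    # (stand-in for the ioctl/fop/internal xattr still being debated).
    def snapshot(self, src_gfid: str, dst_gfid: str) -> str: ...


class BDMechanism:
    def snapshot(self, src_gfid: str, dst_gfid: str) -> str:
        return f"bd: COW snapshot {dst_gfid} of {src_gfid}"


class QemuBlockMechanism:
    def snapshot(self, src_gfid: str, dst_gfid: str) -> str:
        return f"qemu-block: qcow2 overlay {dst_gfid} backed by {src_gfid}"


class SnapMan:
    """High-level interface: one command, dispatched by backend type."""

    def __init__(self) -> None:
        self._mechs: Dict[str, Mechanism] = {}

    def register(self, backend: str, mech: Mechanism) -> None:
        self._mechs[backend] = mech

    def snapshot(self, backend: str, src_gfid: str, dst_gfid: str) -> str:
        mech = self._mechs.get(backend)
        if mech is None:
            raise NotImplementedError(f"no snapshot mechanism for {backend!r}")
        return mech.snapshot(src_gfid, dst_gfid)
```

Keeping the registry in the front-end means a new mechanism only has to implement the narrow lower-level call; it never sees paths or the user-facing command.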

Of course, ioctl support doesn't exist at the moment and it's not
necessarily given that it's the right approach (perhaps a fop, or
something else?). So perhaps the right thing for now is to emulate this
kind of thing with another internal setxattr that encapsulates all the
work the interface translator has done (i.e., pass down the gfids
instead of paths, etc.).
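
A sketch of that internal hand-off, under loud assumptions: the front-end resolves both paths to gfids once (failing with ENOENT if either is missing) and passes down a single internal xattr whose value carries gfids instead of paths. The xattr name `glusterfs.internal.snap-req` and the `op:src-gfid:dst-gfid` encoding are invented for illustration, and a plain dict stands in for the inode table / LOOKUP path.

```python
import errno
import uuid
from typing import Dict, Tuple

# Hypothetical internal xattr name; not an existing GlusterFS key.
INTERNAL_XATTR = "glusterfs.internal.snap-req"


def resolve(table: Dict[str, uuid.UUID], path: str) -> uuid.UUID:
    # Stand-in for a real LOOKUP: map a path to its gfid or fail with ENOENT.
    try:
        return table[path]
    except KeyError:
        raise FileNotFoundError(errno.ENOENT, "No such file or directory", path) from None


def encode_internal(table: Dict[str, uuid.UUID], op: str, src: str, dst: str) -> Tuple[str, bytes]:
    # Resolve once, high in the graph; lower translators only ever see gfids.
    src_gfid = resolve(table, src)
    dst_gfid = resolve(table, dst)
    return INTERNAL_XATTR, f"{op}:{src_gfid}:{dst_gfid}".encode()


def decode_internal(value: bytes) -> Tuple[str, uuid.UUID, uuid.UUID]:
    # What a mechanism translator would do with the value it receives.
    op, src, dst = value.decode().split(":")
    return op, uuid.UUID(src), uuid.UUID(dst)
```

Because the canonical UUID text form uses hyphens, a colon is a safe separator here as long as `op` itself contains no colon.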

The longer-term idea is that an application can choose to use that
lower-level interface directly. Think 'cp --reflink', for example. In that
case an ioctl is a requirement, but the broader point is that, by virtue
of the interface, the application has already had to resolve the paths
(to open the files), so the higher-level interface is not necessary.
Thoughts?
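
For comparison, this is roughly what the lower-level path looks like for `cp --reflink` on Linux: the application opens both files itself (so path resolution and ENOENT are handled before any clone request) and issues the clone on the target fd with the source fd as the argument. It uses the generic `FICLONE` ioctl, which Linux 4.5 introduced as the generalized form of btrfs's `BTRFS_IOC_CLONE`. The numeric fallback assumes the common `_IOW` encoding (x86, ARM), and `reflink` is a hypothetical helper, not anything in GlusterFS.

```python
import errno
import fcntl
import os

# FICLONE = _IOW(0x94, 9, int) from <linux/fs.h>. Newer Pythons expose it as
# fcntl.FICLONE; the literal fallback assumes the common ioctl encoding
# (x86, ARM) and may be wrong on other architectures.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

# errno values meaning "these files/this filesystem can't share extents".
_UNSUPPORTED = {errno.EOPNOTSUPP, errno.ENOTTY, errno.EXDEV, errno.EINVAL}


def reflink(src_path: str, dst_path: str) -> bool:
    """Hypothetical helper: make dst share src's data blocks (COW clone).

    Returns False, leaving dst created but empty, when reflinks are unsupported.
    """
    src_fd = os.open(src_path, os.O_RDONLY)  # ENOENT surfaces here, in the app
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            # The clone is issued on the TARGET fd; the source fd is the argument.
            fcntl.ioctl(dst_fd, FICLONE, src_fd)
            return True
        except OSError as e:
            if e.errno in _UNSUPPORTED:
                return False
            raise
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

On a filesystem without reflink support this returns False, which is the point where `cp --reflink=auto` would fall back to an ordinary data copy.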

> Also, what about listing the snapshots/origin of a given file, and the
> ability to merge an external snapshot file back into the origin? The BD
> xlator can list the origin (i.e., gfid) of a given snapshot, but listing
> the snapshots of a given file is not implemented.
> 

I started thinking about this a bit for the qemu-block case, but tbh I'm
not sure how generic we can make this. Perhaps the interface translator
can be responsible for this just using a generic set of xattrs?
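
One way to read "a generic set of xattrs" for listing, sketched on an in-memory model: each snapshot records its origin's gfid (the direction the BD xlator already supports), and listing the snapshots of a file is the reverse query. `ORIGIN_KEY` and `FileStore` are hypothetical; a real implementation would presumably need a reverse index rather than the full scan shown here.

```python
from typing import Dict, List, Optional

# Hypothetical xattr key a snapshot would carry, naming its origin's gfid.
ORIGIN_KEY = "snapman.origin"


class FileStore:
    """Toy stand-in for a volume: gfid -> that file's xattrs."""

    def __init__(self) -> None:
        self.xattrs: Dict[str, Dict[str, str]] = {}

    def add(self, gfid: str, origin: Optional[str] = None) -> None:
        self.xattrs[gfid] = {} if origin is None else {ORIGIN_KEY: origin}

    def origin_of(self, gfid: str) -> Optional[str]:
        # Cheap: one xattr read on the snapshot itself.
        return self.xattrs[gfid].get(ORIGIN_KEY)

    def snapshots_of(self, gfid: str) -> List[str]:
        # Expensive without a reverse index: scans every file's xattrs.
        return sorted(g for g, x in self.xattrs.items() if x.get(ORIGIN_KEY) == gfid)
```

The asymmetry between the two lookups is the reason listing snapshots is harder than listing the origin.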

IMO, not everything has to be completely worked out and made generic up
front. I just think that if we move in that general direction in how the
code is broken up, it will be easier to line things up cleanly as they
come in (as long as there is an understanding that interfaces might
change and whatnot).

Brian

> Anand, shall I post the patches to Gerrit? Maybe in the next round we
> can decide on the interfaces for the offload operations.
> 
> On Wed, Sep 25, 2013 at 6:20 PM, Brian Foster <bfoster at redhat.com> wrote:
> 
>     On 09/25/2013 12:27 AM, M. Mohan Kumar wrote:
>     > Here is the right link: http://review.gluster.org/#/c/5626/
>     >
> 
>     Thanks guys. I haven't taken a deep look at the code, but some initial
>     high-level comments...
> 
>     The first thing I notice is that we take the opposite approach in the
>     associated qemu-block command. The target of the clone command is the
>     new file (referencing the source) rather than the original file passing
>     in a name of the target. Personally, I find the former more natural as a
>     core interface. The error handling is more straightforward (i.e.,
>     ENOENT) and it matches more closely with native primitives that provide
>     this kind of functionality (i.e., correct me if wrong, but I think we
>     observed that btrfs clone works via ioctl on the target fd, providing
>     the source fd as a parameter).
> 
>     That said, I'm not sure if that is considered more user-friendly or not.
>     If that's a concern, could we change the low level interface to work as
>     described (i.e., user issues command on source file, high level code
>     converts into command on target file)? IOW, I think a nice goal going
>     forward would be to have the low level mechanisms standardize on some
>     kind of ioctl, and the higher level code become convenience commands
>     that simply exercise the ioctl (and what actually happens after that
>     depends on the type of file, what translators are loaded, etc.). I guess
>     that's hand-wavy at the moment, but the idea is that all of this path
>     resolving and whatnot becomes generic and independent rather than
>     specific to and duplicated across each snapshot/clone mechanism we
>     provide.
> 
>     Secondarily, but somewhat related... does the path resolving code that
>     is there now have to be buried in fuse-bridge? Avati and I have briefly
>     discussed this idea of separating the management here into an
>     independent translator, and I think this falls in as a perfect candidate
>     for something like that. The resolving code is non-trivial, however, so
>     I'm not sure if there are serious technical hurdles for that kind of
>     approach. For example, is it possible/reasonable to push this into a new
>     translator beneath fuse (or perhaps library code?) and just skip linking
>     the inode into the parent table until/unless that happens naturally?
>     Thoughts?
> 
>     Brian
> 
>     >
>     > On Wed, Sep 25, 2013 at 6:53 AM, M. Mohan Kumar <mohankumar.m at gmail.com> wrote:
>     >
>     >> http://review.gluster.org/#/q/owner:%22M.+Mohan+Kumar+%253Cmohan%2540in.ibm.com%253E%22,n,z
>     >>
>     >> I also replied to your other comments.
>     >>
>     >> On Wednesday, September 25, 2013, Anand Avati <avati at gluster.org> wrote:
>     >>> Adding Brian Foster (and gluster-devel) for the discussion of
>     >>> unified UI for snapshotting.
>     >>> Mohan, I must have missed your comment. Can you please point to the
>     >>> specific patch where you posted your comment?
>     >>> Avati
>     >>>
>     >>> On Tue, Sep 24, 2013 at 9:29 AM, M. Mohan Kumar <mohankumar.m at gmail.com> wrote:
>     >>>>
>     >>>> Hi Avati,
>     >>>> I am ready with V5 of the BD xlator patches (I consolidated the
>     >>>> patches into 5). Before posting them, I wanted your opinion about the
>     >>>> interfaces I use for creating clones and snapshots. I posted them on
>     >>>> Gerrit a few days back. Could you please respond to that?
>     >>>>
>     >>>> --
>     >>>> Regards,
>     >>>> Mohan.
>     >>>
>     >>
>     >> --
>     >> Regards,
>     >> Mohan.
>     >>
>     >
> 
> -- 
> Regards,
> Mohan.