[Gluster-devel] Standardizing interfaces for BD xlator

M. Mohan Kumar mohan at in.ibm.com
Tue Dec 10 07:48:04 UTC 2013


BD xlator provides features such as server-offloaded copy and snapshot,
but there is no standard way to invoke these operations because of the
limitations of the existing FOPs and system call interfaces. Today one
has to issue a setxattr call to trigger an offload operation. Using the
setxattr interface in GlusterFS for every non-standard operation becomes
ugly and complicated, so we are looking at adding new FOPs to cover
these operations.

glfs interfaces for BD xlator:
We are looking at adding interfaces to libgfapi so that BD xlator
features can be consumed seamlessly. As of now one has to create a posix
file and then issue a setxattr/fsetxattr call to create an LV and map
that LV to the posix file. For offload operations the caller has to get
the gfid of the destination file and pass that gfid through the
{f}setxattr interface.

A typical user of BD xlator will be the qemu-img utility. To create a
BD-backed file on a GlusterFS volume, qemu-img has to issue glfs_creat
and glfs_fsetxattr, which is not elegant. The idea is to provide a single
glfs call that creates a posix file, creates a BD and maps that BD to
the posix file.


  glfs_bd_create: Creates a posix file and a BD, and maps the posix file
  to the BD in a BD GlusterFS volume.


  This function creates a posix file and a BD and maps them. The
  interface takes care of transaction consistency: if posix file creation
  succeeds but BD creation fails for whatever reason, the created posix
  file is deleted so that it is not left dangling.


  @fs: The initialized 'virtual mount' object to operate on.

  @path: Path of the posix file within the virtual mount.

  @mode: Permission of the file to be created.

  @flags: Create flags. See open(2). O_EXCL is supported.


  NULL   : Failure. @errno will be set to indicate the type of failure.
  @errno: EOPNOTSUPP if the underlying volume is not BD capable.

  Others : Pointer to the opened glfs_fd_t.

struct glfs_fd *glfs_bd_create(struct glfs *fs, const char *path, int flags,
                               mode_t mode);
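
The rollback behaviour described above can be sketched as follows. This
is only an illustration, not the proposed implementation: struct bd_ops,
bd_create_with_rollback and the fake_* helpers are hypothetical names.
In a real client the hooks would presumably wrap glfs_creat(),
glfs_fsetxattr() and glfs_unlink(); here they are in-memory fakes so the
control flow can be exercised without a volume.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend hooks. Each returns 0 on success or a negative
 * errno value on failure. */
struct bd_ops {
    int (*create_posix)(const char *path, void *ctx);
    int (*create_bd)(const char *path, void *ctx); /* create LV and map it */
    int (*unlink_posix)(const char *path, void *ctx);
    void *ctx;
};

/* Create the posix file, then the BD. If the BD step fails, remove the
 * posix file again. Returns 0 on success, -1 with errno set on failure. */
int bd_create_with_rollback(const struct bd_ops *ops, const char *path)
{
    int ret;

    assert(ops != NULL && path != NULL);

    ret = ops->create_posix(path, ops->ctx);
    if (ret < 0) {
        errno = -ret;
        return -1;
    }

    ret = ops->create_bd(path, ops->ctx);
    if (ret < 0) {
        /* BD creation failed: delete the posix file so it does not dangle. */
        (void)ops->unlink_posix(path, ops->ctx);
        errno = -ret;
        return -1;
    }
    return 0;
}

/* In-memory fake, used only to exercise the sketch. */
struct fake_state {
    int file_exists;    /* 1 once the posix file has been created */
    int bd_should_fail; /* 1 simulates a volume that is not BD capable */
};

int fake_create_posix(const char *path, void *ctx)
{
    struct fake_state *s = ctx;
    (void)path;
    if (s->file_exists)
        return -EEXIST;
    s->file_exists = 1;
    return 0;
}

int fake_create_bd(const char *path, void *ctx)
{
    struct fake_state *s = ctx;
    (void)path;
    return s->bd_should_fail ? -EOPNOTSUPP : 0;
}

int fake_unlink_posix(const char *path, void *ctx)
{
    struct fake_state *s = ctx;
    (void)path;
    s->file_exists = 0;
    return 0;
}
```

With bd_should_fail set, bd_create_with_rollback() returns -1, errno is
EOPNOTSUPP and file_exists is back to 0, which is the "no dangling file"
property the interface is meant to guarantee.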

We are also planning to provide glfs interfaces for the other offload
features of BD such as snapshot, clone and merge. These APIs can abstract
the steps involved in getting the gfid of the destination file and
passing it to the setxattr interface. Optionally, the mode parameter can
be used to specify whether the destination file has to be created; as of
now the bd xlator code expects the destination file to exist for offload
operations.


  glfs_copy: Offloads a copy operation between two files.


  This function optionally creates the destination posix file and
  initiates a server-offloaded copy between the two files. Based on the
  mode it creates the destination file first, and then issues the
  glfs_{f}setxattr call that performs the actual offload operation.


  @fs: The initialized 'virtual mount' object to operate on.

  @source: Path of the source file within the virtual mount.

  @dest: Path of the destination file within the virtual mount.

  @flag: Specifies whether the destination file needs to be created.

  @mode: Permission of the destination file to be created.


  -1 : Failure. @errno will be set with the type of failure.
  0  : Success


int glfs_copy(struct glfs *fs, const char *source, const char *dest,
              int mode);

int glfs_snapshot(struct glfs *fs, const char *source, const char *dest,
                  int mode);

int glfs_merge(struct glfs *fs, const char *snapshot);
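
The steps that glfs_copy is meant to hide (optionally create the
destination, look up its gfid, pass the gfid in a setxattr on the source)
can be sketched as below. This is a hedged illustration only: struct
offload_ops, offload_copy, the create_dest argument and the fake_*
helpers are hypothetical, and the "clone" xattr key is an assumption
about what bd xlator expects, not something stated in this mail. The
hooks are in-memory fakes rather than real libgfapi calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define GFID_STR_LEN 37 /* 36-character UUID text plus terminating NUL */
#define FAKE_GFID "00000000-0000-0000-0000-000000000001"

/* Hypothetical hooks; each returns 0 or a negative errno value. */
struct offload_ops {
    int (*create)(const char *path, int mode, void *ctx);
    int (*get_gfid)(const char *path, char *gfid, size_t len, void *ctx);
    int (*setxattr)(const char *path, const char *key, const char *value,
                    void *ctx);
    void *ctx;
};

/* Returns 0 on success, -1 with errno set on failure. */
int offload_copy(const struct offload_ops *ops, const char *source,
                 const char *dest, int create_dest, int mode)
{
    char gfid[GFID_STR_LEN];
    int ret;

    assert(ops != NULL && source != NULL && dest != NULL);

    /* Step 1: create the destination only when the caller asks for it. */
    if (create_dest) {
        ret = ops->create(dest, mode, ops->ctx);
        if (ret < 0) {
            errno = -ret;
            return -1;
        }
    }

    /* Step 2: the offload request identifies the destination by gfid. */
    ret = ops->get_gfid(dest, gfid, sizeof(gfid), ops->ctx);
    if (ret < 0) {
        errno = -ret;
        return -1;
    }

    /* Step 3: trigger the offload on the source ("clone" key is assumed). */
    ret = ops->setxattr(source, "clone", gfid, ops->ctx);
    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return 0;
}

/* In-memory fake volume, used only to exercise the sketch. */
struct fake_vol {
    int dest_exists;
    int setxattr_calls;
    char key[16];
    char value[GFID_STR_LEN];
};

int fake_create(const char *path, int mode, void *ctx)
{
    struct fake_vol *v = ctx;
    (void)path;
    (void)mode;
    v->dest_exists = 1;
    return 0;
}

int fake_get_gfid(const char *path, char *gfid, size_t len, void *ctx)
{
    struct fake_vol *v = ctx;
    (void)path;
    if (!v->dest_exists)
        return -ENOENT;
    if (len < GFID_STR_LEN)
        return -ERANGE;
    memcpy(gfid, FAKE_GFID, GFID_STR_LEN);
    return 0;
}

int fake_setxattr(const char *path, const char *key, const char *value,
                  void *ctx)
{
    struct fake_vol *v = ctx;
    (void)path;
    v->setxattr_calls++;
    snprintf(v->key, sizeof(v->key), "%s", key);
    snprintf(v->value, sizeof(v->value), "%s", value);
    return 0;
}
```

Called with create_dest set to 0 against a missing destination, the
sketch fails with ENOENT before any setxattr is issued, which mirrors the
current bd xlator expectation that the destination file already exists.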

Upstream efforts for server-offloaded copy and copy-on-write:
Clone - offloaded copy:
The FS community has already started discussing interfaces for
supporting server-offloaded copy. It initially started with adding a new
syscall, 'copy_range' [https://patchwork.kernel.org/patch/2568761/], and
the later plan is to extend the existing splice system call to copy
between two regular files
[http://article.gmane.org/gmane.linux.kernel/1560133]. So is it safe to
assume that splice is the way forward for copy offload, add these FOPs
to GlusterFS (XFS, FUSE also) and support them in BD xlator?

Snapshot - reflink:
There is also an upstream effort to provide interfaces for creating
copy-on-write files (i.e. snapshots in LVM terminology) using a reflink
syscall interface, but it has not been merged upstream
[http://lwn.net/Articles/331808/]. This snapshot feature is supported by
BTRFS and OCFS2 through an ioctl interface. Can we assume it is the way
forward for the snapshot interface and add FOPs for it, similar to
splice, in the GlusterFS stack?

There is no discussion happening about defining an interface for
snapshot merge. IIUC deleting a source file in BTRFS results in a
snapshot merge. But in LVM, merging a snapshot results in the snapshot LV
getting deleted. So can BD xlator also mimic this by merging the snapshot
when there is a request to remove the snapshot file? But then a user who
wants to remove the snapshot without merging it has no way to specify
that.


But once upstream supports copy offload via splice and snapshot via
reflink syscalls, these glfs interfaces become redundant and might need
to be removed.
