[Gluster-devel] File snapshot design proposals

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Sep 8 07:59:42 UTC 2016


hi,
       File snapshots are becoming important in the context of providing
virtual block devices in containers, so that we can take a snapshot of the
block device, switch between different snapshots, and so on. There have
been some attempts at designing such a solution. This is a very early look
at some of the solutions proposed so far. Please let us know what you think
about these, and feel free to add any other solutions you may have for this
problem.

Assumptions:
- Snap a single file
- File is accessed by a single client (VM images, block store for
containers, etc.)
As there is a single client accessing the file/image, file reads/writes
(and other modification FOPs) can act on a version number of the file
(read as part of lookup, say, and communicated to the other FOPs as part
of the xdata).
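
To make that concrete, here is a minimal sketch (the structure and function
names are made up for illustration; this is not actual Gluster client code)
of the client reading the version once at lookup time and then tagging
every write with it, the way xdata would carry it on the wire:

/* Hypothetical sketch: the client caches the file's snapshot version at
 * lookup time and attaches it to every modification FOP it sends. */
#include <stdint.h>
#include <stdio.h>

struct file_handle {
        uint32_t snap_version;  /* version learnt during lookup */
};

struct write_req {
        uint64_t offset;
        uint32_t len;
        uint32_t snap_version;  /* travels with the FOP, as xdata would */
};

static struct write_req
make_write(const struct file_handle *fh, uint64_t offset, uint32_t len)
{
        struct write_req req = { offset, len, fh->snap_version };
        return req;
}

int
main(void)
{
        struct file_handle fh = { .snap_version = 3 }; /* say lookup saw v3 */
        struct write_req w = make_write(&fh, 4096, 512);

        printf("write at offset %llu tagged with version %u\n",
               (unsigned long long)w.offset, w.snap_version);
        return 0;
}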

1) Doing a file snapshot using shards: (This was suggested by Shyam; I have
tried to keep the text as is)
If a block of such a file is written to with a higher version, then the
brick xlators can perform a block copy, change the new block to the new
version, and leave the older version as is.

This means that, to snap such a file, just the first shard needs a higher
version number, and the client that is operating on this file needs to be
updated with this version (mostly the client would be the one taking the
snap, but this holds even otherwise). To update the client we can leverage
the granted lease: revoke it, and force the client to reacquire the lease
by visiting the first shard (if we need to coordinate the client's writes
post-snap, this may well be a must).

Anyway, the bottom line is that a shard does not know a snap has been
taken; rather, when a data modification operation is sent to the shard, it
acts on preserving the older block.

This leaves blocks with various versions on disk, and when an older snap
(version) is deleted, the corresponding blocks are freed.

A sparse block for a version never exists in this method, i.e. when taking
a snap, if a shard did not exist, then there is no version of it to
preserve, and hence it remains an empty/sparse block.

Pros: Good distribution of the shards across different servers, and
efficient usage of the available space.
Cons: Difficult to provide data locality for applications that may demand
it.
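
Here is a rough sketch in plain C of what the brick-side write path in 1)
could look like (the file layout, the ".v<N>" naming and keeping the
version in a plain variable are all made up for illustration; the real
design would store the version in an xattr and do this inside the brick
xlator):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Copy the current shard to "<shard>.v<old_version>" so the older snap
 * keeps its data before the live shard is overwritten. */
static int
preserve_old_block(const char *shard_path, uint32_t old_version)
{
        char snap_path[4096];
        char buf[65536];
        ssize_t n;

        snprintf(snap_path, sizeof(snap_path), "%s.v%u", shard_path,
                 old_version);

        int src = open(shard_path, O_RDONLY);
        if (src < 0)
                return -1;
        /* O_EXCL: the preserved copy for a snap must not already exist */
        int dst = open(snap_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (dst < 0) {
                close(src);
                return -1;
        }
        while ((n = read(src, buf, sizeof(buf))) > 0) {
                if (write(dst, buf, n) != n) {
                        n = -1;
                        break;
                }
        }
        close(src);
        close(dst);
        return (n < 0) ? -1 : 0;
}

/* Write path: copy up the shard only when the incoming write carries a
 * higher version than the one recorded for the shard on disk. */
static int
write_block(const char *shard_path, uint32_t *disk_version,
            uint32_t write_version, const void *data, size_t len, off_t off)
{
        if (write_version > *disk_version) {
                if (preserve_old_block(shard_path, *disk_version) < 0)
                        return -1;
                *disk_version = write_version; /* shard now belongs to the new snap */
        }

        int fd = open(shard_path, O_WRONLY);
        if (fd < 0)
                return -1;
        ssize_t w = pwrite(fd, data, len, off);
        close(fd);
        return (w == (ssize_t)len) ? 0 : -1;
}

int
main(void)
{
        const char *shard = "/tmp/shard.1";     /* made-up shard path */
        uint32_t version = 1;

        /* create the shard with some data at version 1 */
        int fd = open(shard, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || write(fd, "old data", 8) != 8)
                return 1;
        close(fd);

        /* a write tagged with version 2 first preserves /tmp/shard.1.v1 */
        if (write_block(shard, &version, 2, "new data", 8, 0) < 0)
                return 1;
        printf("shard is now at version %u\n", version);
        return 0;
}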

2) Doing a file snapshot using sparse files:
This is sort of inspired by the granular data self-heal idea we wanted to
implement in afr: each block/shard used in the file is represented
logically by a bit in a bitmap stored either as an xattr or written to a
metafile, so there is no physical division of the file into different
shards. When a snapshot is taken, a new sparse file of the same size as the
original is created, and new writes on the file are redirected to this new
file instead of the original, thus preserving the old file. When a write is
performed on this new file, we note which block is going to be written,
copy that block out of the older file, overwrite the buffer with the new
data, write it out to the new version, and mark the block as used in the
xattr/metafile.

Pros: Easier to provide data locality for applications that may demand it.
Cons: Inefficient usage of the available space; we may end up with uneven
usage among the different servers in the cluster.
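
A rough sketch of the copy-on-write path in 2) (the bitmap is held in
memory here purely for illustration, whereas the design keeps it in an
xattr or a metafile; the sketch also assumes a write never crosses a block
boundary):

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

struct snap_file {
        int      old_fd;   /* preserved (old) version, read-only */
        int      new_fd;   /* sparse file of the same size, takes new writes */
        uint8_t *bitmap;   /* one bit per block: 1 = block already copied */
};

static int
block_is_copied(const struct snap_file *sf, uint64_t blk)
{
        return sf->bitmap[blk / 8] & (1u << (blk % 8));
}

static void
mark_block_copied(struct snap_file *sf, uint64_t blk)
{
        sf->bitmap[blk / 8] |= (1u << (blk % 8));
}

/* On the first write to a block: read it from the old file, overwrite the
 * written range in the buffer, write the whole block to the new file and
 * mark the block as used.  Later writes go straight to the new file. */
int
snap_write(struct snap_file *sf, const void *data, size_t len, off_t off)
{
        uint64_t blk = (uint64_t)off / BLOCK_SIZE;
        off_t    blk_off = (off_t)blk * BLOCK_SIZE;
        char     buf[BLOCK_SIZE];

        if (!block_is_copied(sf, blk)) {
                memset(buf, 0, sizeof(buf));
                if (pread(sf->old_fd, buf, BLOCK_SIZE, blk_off) < 0)
                        return -1;
                memcpy(buf + (off - blk_off), data, len);
                if (pwrite(sf->new_fd, buf, BLOCK_SIZE, blk_off) != BLOCK_SIZE)
                        return -1;
                mark_block_copied(sf, blk);
                return 0;
        }
        return (pwrite(sf->new_fd, data, len, off) == (ssize_t)len) ? 0 : -1;
}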

3) Doing file snapshots using the reflink functionality provided by the
underlying FS:
When a snapshot request comes in, we just reflink the earlier file to the
latest version, and new writes are redirected to this new version of the
file.

Pros: Easiest of the three to implement, and easier to provide data
locality for applications that may demand it.
Cons: FS specific, i.e. it is not going to work on disk filesystems that
don't support reflinks. It also has the same problem as 2) above, i.e.
inefficient usage of the available space; we may end up with uneven usage
among the different servers in the cluster.
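
A minimal sketch of 3), using the FICLONE ioctl that Linux exposes for
reflinks (available on filesystems such as XFS and Btrfs; the paths are
made up for illustration):

#include <fcntl.h>
#include <linux/fs.h>           /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int
main(void)
{
        /* the earlier file stays behind as the preserved snapshot ... */
        const char *old_version = "/bricks/brick1/vm.img.v1";
        /* ... and new writes get redirected to the reflinked copy */
        const char *new_version = "/bricks/brick1/vm.img.v2";

        int src = open(old_version, O_RDONLY);
        int dst = open(new_version, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (src < 0 || dst < 0) {
                perror("open");
                return 1;
        }

        /* share data blocks between the two files; the FS copies blocks
         * on write, so the old version is preserved untouched */
        if (ioctl(dst, FICLONE, src) < 0) {
                perror("FICLONE"); /* e.g. EOPNOTSUPP if the FS lacks reflink */
                return 1;
        }

        close(src);
        close(dst);
        return 0;
}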

-- 
Pranith