[Gluster-devel] Snapshot design for glusterfs volumes
Luis Pabon
lpabon at redhat.com
Mon Aug 19 01:48:20 UTC 2013
Hi Shishir,
Thank you for sending out your paper. Here are some comments I
have (written in markdown format):
# Review of Online Snapshot Support for GlusterFS
## Section: Introduction
* The primary use case should have a better explanation. It does not
explain how users currently compensate for not having this
technology in their environment, nor the benefits of having the feature.
* The last sentence should explain why it is the same. Why would it be? Can
no benefits be gained from having this feature in non-VM-image
environments? If not, then the name should be changed to vmsnapshots or
something else that discourages usage in environments other than VM image
storage.
## Section: Snapshot Architecture
* The architecture section does not talk about architecture, but instead
focuses on certain modes of operation. Please explain how a user, whether
from a client or from something like an OpenStack interface, interacts with
snapshots. Also describe in good detail all aspects of operation
(create, delete, etc.). Describe the concept of Barriers here instead of
at the end of the document.
* I'm new to GlusterFS, but I am confused about what is meant by bullet #3:
"The planned support is for GlusterFS Volume based snapshots...". It seems
like the sentence is not finished. Do you mean "The planned support is
for snapshots of GlusterFS volumes..."? Also, how is brick coherency
kept across multiple AFR nodes?
* The Snapshot Consistency section is confusing; please reword the
description. Maybe change the format to paragraphs instead of bullets.
* Please explain why there is a snapshot limit of 256. Are we using
only one byte for tracking a snapshot id?
* When the CLI executes multiple volume snapshots, is it possible to
execute them in parallel? Why do they need to be serially processed?
* What happens when `restore` is executed? How does the volume state
change? Does the .gluster directory change in any way?
* What happens when `delete` is executed? When we have the following
snaps `A->B->C->D`, and we delete `B`, what happens to the state of the
volume? Do the changes from `B` get merged into `A` so that it provides
the dependencies needed by `C`? (See the sketch after this list.)
* Using the example above, can I branch or clone from `B` to `B'` and
create a *new* volume? I am guessing that the LVM technology would
probably not allow this, but maybe btrfs would.
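
To make the `delete` question above concrete, here is a toy sketch of the
merge semantics I am asking about. It is not GlusterFS code; the block map
and the "fold the deleted snapshot's deltas into its neighbour" rule are my
own assumptions, and whether that neighbour is the parent or the child
depends entirely on how the backend (LVM, btrfs, ...) stores its deltas.

```c
/*
 * Toy model of a delta-based snapshot chain A->B->C.
 * NOT GlusterFS code: the block map and the "fold into the child"
 * rule are assumptions made purely to illustrate the question.
 */
#include <stdio.h>

#define NBLOCKS 8

struct snap {
    const char  *name;
    int          present[NBLOCKS]; /* 1 if this snap owns a delta for the block */
    char         data[NBLOCKS];    /* block content, when present */
    struct snap *parent;           /* older snapshot in the chain, or NULL */
};

/* Resolve a block by walking back through the chain. */
static int resolve(const struct snap *s, int blk, char *out)
{
    for (; s; s = s->parent)
        if (s->present[blk]) { *out = s->data[blk]; return 0; }
    return -1; /* never written */
}

/* Delete `victim` by folding its deltas into `child`, unless the child
 * already overrides them, then unlink it from the chain. */
static void delete_and_merge(struct snap *victim, struct snap *child)
{
    for (int blk = 0; blk < NBLOCKS; blk++) {
        if (victim->present[blk] && !child->present[blk]) {
            child->present[blk] = 1;
            child->data[blk]    = victim->data[blk];
        }
    }
    child->parent = victim->parent;
}

int main(void)
{
    struct snap A = { .name = "A" };
    struct snap B = { .name = "B", .parent = &A };
    struct snap C = { .name = "C", .parent = &B };

    A.present[0] = 1; A.data[0] = 'a';
    B.present[1] = 1; B.data[1] = 'b'; /* only B holds block 1 */
    C.present[2] = 1; C.data[2] = 'c';

    delete_and_merge(&B, &C);

    char v;
    if (resolve(&C, 1, &v) == 0)
        printf("block 1 still resolves after deleting B: '%c'\n", v);
    return 0;
}
```

If the backend cannot perform this kind of merge cheaply, the paper should
say so, because it directly affects the cost of `delete`.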
## Section: Data Flow
* This section is confusing. Why are they bullets if they read as a
sequence? This seems to me more like a project requirements list than a
data flow description.
* What are the side effects of acquiring the cluster-wide lock? What
benefits or concerns does it bring to a system with N nodes?
* What is the average amount of time the CLI will expect to be blocked
before it returns?
* I am not sure if we have something like this already, but we may
want to discuss the concept of a JOB manager (a rough sketch follows this
list). For example, here the CLI will send a request which may take longer
than 3 secs. In such a situation, the CLI would be handed back a JOB ticket
number. The user can then query the JOB manager with the ticket number for
status, or provide a callback mechanism (which is a little harder, but
possible to do). In any case, I think this JOB manager falls outside the
scope of this paper, but it is something we should revisit if we do not
already possess one.
* The bullet "Once barrier is on, initiate back-end snapshot." should
explain in greater detail what is meant by "back-end snapshot".
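
To illustrate the JOB manager idea from the list above: the CLI submits
long-running work, gets a ticket back immediately, and polls for status
later. None of the names or structures below exist in GlusterFS today; this
is purely a hypothetical sketch of the interaction.

```c
/*
 * Hypothetical JOB-manager sketch.  None of these names or structures
 * exist in GlusterFS today; this only shows the interaction I have in
 * mind: submit work, get a ticket immediately, poll for status later.
 */
#include <stdio.h>

enum job_state { JOB_PENDING, JOB_RUNNING, JOB_DONE, JOB_FAILED };

struct job {
    int            ticket;
    enum job_state state;
    char           desc[64];
};

#define MAX_JOBS 16
static struct job jobs[MAX_JOBS];
static int next_ticket = 1;

/* CLI side: submit work and return immediately with a ticket number. */
static int job_submit(const char *desc)
{
    struct job *j = &jobs[next_ticket % MAX_JOBS];
    j->ticket = next_ticket++;
    j->state  = JOB_PENDING;
    snprintf(j->desc, sizeof(j->desc), "%s", desc);
    return j->ticket;
}

/* CLI side: poll the manager with the ticket number. */
static enum job_state job_query(int ticket)
{
    for (int i = 0; i < MAX_JOBS; i++)
        if (jobs[i].ticket == ticket)
            return jobs[i].state;
    return JOB_FAILED; /* unknown ticket */
}

int main(void)
{
    int t = job_submit("snapshot create vol0");
    printf("submitted, ticket %d, state %d\n", t, job_query(t));

    /* a daemon-side worker would flip the state once the snapshot lands */
    jobs[t % MAX_JOBS].state = JOB_DONE;
    printf("ticket %d now state %d\n", t, job_query(t));
    return 0;
}
```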
## Section: CLI Interface
* Each one of these commands should be explained in fine detail in the
architecture section, covering how it changes volume state and what side
effects it has.
## Section: Snapshot Design
* Does the amount of content in a brick affect the create, delete, list,
or restore snapshot time?
* The paper only describes `create` in the first part of the section.
There should probably be a subsection for each of the supported commands,
each describing in detail how that command is planned to be implemented.
* Could there be a section showing how the JSON/XML interfaces would
support this feature?
### Subsection: Stage-1 Prepare
* Are barriers on multiple bricks executed serially? What is the
maximum number of bricks the snapshot feature can support before it takes
an unusual amount of time to execute? Should brick barriers be done in
parallel (see the sketch after this subsection)?
* This again seems like a requirements list and sometimes like a
sequence. Please reword this section.
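
As a rough illustration of the serial-versus-parallel question: if glusterd
fans the barrier request out to all bricks concurrently and then waits for
all acknowledgements, the prepare time is bounded by the slowest brick
rather than by the sum over all bricks. The sketch below is not glusterd
code; the per-brick RPC is faked with a sleep.

```c
/*
 * Fan-out sketch: issue the "barrier on" request to all bricks in
 * parallel instead of serially.  The per-brick RPC is faked with a
 * sleep; real code would send a glusterd RPC and wait for the ack.
 * Build with: cc -pthread fanout.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_BRICKS 4

static void *barrier_one_brick(void *arg)
{
    int brick = *(int *)arg;
    sleep(1); /* stand-in for the barrier-enable RPC round trip */
    printf("brick %d: barrier enabled\n", brick);
    return NULL;
}

int main(void)
{
    pthread_t tids[NUM_BRICKS];
    int       ids[NUM_BRICKS];

    /* fan out: wall time is ~1s, not NUM_BRICKS seconds */
    for (int i = 0; i < NUM_BRICKS; i++) {
        ids[i] = i;
        pthread_create(&tids[i], NULL, barrier_one_brick, &ids[i]);
    }
    for (int i = 0; i < NUM_BRICKS; i++)
        pthread_join(tids[i], NULL);

    printf("all %d bricks barriered; safe to take the back-end snapshots\n",
           NUM_BRICKS);
    return 0;
}
```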
## Section: Barrier
* The paragraph states "unless Asynchronous IO is used". How does that
affect the barrier and snapshots? The paper does not describe this situation.
* A description of the planned Barrier design would help the reader
understand what is meant by the queuing of fops; a minimal sketch of my own
mental model follows at the end of this list.
* Will the barrier be implemented as a new xlator which will be
inserted on the fly when a snapshot is requested, or will it require
changes to existing xlators? If it is not planned to be an xlator,
should it be implemented as one to provide code isolation?
* Why are `write` and `unlink` not among the fops to be barriered? Do
barriers still allow disk changes? Maybe the paper should describe why
certain calls are allowed to affect the disk and how those changes may or
may not affect the snapshot or the volume state.
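
To clarify what I mean by the queuing of fops, here is a minimal
stand-alone sketch of my mental model. It is not an xlator and none of
these types exist in GlusterFS; it only shows the behaviour I would like
the paper to spell out: fops arriving while the barrier is up are parked,
and drained in arrival order once the barrier is dropped.

```c
/*
 * Minimal mental model of "queuing of fops".  This is not an xlator
 * and none of these types exist in GlusterFS; it only illustrates the
 * behaviour in question: fops arriving while the barrier is up are
 * parked, and drained in arrival order once the barrier is dropped.
 */
#include <stdio.h>

#define MAX_QUEUED 32

struct fop {
    char name[16]; /* e.g. "fsync", "unlink", "write" */
};

static struct fop queue[MAX_QUEUED];
static int queued = 0;
static int barrier_enabled = 0;

static void execute(const struct fop *f)
{
    printf("executing fop: %s\n", f->name);
}

/* Entry point for every fop: park it if the barrier is up. */
static void submit_fop(const char *name)
{
    if (barrier_enabled && queued < MAX_QUEUED) {
        snprintf(queue[queued].name, sizeof(queue[queued].name), "%s", name);
        printf("barrier up, queued fop: %s\n", queue[queued].name);
        queued++;
        return;
    }
    struct fop f;
    snprintf(f.name, sizeof(f.name), "%s", name);
    execute(&f);
}

/* Dropping the barrier drains the queue in arrival order. */
static void barrier_disable(void)
{
    barrier_enabled = 0;
    for (int i = 0; i < queued; i++)
        execute(&queue[i]);
    queued = 0;
}

int main(void)
{
    barrier_enabled = 1;      /* snapshot prepare turns the barrier on */
    submit_fop("fsync");
    submit_fop("unlink");
    barrier_disable();        /* the back-end snapshot is taken before this */
    submit_fop("write");
    return 0;
}
```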
## Section: Snapshot management
* Item `#2` is confusing, please reword.
* Item `#3` says that individual snapshots will not be supported. If
that is true, then what does `delete` do?
* Item `#7` is confusing. Please reword. The paper should state why
the user and developer need to know this information.
* Item `#8` is confusing. Is the item stating that the user can only run
certain commands after a volume snapshot restore? If so, is a volume
snapshot restore not a true volume restore, where the volume is brought
back to a previous state? What is the benefit of this feature to the user?
* Item `#9` seems like an outline for the `delete` design. There needs
to be more information here in greater detail as discussed above.
* Item `#10` needs to describe why it proposes that a restored snapshot be
shown as a snapshot volume. Is a volume with a snapshot not also identified
as a snap volume?
## Section: Error Scenarios
* Please reword item `#3`.
## Section: Open-ended issues
* Item `#4` is confusing. Please reword.
* Item `#6` suggests that snapshot volumes can be mounted. Can a
snapshot *and* the latest volume be mounted at the same time? If the
volume is `reverted` to a previous snapshot so that the user can inspect
the volume state, I strongly suggest keeping all snapshot mounts
Read-Only. If the user wants to write to that mount, they should delete
all snapshots up to that point. I highly discourage this feature from
dealing with merges.
* Item `#8` does not describe what will happen if a re-balance is
initiated. Will snaps be deleted? I do not think these constraints are
a good alternative. In my opinion, the snapshot feature should support
all GlusterFS high availability features.
* Item `#9` does not describe what the `master` volume is. Does it mean
that the user cannot revert to a previous snapshot? If this is true,
does that not violate the original requirement?
## Section: Upgrade/Downgrade
* This section describes that snap state will be maintained in
`/var/lib/glusterd...`. The paper needs to describe snapshot state in
greater detail in the `Design` section. For example, what state is kept
in `/var/lib/glusterd...` and what state is read from the underlying
file system snapshot technology? What happens when the underlying file
system snapshot technology has one state and `/var/lib/glusterd...` has
another? (A trivial reconciliation sketch follows below.)
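
To show the kind of mismatch I am worried about, here is a purely
hypothetical reconciliation sketch; both lists are made up, and a real
check would read glusterd's records from disk and query the back end's own
tooling.

```c
/*
 * Hypothetical reconciliation sketch: compare the snapshot names
 * glusterd has recorded (e.g. under /var/lib/glusterd/...) with the
 * names reported by the back-end snapshot technology.  Both lists are
 * hard-coded; a real check would read glusterd's state from disk and
 * query the back end's own tooling.
 */
#include <stdio.h>
#include <string.h>

static int contains(const char *const *list, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

int main(void)
{
    const char *const glusterd_state[] = { "snap1", "snap2", "snap3" };
    const char *const backend_state[]  = { "snap1", "snap3" };
    const int ng = 3, nb = 2;

    for (int i = 0; i < ng; i++)
        if (!contains(backend_state, nb, glusterd_state[i]))
            printf("glusterd records '%s' but the back end has no such snapshot\n",
                   glusterd_state[i]);

    for (int i = 0; i < nb; i++)
        if (!contains(glusterd_state, ng, backend_state[i]))
            printf("back end has '%s' but glusterd does not know about it\n",
                   backend_state[i]);
    return 0;
}
```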
Look forward to your reply.
- Luis
On 08/02/2013 02:26 AM, Shishir Gowda wrote:
> Hi All,
>
> We propose to implement snapshot support for glusterfs volumes in release-3.6.
>
> Attaching the design document in the mail thread.
>
> Please feel free to comment/critique.
>
> With regards,
> Shishir
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel