[Gluster-devel] Snapshot design for glusterfs volumes

Luis Pabon lpabon at redhat.com
Mon Aug 19 01:48:20 UTC 2013


Hi Shishir,
     Thank you for sending out your paper.  Here are some comments I 
have (written in markdown format):

# Review of Online Snapshot Support for GlusterFS

## Section: Introduction
* The primary use case needs a better explanation.  It does not 
explain how users currently compensate for not having this technology 
in their environments, nor the benefits of having the feature.
* The last sentence should explain why it is the same.  Why would it 
be?  Can no benefits be gained from this feature in non-VM-image 
environments?  If not, then the name should be changed to vmsnapshots or 
something that discourages usage in environments other than VM image 
storage.

## Section: Snapshot Architecture
* The architecture section does not talk about architecture, but instead 
focuses on certain modes of operation.  Please explain how a user 
interacts with snapshots, whether from a client or from something like 
an OpenStack interface.  Also describe in good detail all aspects of 
operation (create, delete, etc.).  Describe the concept of barriers 
here instead of at the end of the document.
* I'm new to GlusterFS, but I am confused about what is meant by bullet #3: 
"The planned support is for GlusterFS Volume based snapshots...".  The 
sentence seems unfinished.  Do you mean "The planned support is 
for snapshots of GlusterFS volumes..."?  Also, how is brick coherency 
kept across multiple AFR nodes?
* The Snapshot Consistency section is confusing; please reword the 
description.  Maybe change the format to paragraphs instead of bullets.
* Please explain why there is a snapshot limit of 256.  Since 256 = 2^8, 
are we using only a single byte to track the snapshot id?
* When the CLI executes multiple volume snapshots, is it possible to 
execute them in parallel?  Why do they need to be serially processed?
* What happens when `restore` is executed?  How does the volume state 
change?  Does the .glusterfs directory change in any way?
* What happens when `delete` is executed?  When we have the following 
snaps `A->B->C->D`, and we delete `B`, what happens to the state of the 
volume?  Do the changes from `B` get merged into `A` so that it provides 
the dependencies needed by `C`?  (A toy sketch of these semantics 
follows this list.)
* Using the example above, can I branch or clone from `B` to `B'` and 
create a *new* volume?  I am guessing that the LVM technology would 
probably not allow this, but maybe btrfs would.
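
To make the `delete` question above concrete, here is a toy model (plain 
Python, not GlusterFS code; all names are illustrative) of the 
merge-on-delete semantics I am asking about:

```python
# Toy copy-on-write snapshot chain: each snapshot stores only the
# blocks that changed since its parent.
class Snapshot:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.delta = {}                 # block number -> data

    def read(self, block):
        # Walk back through the chain until some ancestor owns the block.
        snap = self
        while snap is not None:
            if block in snap.delta:
                return snap.delta[block]
            snap = snap.parent
        return None

def delete(chain, victim):
    """Delete `victim` by folding its delta into its parent, so that
    children keep the blocks they depended on."""
    parent = victim.parent              # assumes victim is not the base
    parent.delta.update(victim.delta)   # victim's blocks are newer, so they win
    for snap in chain:                  # re-parent the victim's children
        if snap.parent is victim:
            snap.parent = parent
    chain.remove(victim)

# A -> B -> C -> D, then delete B:
A = Snapshot("A");           A.delta = {0: "a0", 1: "a1"}
B = Snapshot("B", parent=A); B.delta = {1: "b1"}
C = Snapshot("C", parent=B); C.delta = {2: "c2"}
D = Snapshot("D", parent=C)
chain = [A, B, C, D]

delete(chain, B)
assert C.read(1) == "b1"    # C still sees B's change after the merge
```

If the back-end (LVM or btrfs) performs the equivalent of this merge 
itself, the paper should say so; if not, the design needs its own answer 
here.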

## Section: Data Flow
* This section is confusing.  Why are these bullets if they read as a 
sequence?  This seems to me more like a project requirements list than a 
data flow description.
* What are the side effects of acquiring the cluster-wide lock?  What 
benefits/concerns does it raise on a system with N nodes?
* How long, on average, should the CLI expect to be blocked 
before it returns?
     * I am not sure if we have something like this already, but we may 
want to discuss the concept of a JOB manager.  For example, here the CLI 
will send a request which may take longer than 3 secs.  In such a 
situation, the CLI will be returned a JOB ticket number.  The user can 
then query the JOB manager with the ticket number for status, or 
provide a callback mechanism (which is a little harder, but possible to 
do).  In any case, I think this JOB manager falls outside the scope of 
this paper, but it is something we should revisit if we do not already 
possess one.  A minimal sketch of the idea follows this list.
* The bullet "Once barrier is on, initiate back-end snapshot." should 
explain in greater detail what is meant by "back-end snapshot".
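
To illustrate the JOB manager idea above, a minimal sketch of the 
ticket/poll flow (every name here is hypothetical, not an existing 
GlusterFS interface):

```python
import threading
import uuid

class JobManager:
    """Minimal sketch: long-running requests return a ticket at once;
    callers poll for status with that ticket instead of blocking."""
    def __init__(self):
        self._status = {}
        self._lock = threading.Lock()

    def submit(self, func, *args):
        ticket = str(uuid.uuid4())
        with self._lock:
            self._status[ticket] = "RUNNING"

        def run():
            try:
                func(*args)
                outcome = "DONE"
            except Exception:
                outcome = "FAILED"
            with self._lock:
                self._status[ticket] = outcome

        threading.Thread(target=run, daemon=True).start()
        return ticket                    # the CLI prints this and exits

    def status(self, ticket):
        with self._lock:
            return self._status.get(ticket, "UNKNOWN")

# Usage: `take_snapshot` stands in for whatever long-running snapshot
# routine the design ends up with.
# jobs = JobManager()
# ticket = jobs.submit(take_snapshot, "vol0")
# ... later ...
# print(jobs.status(ticket))
```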

## Section: CLI Interface
* Each one of these commands should be explained in fine detail in the 
architecture section: how it affects volume state and what side effects 
it has.

## Section: Snapshot Design
* Does the amount of content in a brick affect the create, delete, list, 
or restore snapshot time?
* The paper only describes `create` in the first part of the section.  
There should probably be a subsection for each supported command, each 
describing in detail how it is planned to be implemented.
* Could there be a section showing how JSON/XML interfaces would 
support this feature?

### Subsection: Stage-1 Prepare
* Are barriers on multiple bricks executed serially?  What is the 
maximum number of bricks the snapshot feature supports before it takes 
an unusual amount of time to execute?  Should brick barriers be done in 
parallel?  (A sketch of the parallel approach follows this list.)
* This again reads partly like a requirements list and partly like a 
sequence.  Please reword this section.
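
On the serial-versus-parallel question above, a sketch of what I have in 
mind; `barrier_brick` stands in for a hypothetical per-brick RPC that 
enables the barrier and returns success or failure:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def barrier_all(bricks, barrier_brick):
    """Enable barriers on all bricks in parallel, so total latency is
    roughly that of the slowest brick rather than the sum over N bricks."""
    failed = []
    with ThreadPoolExecutor(max_workers=len(bricks) or 1) as pool:
        futures = {pool.submit(barrier_brick, brick): brick
                   for brick in bricks}
        for fut in as_completed(futures):
            if not fut.result():        # barrier_brick returns True/False
                failed.append(futures[fut])
    return failed   # non-empty -> release the barriers already set and abort
```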

## Section: Barrier
* The paragraph states "unless Asynchronous IO is used".  How does that 
affect the barrier and snapshots?  The paper does not describe this 
situation.
* A description of the planned barrier design will help readers 
understand what is meant by queuing of fops.  (A toy sketch of what I 
imagine follows this list.)
* Will the barrier be implemented as a new xlator which is 
inserted on the fly when a snapshot is requested, or will it require 
changes to existing xlators?  If it is not planned to be an xlator, 
should it be implemented as one to provide code isolation?
* Why are `write` and `unlink` not among the fops to be barriered?  Do 
barriers still allow disk changes?  Maybe the paper should describe why 
certain calls are allowed to affect the disk and how these changes may 
or may not affect the snapshot or the volume state.
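
To clarify what I mean by "queuing of fops", a toy sketch of a barrier 
as a pass-through layer (this is not actual xlator code; the fop 
dispatch is purely illustrative):

```python
import queue

class Barrier:
    """Toy pass-through layer: while enabled, incoming fops are held in
    a queue instead of reaching the brick; on release they are replayed
    in arrival order.  The open question above is which fops belong here."""
    def __init__(self, brick):
        self.brick = brick
        self.enabled = False
        self.held = queue.Queue()

    def fop(self, name, *args):
        if self.enabled:
            self.held.put((name, args))   # hold the fop, do not fail it
            return "QUEUED"
        return getattr(self.brick, name)(*args)

    def release(self):
        self.enabled = False
        while not self.held.empty():      # replay in FIFO order
            name, args = self.held.get()
            getattr(self.brick, name)(*args)
```

If `write` and `unlink` bypass this queue, the disk under a brick can 
change between "barrier on" and "snapshot taken", which is exactly the 
consistency question above.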

## Section: Snapshot management
* Item `#2` is confusing; please reword it.
* Item `#3` says that individual snapshots will not be supported. If 
that is true, then what does `delete` do?
* Item `#7` is confusing.  Please reword.  The paper should state why 
the user and developer need to know this information.
* Item `#8` is confusing.  Is the item stating that the user can only 
run certain commands after a volume snapshot restore?  If so, are 
volume snapshot restores not true volume restores that bring the volume 
back to a previous state?  What is the benefit of this feature to the user?
* Item `#9` seems like an outline for the `delete` design.  There needs 
to be more information here in greater detail as discussed above.
* Item `#10` needs to describe why it proposes that a restored 
snapshot be shown as a snapshot volume.  Is a volume with snapshots not 
identified as a snap volume as well?

## Section: Error Scenarios
* Please reword item `#3`.

## Section: Open-ended issues
* Item `#4` is confusing.  Please reword.
* Item `#6` suggests that snapshot volumes can be mounted.  Can a 
snapshot *and* the latest volume be mounted at the same time?  If the 
volume is `reverted` to a previous snapshot so that the user can inspect 
the volume state, I highly suggest keeping all snapshot mounts 
read-only.  If the user wants to write to that mount, they should delete 
all snapshots up to that point.  I highly discourage this feature from 
dealing with merges.
* Item `#8` does not describe what will happen if a re-balance is 
initiated.  Will snaps be deleted?  I do not think these constraints are 
a good alternative.  In my opinion, the snapshot feature should support 
all GlusterFS high-availability features.
* Item `#9` does not describe what the `master` volume is.  Does it mean 
that the user cannot revert to a previous snapshot?  If so, does that 
not violate the original requirement?

## Section: Upgrade/Downgrade
* This section describes that snap state will be maintained in 
`/var/lib/glusterd...`.  The paper needs to describe snapshot state in 
greater detail in the `Design` section.  For example, what state is kept 
in `/var/lib/glusterd...` and what state is read from the underlying 
file system snapshot technology?  What happens when the underlying file 
system snapshot technology has one state and `/var/lib/glusterd...` has 
another?

Look forward to your reply.

- Luis

On 08/02/2013 02:26 AM, Shishir Gowda wrote:
> Hi All,
>
> We propose to implement snapshot support for glusterfs volumes in release-3.6.
>
> Attaching the design document in the mail thread.
>
> Please feel free to comment/critique.
>
> With regards,
> Shishir
