[Gluster-devel] Glusterfs Snapshot feature
pcuzner at redhat.com
Sun Dec 22 22:20:45 UTC 2013
I've just read through the snapshot spec on the wiki page in the forge, and have been looking at it through a storage admin's eyes.
There are a couple of items that are perhaps already addressed but not yet listed on the wiki; just in case, here are my thoughts:
The CLI definition doesn't define a snap usage command, i.e. a way for the admin to understand which snap is consuming the most space. With snaps come misuse and capacity issues, so our implementation should provide the admin with the information to make the right 'call'.
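To illustrate the kind of report such a command could produce, here is a minimal sketch. The snapshot names, sizes, and the idea of gathering per-snap consumption from the underlying storage are all assumptions for illustration, not part of the proposed design:

```python
# Hypothetical "snap usage" report: given the space each snapshot consumes
# (e.g. collected from thin-provisioned LVs on each brick), list snapshots
# largest-first so the admin can see which one to reclaim.
# All names and sizes below are made up.

def usage_report(snap_usage):
    """Return (name, bytes) pairs sorted by space consumed, descending."""
    return sorted(snap_usage.items(), key=lambda kv: kv[1], reverse=True)

snaps = {
    "snap-daily-20131220": 2 * 1024**3,   # 2 GiB of deltas
    "snap-daily-20131221": 7 * 1024**3,   # 7 GiB - service pack rollout
    "snap-daily-20131222": 1 * 1024**3,   # 1 GiB
}

for name, used in usage_report(snaps):
    print("%-22s %6.1f GiB" % (name, used / 1024**3))
```

The point is simply that the raw per-snap numbers need to exist somewhere the CLI can reach them.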
I think I've mentioned this before, but how are snaps orchestrated across the cluster? I see a CLI to start one, but what's needed is a cron-like ability associated with each volume. I think in a previous post I called this a "snap schedule" command.
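As a rough model of what a per-volume schedule might carry, here is a sketch assuming a simple fixed-interval policy rather than full cron syntax; the class, field names, and values are hypothetical:

```python
# Minimal model of a per-volume "snap schedule": an interval and a
# retention count. A real implementation would likely want cron-style
# expressions; this sketch keeps it to a fixed interval for clarity.
from datetime import datetime, timedelta

class SnapSchedule:
    def __init__(self, volume, interval_hours, keep):
        self.volume = volume
        self.interval = timedelta(hours=interval_hours)
        self.keep = keep                    # snapshots to retain

    def next_run(self, last_run):
        """When the next snapshot of this volume is due."""
        return last_run + self.interval

# A 6-hourly schedule keeping a week's worth of snaps (28 at 6h apart).
sched = SnapSchedule("vol0", interval_hours=6, keep=28)
nxt = sched.next_run(datetime(2013, 12, 22, 0, 0))
print(nxt)  # 2013-12-22 06:00:00
```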
A common thing I've seen on other platforms is unexpected space usage, due to changes in data access patterns generating more deltas. I've seen this a lot in virtual environments, for example when service packs/maintenance get rolled out. In these situations capacity can soon run out, so an auto-delete feature that drops snaps to ensure the real volume stays online would seem like a sensible approach.
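A common shape for such a feature is watermark-based: trip on a high watermark, then drop oldest snaps until usage is back under a low watermark. The thresholds, snapshot list, and function below are purely illustrative assumptions, not the proposed implementation:

```python
# Sketch of the auto-delete suggestion: when snapshot deltas push pool
# usage over a high watermark, sacrifice the oldest snapshots until usage
# is back under a low watermark, keeping the real volume online.

def auto_delete(snaps, pool_size, high=0.80, low=0.70):
    """snaps: list of (name, used_bytes) pairs, oldest first.
    Returns (deleted_names, remaining_snaps)."""
    used = sum(b for _, b in snaps)
    deleted = []
    if used <= high * pool_size:
        return deleted, snaps          # under the high watermark: no-op
    while snaps and used > low * pool_size:
        name, b = snaps.pop(0)         # oldest snapshot goes first
        deleted.append(name)
        used -= b
    return deleted, snaps

# 85 GiB of snapshot deltas in a 100 GiB pool trips the 80% watermark;
# dropping the oldest snap leaves 55 GiB, under the 70% low mark.
dropped, left = auto_delete(
    [("snap-old", 30), ("snap-mid", 30), ("snap-new", 25)], pool_size=100)
print(dropped)  # ['snap-old']
```

The two-watermark gap avoids deleting a snap on every write once the pool hovers near the limit.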
The comments around quorum and replica 3 enable an exception to a basic rule - fail a snap request if the cluster is not in a healthy state. I would argue against making exceptions and keep things simple: if a node/brick is unavailable, or a cluster reconfig is in progress, raise an alert and fail the snap request. I think this is a more straightforward approach for admins to get their heads around than reasoning about specific volume types and potential cleanup activity.
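The simple rule being argued for can be stated as a single gate with no per-volume-type exceptions. The brick map and field names here are invented for illustration:

```python
# Sketch of the "keep it simple" rule: refuse any snap request unless
# every brick is up and no reconfiguration is in flight. Callers would
# raise an alert to the admin whenever ok is False.

def may_snapshot(bricks_up, reconfig_in_progress):
    """bricks_up: {brick_path: bool}. Return (ok, reason)."""
    if reconfig_in_progress:
        return False, "cluster reconfiguration in progress"
    down = sorted(b for b, up in bricks_up.items() if not up)
    if down:
        return False, "bricks unavailable: " + ", ".join(down)
    return True, "ok"

print(may_snapshot({"node1:/brick1": True, "node2:/brick1": False}, False))
```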
It's not clear from the wiki how snaps are deleted. For example, when snap-max-limit is reached, does creating a new snapshot automatically trigger deletion of the oldest snap? If so, presumably the delete will only be actioned once the barrier is in place across all affected bricks.
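One possible answer to that question, modelled as a sketch - rotate out the oldest snap on create, and refuse to rotate unless the barrier is up. This is a hypothetical model of the behaviour being asked about, not the actual implementation:

```python
# Hypothetical rotation at snap-max-limit: creating a new snapshot when
# the limit is reached first drops the oldest one, and only once the
# barrier is in place across the affected bricks.

def create_snapshot(snaps, new_name, max_limit, barrier_up):
    """snaps: list of names, oldest first. Mutates and returns snaps."""
    if len(snaps) >= max_limit:
        if not barrier_up:
            raise RuntimeError("barrier not in place; refusing to rotate")
        snaps.pop(0)            # drop the oldest snapshot
    snaps.append(new_name)
    return snaps

s = ["s1", "s2", "s3"]
print(create_snapshot(s, "s4", max_limit=3, barrier_up=True))  # ['s2', 's3', 's4']
```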
The snap create allows a user-defined name, which is a great idea. However, in what order would the snaps be presented when the user opens the .snap directory? Will snapshots be listed by creation time regardless of name, or could a more recent snapshot whose name sorts lower in the list cause a potential recovery point to be missed by the end user?
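The concern can be shown concretely. With made-up names and timestamps, a newer ad-hoc snapshot sorts to the top of a name-ordered listing, out of time order:

```python
# Name-sort vs creation-time-sort of snapshots in a .snap listing.
# Names and timestamps are illustrative only.

snaps = [
    ("aaa-adhoc",        "2013-12-22T10:00"),  # newest, but sorts first by name
    ("daily-2013-12-20", "2013-12-20T00:00"),
    ("daily-2013-12-21", "2013-12-21T00:00"),
]

by_name = [n for n, _ in sorted(snaps)]
by_ctime = [n for n, t in sorted(snaps, key=lambda s: s[1])]
print(by_name)   # newest snapshot appears first, out of time order
print(by_ctime)  # oldest-to-newest regardless of name
```

If the listing is name-ordered, a user scanning from the "recent" end could overlook `aaa-adhoc` entirely.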
Snap restore presumably requires the volume to be offline and the bricks unmounted? There's no detail in the scoping document about how the restore capability is intended to be implemented. Restore will be a drastic action, so understanding the implications for the various protocols (SMB, NFS, native, Swift and gfapi) would be key. Perhaps the simple answer is that the volume must be in a stopped state first?
There is some mention of a phased implementation for snapshots; phase-1 is mentioned towards the end of the doc, for example. Perhaps it would be beneficial to define the phases at the start of the article and list the features likely to be in each phase. This may help focus feedback on the initial implementation.
Like I said - I'm looking at this through the eyes of an "old admin" ;)
----- Original Message -----
> From: "Rajesh Joseph" <rjoseph at redhat.com>
> To: "gluster-devel" <gluster-devel at nongnu.org>
> Sent: Friday, 6 December, 2013 1:05:29 AM
> Subject: [Gluster-devel] Glusterfs Snapshot feature
> Hi all,
> We are implementing snapshot support for glusterfs volumes in release-3.6.
> The design document can be found at
> The document needs some updates and I will be doing that in the coming weeks.
> All the work done till now can be seen at review.gluster.com. The project
> name for the snapshot work is "glusterfs-snapshot".
> All suggestions and comments are welcome.
> Thanks & Regards,
> Gluster-devel mailing list
> Gluster-devel at nongnu.org