[Gluster-devel] Versioning
Fred van Zwieten
fvzwieten at vxcompany.com
Thu Aug 2 15:12:21 UTC 2012
On Wed, Jul 25, 2012 at 11:20 PM, Fred van Zwieten
<fvzwieten at vxcompany.com>wrote:
> "Now I am leaning towards git based versioning. Integrate git into
> GlusterFS to track changes on specified events (timer, file-close,
> dir-tree-modify..). We may not do this via translator interface, but
> through the newly proposed simple event/timer interface. "
>
> I am not sure I would like that. Our idea is to make the previous versions
> (read-only!) available to the end-users through a separate mount-point,
> taking file permissions into account. I am not sure if that is at all
> possible when they live inside a git repository.
>
> (disclaimer: I do not know the inner workings of glusterfs
> nor translators) I would think making it part (the receiving part)
> of the geo-replicator translator would be ideal, because it knows what is going
> on. If a file /a/b/c is updated, its previous version could be stored as
> /pre/a/b/c.<datetime> or /pre/<datetime>/a/b/c. If the previous versions
> live on the same file-system you could even play with inodes to keep only
> the previous versions of blocks. This would make it very space efficient
> (sort of file based snapshotting).
>
> I do agree that using git makes it more modular and independent of
> the geo-replicator translator.
>
> I am also curious how you would handle multiple writes in a short time to
> the same file without ending up with an equal amount of previous versions.
>
> Also, I can't find the note you are referring to. Could you please make a
> feature wiki page using the template?
>
> Fred
>
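The /pre/<datetime>/a/b/c layout you describe above could be sketched roughly as follows. This is only a minimal illustration, not GlusterFS code; the function name and the root/pre_root paths are hypothetical, and a real implementation would hook the copy into the write path rather than call it by hand.

```python
import os
import shutil
import time

def preserve_previous(path, root, pre_root):
    """Before overwriting `path`, copy the current version into a
    timestamped tree: <pre_root>/<datetime>/a/b/c (hypothetical layout)."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    rel = os.path.relpath(path, root)
    dest = os.path.join(pre_root, stamp, rel)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    # copy2 keeps mtime and permission bits, so a read-only view of
    # pre_root can still honour the original file permissions
    shutil.copy2(path, dest)
    return dest
```

Exposing pre_root through a separate read-only mount-point would then give end-users the previous-versions view with permissions intact.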
We broke GeoReplication into two parts: (1) Marker, a change-tracking
translator, and (2) a simple queue that queries changes and invokes rsync
with a specific list of files over ssh. Unlike inotify, the marker
framework keeps track of changes within the filesystem as xtime in
extended attributes. You can ask the filesystem to list all changed files
and folders since a particular point in time. This way, an external
service can tolerate crashes, WAN link failures, etc. Marker allows
developers to extend storage capabilities using a simple application
programming model (even scripting languages are OK).
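The "list everything changed since time T" query could look something like the sketch below. The real marker translator answers it from the xtime extended attribute, which also propagates up the directory tree so whole unchanged subtrees can be skipped; this stand-in uses plain st_mtime so the idea is runnable without a gluster mount, and therefore has to walk everything.

```python
import os

def changed_since(root, checkpoint):
    """List files under `root` modified after `checkpoint` (a Unix
    timestamp). Approximates the marker xtime query with st_mtime;
    a real marker-backed query could prune subtrees whose xtime is
    older than the checkpoint."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > checkpoint:
                changed.append(path)
    return changed
```

The georep queue part is then just: feed this list to rsync over ssh, and on crash or WAN failure re-run the query from the last successfully synced checkpoint.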
If certain tasks can be achieved outside of a translator, it is good to do
so. Just like kernel mode, translator mode has some limitations.
Translator code has to be efficient, asynchronous (event-driven),
latency-sensitive and free of memory leaks. If we extend the marker
framework idea into a generic event hook mechanism, we can develop
powerful storage applications outside of the translator mechanism. Say you
register your tool or script for certain events. When an event occurs,
your code gets invoked with the necessary parameters. You can then operate
on the mounted filesystem itself, just like any other application. For
example, you register a git script for invocation on an event such as
"whenever a registered directory tree is modified and more than 30 minutes
have elapsed". All this script does is push the changes to an external
origin. It is crude and simple, but achieves the goal. Simple is better.
You may also develop anti-virus plugins or silent-data-corruption checks
using this technique. Users can use a simple git checkout to flip between
views. Because git doesn't scale for large content, you can limit users to
explicitly registering the folders they are interested in versioning. If
you want to create a mountable view of remote content, you can write a
translator that traps chdir or lookup on directories named after
timestamps and performs a git checkout. If I were using git for continuous
automated filesystem versioning, I would suggest users use the git tool
itself as the UI.
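The proposed event/timer hook interface might look something like this minimal registry. The event names, the interval semantics, and the whole class are assumptions for illustration; the actual GlusterFS interface was only a proposal at this point.

```python
import time

class EventHooks:
    """Sketch of a generic event hook mechanism: tools register a
    callback for an event name plus a minimum interval between runs,
    and the filesystem (here, whoever calls fire) raises events."""

    def __init__(self):
        self._hooks = {}  # event name -> list of hook records

    def register(self, event, callback, min_interval=0):
        self._hooks.setdefault(event, []).append(
            {"cb": callback, "interval": min_interval, "last": 0.0})

    def fire(self, event, **params):
        """Invoke every registered callback whose interval has elapsed."""
        invoked = []
        for hook in self._hooks.get(event, []):
            now = time.time()
            # honours the "more than N minutes elapsed" condition
            if now - hook["last"] >= hook["interval"]:
                hook["last"] = now
                hook["cb"](**params)
                invoked.append(hook["cb"])
        return invoked
```

The git example from above would then be roughly `hooks.register("dir-tree-modify", push_to_git, min_interval=30 * 60)`, with `push_to_git` a user script that commits and pushes the registered tree.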
I am just giving you tips and suggestions. Don't limit your ideas in any
way. If I am guessing your idea correctly, it will have a few limitations,
but you can live with them:
* Only files are versioned. Directories are not.
* File renames and directory renames (mv) are not supported.
* Every version is a complete duplicate copy (not COW- or WAFL-style).
* Changes are tracked at the per-file level. Changes across a directory
tree are not grouped. I mean CVS-style, not git-style patch sets.
It is actually OK to make duplicate copies of changed files. In reality,
for most practical use cases, very few files across the namespace get
modified. Most files are written once and rarely modified. Files older
than 30 days are hardly accessed. So it is OK to store duplicate copies
of just the changed files. btrfs or device-mapper dedup may take care of
this as well. I won't worry too much about duplicating data, given its
very small proportion.
I didn't quite understand how you can play with inodes to avoid this
duplication. Did you mean a btrfs-style dedup capability?
If you want to avoid these limitations, think about rdiff-backup-style
continuous automated backup. Just like georep, you monitor the filesystem
for changes and back up on a continuous basis. It is OK to give users a
tool or API to restore/view older versions. This is much simpler to
implement than WAFL- or COW-style storage formats and file-level
snapshotting.
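The rdiff-backup idea — newest version kept whole, older versions kept only as deltas against their successor — can be sketched as below. This is an illustration, not rdiff-backup's actual format: real rdiff uses binary rolling-checksum deltas, whereas `difflib.ndiff` here is a line-based stand-in that shows the access pattern rather than the space savings.

```python
import difflib

class ReverseDeltaStore:
    """Keep the latest version whole; store each older version as a
    line delta recorded when it was superseded (hypothetical sketch)."""

    def __init__(self, initial_lines):
        self.latest = list(initial_lines)
        self.deltas = []  # deltas[i] lets us rebuild version i

    def update(self, new_lines):
        new_lines = list(new_lines)
        # the ndiff delta records both sides of the change
        self.deltas.append(list(difflib.ndiff(self.latest, new_lines)))
        self.latest = new_lines

    def version(self, i):
        """Return version i (0 = oldest, len(deltas) = newest)."""
        if i == len(self.deltas):
            return list(self.latest)
        # difflib.restore replays the "before" side of a saved delta
        return list(difflib.restore(self.deltas[i], 1))
```

For Fred's TIFF case, where only EXIF metadata changes between versions, a delta store like this keeps each old version at a fraction of the 50+ MB file size.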
Anand,
These are all "design" decisions that we do not need and that even make
the product less useful in our use-case.
We have a large archive of TIFF files. Every TIFF file is large (50+ MB).
The images themselves do not get modified, but their EXIF metadata does.
There are also file renames and they get re-arranged into different
directory structures. For this archive we need a scalable filesystem with
georep to a second location _and_ file versioning.
"Because git doesn't scale for large content, you can limit users to
explicitly register interested folders for versioning"
Now, it seems to me git does not fit this bill, because it doesn't scale
very well.
"* File renames and Directory renames (mv) are not supported"
If you mean building up retention on file renames and moves, I agree for
our use-case, but others might need it. Look at BackupPC for a cool
solution to that.
"* Every version is a complete duplicate copy (not as COW or WAFL)."
The fact that each version is a complete duplicate is not very
storage-friendly, because in our use-case only the EXIF metadata changes.
I seek rdiff-backup-like functionality there.
"It is actually OK to make duplicate copies of changed files. In reality,
for most practical use cases, very few files across the namespace get
modified. Most files are written once and rarely modified. Files older
than 30 days are hardly accessed. So it is OK to store duplicate copies
of just the changed files. btrfs or device-mapper dedup may take care of
this as well. I won't worry too much about duplicating data, given its
very small proportion."
I do not agree with you. If you say most files are written once and
rarely modified, you are narrowing the use-case for glusterfs. You are
describing near-WORM. Our use-case is not that. Also, our files do get
modified after 30 days. Relying on dedup at the lower filesystem level is
also not good. Suppose you have a 200TB filesystem. It would take
post-process dedup a very long time to find the duplicates. Better to do
it inline. Again, look at BackupPC for an implementation example.
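The inline-versus-post-process distinction can be made concrete with a tiny content-addressed store: each write is hashed before it lands, so identical content is stored exactly once and no later scan of the 200TB namespace is ever needed. This is a simplified sketch (whole-file hashing, invented class name); BackupPC and real dedup layers work at the chunk or block level.

```python
import hashlib
import os

class InlineDedupStore:
    """Sketch of inline (write-time) dedup: content is addressed by its
    SHA-256 digest, so duplicates are detected at put() time rather than
    by a post-process scan."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):  # dedup happens here, inline
            with open(path, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest):
        with open(os.path.join(self.root, digest), "rb") as f:
            return f.read()
```

Writing the same content twice costs one hash computation and no extra storage, which is the point of doing it inline.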
Fred