[Gluster-devel] file version on glusterfs using libgit

Luis Pabon lpabon at redhat.com
Fri Mar 8 17:22:50 UTC 2013


This sounds really interesting, but I do have some questions about Git 
(or any SCM) as a solution for file version support.

1. How well does Git handle large binary files like VM images?  Does it 
keep a copy for each one, or does it keep diffs?
2. Does Git, or another SCM, allow for the deletion of older versions?
3. Can this solution be used for VM linked clones? (I guess that 
would be like branching each one.)
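
On question 1, one thing that is safe to say: Git names every file
version by a hash of its full contents, so each committed version is
first stored as a complete (zlib-compressed) object. Deltas between
similar objects are only computed later, when objects are packed. A
minimal Python sketch of how that object name is derived is below. The
helper name git_blob_id is made up for illustration, and the sketch
assumes the default SHA-1 object format.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object id Git assigns to a file with these contents.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the
    raw file bytes (SHA-1 object format assumed; illustrative helper).
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Two versions of a (pretend) VM image that differ in a single byte.
v1 = bytes(1024 * 1024)
v2 = v1[:-1] + b"\x01"

# Different contents give different ids, so each version is its own object.
print(git_blob_id(v1) != git_blob_id(v2))

# Should match what `git hash-object` reports for an empty file.
print(git_blob_id(b""))
```

So a one-byte change to a large image produces a whole new object until
a repack finds a delta, which is why the binary-delta question matters
for VM images.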

This is really interesting, because Brian F. and I were just discussing 
the pluses and minuses of a file version solution, but instead using 
QEMU's block driver technology, specifically either QCOW2 or QED 
(leaning more to QED).

Maybe what we are describing here is two different implementations for 
two different use cases.

File Versioning?:
1. Google Drive/Dropbox style file versions for small documents and 
files (still a question on binary deltas), where older versions are 
never deleted.
--> Solution: Git translator

File Snapshots?:
2. Snap support for small or large files which may require the deleting 
and/or merging of different versions. Specifically, satisfying APIs like 
OpenStack Cinder Snapshot API (available in Grizzly [1]) and linked 
virtual machine clones.
--> Possible Solution:  QEMU Block Technology (Still under investigation)
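
To make the snapshot case concrete, here is a toy Python model of the
backing-file idea that QCOW2 and QED use: an overlay stores only the
blocks written to it, reads fall through to the backing image, and a
commit merges the overlay back down. The class and method names are
invented for this sketch; it is not how either format is laid out on
disk.

```python
from typing import Dict, Optional


class Image:
    """Toy block image: stores only the blocks written to it."""

    def __init__(self, backing: Optional["Image"] = None) -> None:
        self.backing = backing
        self.blocks: Dict[int, bytes] = {}

    def write(self, index: int, data: bytes) -> None:
        # Writes always land in this image, never in the backing image.
        self.blocks[index] = data

    def read(self, index: int) -> bytes:
        # Walk down the backing chain until some image has the block.
        image: Optional[Image] = self
        while image is not None:
            if index in image.blocks:
                return image.blocks[index]
            image = image.backing
        return b"\0"  # unallocated blocks read as zeros

    def commit(self) -> None:
        # Merge this overlay's blocks into its backing image, then empty it.
        if self.backing is None:
            raise ValueError("image has no backing image to commit into")
        self.backing.blocks.update(self.blocks)
        self.blocks.clear()


base = Image()
base.write(0, b"golden")

clone_a = Image(backing=base)   # linked clones share the base blocks
clone_b = Image(backing=base)
clone_a.write(0, b"changed")    # only clone_a sees its own write

print(clone_a.read(0), clone_b.read(0), base.read(0))
```

Note that calling commit() on one clone changes what every other clone
of the same base reads, so merging only makes sense for a chain with a
single user.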

Also, I am not sure, but this type of translator might work better at 
the client (behind DHT) than at the server (behind POSIX). I am still 
new to GlusterFS, but I am guessing that the .git repo (which the 
client probably cannot see) would be handled by only one of the 
GlusterFS hosts.  This could create a bottleneck.  If 
instead, the xlator was at the client, then the files would be spread 
over the cluster (even the .git repo) by the DHT xlator.  There may be a 
need to do some type of locking, but I am guessing GlusterFS already 
handles much of that.  This issue parallels discussions Brian and I had 
around a QED based translator and how it would handle the IO for >100 
linked cloned virtual machines.
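
A rough Python sketch of the "spread over the cluster" point, assuming
placement works by hashing the file name into ranges owned by bricks.
The crc32 hash, the equal-sized ranges, the brick names and the object
file names are all stand-ins chosen for this example, not what DHT
actually uses.

```python
import zlib
from collections import Counter
from typing import List


def pick_brick(name: str, bricks: List[str]) -> str:
    """Choose a brick for a file name by hashing the name.

    Stand-in for DHT placement: crc32 and equal-sized ranges are
    assumptions of this sketch, not GlusterFS's real hash or layout.
    """
    hash_value = zlib.crc32(name.encode("utf-8"))  # 32-bit value
    range_size = 2**32 // len(bricks)
    index = min(hash_value // range_size, len(bricks) - 1)
    return bricks[index]


bricks = ["host1:/brick", "host2:/brick", "host3:/brick", "host4:/brick"]

# Hypothetical names of files below a .git/objects directory.
names = [f"{i:038x}" for i in range(1000)]

# Count how many of those files each brick would hold.
print(Counter(pick_brick(name, bricks) for name in names))
```

With the xlator on the client side, the repo's object files would be
placed like any other file, so no single host ends up serving all of
the version data.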

But like I said above... Definitely very cool stuff.

- Luis

PS. another possible solution:  **IF** we had a deduplicating backend 
(xlator or file system), then we could just make a copy (although it 
could be slow) and be done with it :-).
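
As a sketch of that idea, here is a toy content-addressed store in
Python where identical chunks are kept only once. The class name, the
fixed 4096-byte chunk size and the choice of SHA-256 are assumptions
for illustration only.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # arbitrary chunk size chosen for this sketch


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}      # chunk digest -> chunk data
        self.files: Dict[str, List[str]] = {}   # file name -> chunk digests

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store new chunks only
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def copy(self, src: str, dst: str) -> None:
        # A "snapshot" is a second list of references to the same chunks.
        self.files[dst] = list(self.files[src])


store = DedupStore()
image = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))  # 8 distinct chunks
store.write("vm.img", image)
store.copy("vm.img", "vm.img.snap1")

# Change one chunk of the live file; the snapshot keeps the old contents.
store.write("vm.img", image[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE)
print(len(store.chunks))
```

The copy() shortcut here only duplicates references. A plain copy
through the file system would still have to read and rehash every
chunk, which is the slow part mentioned above.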

[1] https://wiki.openstack.org/wiki/Cinder

On 03/08/2013 06:20 AM, Niels de Vos wrote:
> On Fri, Mar 08, 2013 at 06:00:24AM -0500, Shishir Gowda wrote:
>> Hi Niels,
>>
>> Thinking out loud, I think the snaps (in the file version context) can
>> be displayed as a list of branches.
> Well, I am not sure if branches are really needed. Isn't linear history
> sufficient? Every change should be committed to the master branch
> anyway. Branches may be useful for switching between versions, but
> nothing prevents you from checking out (or "git ls-files") with a commit
> or a date.
>
> I'm thinking of a virtual .snaps directory:
>
> $ cd $VOLUME/.snaps
> $ ls
> 2013-03-07/
> 2013-03-06/
> .....
> current/
> changelog
> yesterday -> 2013-03-07/
>
> This makes it possible to do something like:
> $ cat changelog
>     - virtual file, showing the contents of 'git log'
>     - find a commit you're interested in
> $ mkdir $GIT_COMMIT_ID
> $ ls $GIT_COMMIT_ID/
>     - get the state just like 'git checkout $GIT_COMMIT_ID'
>
> Maybe it would be helpful to be able to create tags inside this .snaps
> directory. But I would refrain from branches for now (unless there is
> a clear use-case).
>
> Cheers,
> Niels
>
>> Once the user cd's into any one of them, we could do a git checkout of the branch.
>>
>> That should mimic the behaviour.
>>
>> With regards,
>> Shishir
>>
>> ----- Original Message -----
>> From: "Shishir Gowda" <sgowda at redhat.com>
>> To: "Niels de Vos" <ndevos at redhat.com>
>> Cc: gluster-devel at nongnu.org
>> Sent: Friday, March 8, 2013 4:04:42 PM
>> Subject: Re: [Gluster-devel] file version on glusterfs using libgit
>>
>> Hi Niels,
>>
>> My inclination too is to load git on top of the posix xlator.
>>
>> I was thinking of treating previous versions (based on some policy) as new branches.
>>
>> We could see how to export these branches as user visible dirs.
>>
>> With regards,
>> Shishir
>>
>> ----- Original Message -----
>> From: "Niels de Vos" <ndevos at redhat.com>
>> To: "Shishir Gowda" <sgowda at redhat.com>
>> Cc: gluster-devel at nongnu.org
>> Sent: Friday, March 8, 2013 3:25:57 PM
>> Subject: Re: [Gluster-devel] file version on glusterfs using libgit
>>
>> On Thu, Mar 07, 2013 at 12:54:41AM -0500, Shishir Gowda wrote:
>>> Hi All,
>>>
>>> I was playing around with git on a glusterfs volume, as a way to provide file version support.
>>>
>>> An initial run is encouraging.
>>>
>>> A brief overview of what was tried:
>>>
>>> Approach 1: Glusterfs volume as a git repo
>>>
>>> 1. created a 2 brick distribute volume
>>> 2. inited a git repo on fuse volume
>>> 3. created files, committed them in git.
>>> 4. Modified files, and committed them again
>>> 5. Did branch check-outs, to simulate versions @ point in time
>>> 6. reset branch heads, and was able to access older versions of files (after a stash).
>>> 7. Was able to create files/dirs/symlinks/hardlinks
>>> 8. Both NFS/FUSE clients were used.
>>>
>>> Approach 2: Glusterfs bricks as git repos
>>>
>>> 1. created a 2 brick distribute volume
>>> 2. inited git repo on brick1
>>> 3. inited git repo on brick2
>>> 4. created files, committed them in the relevant brick's git repo.
>>> 5. Modified files, and committed them again in the brick's git repo
>>> 6. Did branch check-outs, to simulate versions @ point in time on individual bricks
>>> 7. reset branch heads, and was able to access older versions of files (after a stash).
>>> 8. Was able to create files/dirs/symlinks/hardlinks
>>> 9. Both NFS/FUSE clients were used.
>>>
>>> Buoyed by this, I will start prototyping integration of libgit2 as an xlator for file version support.
>>>
>>> There are 3 approaches to consider:
>>>
>>> 1. Load the git xlator in the client volfiles
>>> 2. Load the git xlator in the server volfiles
>>> 3. Replace the posix interface with a git interface.
>>>
>>> Please provide feedback, on what would be more desirable.
>> Very interesting! Option 2 makes most sense to me, the posix xlator
>> contains some access checks and such, which you probably should not need
>> to duplicate.
>>
>> Have you thought about making the previous version accessible through
>> the glusterfs/nfs mount? Other vendors seem to have a .snapshot
>> directory with previous versions, would something like that be possible?
>> Users would be able to recover deleted files themselves that way.
>>
>> Also, I do not know if git stores xattrs and their changes...
>>
>> Cheers,
>> Niels




