[Gluster-devel] trusted.glusterfs.version xattr

Derek Price derek at ximbiot.com
Thu May 8 17:21:20 UTC 2008


Martin Fick wrote:
> --- Martin Fick <mogulguy at yahoo.com> wrote:
>>   Original creation process and versioning:
>>
>>   /
>>  v1
>>   /dir1/
>>  v2   v1
>>   /dir1/dir2/
>>  v2   v2   v1
>>   /dir1/dir2/file
>>  v2   v2   v2  v1
>>
>> Mirror goes off-line with version #s of dir2 and
>> file as: v2/v1.
>>
>> -> file deleted
>>
>>   /dir1/dir2/
>>  v2   v2   v3
>>
>> -> dir2 deleted
>>
>>   /dir1/
>>  v2   v3
>>
>> -> dir2 recreated
>>
>>   /dir1/dir2/
>>  v2   v4   v1
>>
>> -> file recreated
>>
>>   /dir1/dir2/file
>>  v2   v4   v2   v1
> ...
>> However, if we were looking at the versions all the 
>> way to the root, when the mirror went off-line we 
>> would have had: /v2/v2/v2/v1 and now we have: 
>> /v2/v4/v2/v1.  There is a chance that we are 
>> talking about different files now.  Of course, the
>> problem I see now is that the files could in fact 
>> have been the same even though the version number is
>> different with this scheme!  Since the only version
>> # that is different is that of dir1 (v4), this could
>> have been caused by simply adding two new files to 
>> that directory!  
> 
> Hmm, I think that my logic may have been flawed here
> and that the scheme would actually work (as long as 
> you go to the root).  The mismatch above would only
> exist if in fact the file had been recreated!  If the
> file had not been recreated, its version # would still
> be /v2/v2/v2/v1, even though recalculating it now
> would yield /v2/v4/v2/v1.  But we are not
> recalculating it; we are trying to see if the files
> on two subnodes were created at the same time, and
> thus the version history should have been the same,
> right?
> 
> This assumption only holds if the parent directories
> all the way to root are healed before a file is
> created/modified though.  I am not sure that AFR
> currently does this.  Does it?
> 
> If the parent directories (all the way up) are not
> healed, then a version mismatch could be created 
> when a file is modified and its version is updated.  
> In this case, despite the version mismatches, the 
> files are in fact the same.  It does not seem like 
> it would be too difficult to force the parent
> directories to heal before writing to the file. 
> Unless a directory heal causes all changed file
> data (or just new files+data?) in those directories
> to heal; that could be a long delay.  Thoughts?  I
> must admit, I am having a hard time following all
> these constraints. :)  ... If this works, there is no
> useless resyncing because we thought that files
> had changed, as I previously surmised.

If you increment directory version numbers on all directory listing 
changes, I still see a major problem:

1.  Adding, renaming, or removing a file or directory in ANY directory 
now cascades the version number change up to the root directory, 
effectively changing the version of ALL files and marking them as 
dirty and in need of an update on all other servers.  I hope you agree 
this is Very Bad (tm).  You could solve it with checksums, but as 
someone pointed out, that could get expensive, even with a checksum 
cache, when the entire tree needs to be checked every time.
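
To make the cascade concrete, here is a minimal Python sketch of 
per-directory counters where every listing change is propagated to the 
root.  The class and method names are hypothetical and this is not AFR 
code; it only assumes that a file is identified by the chain of versions 
from the root down to the file itself:

```python
class CascadingTree:
    """Per-directory version counters; a listing change cascades to the root."""

    def __init__(self):
        self.versions = {"/": 1}  # path -> that element's own counter

    @staticmethod
    def _ancestors(path):
        # "/a/b/file" -> ["/a/b", "/a", "/"]; "/a" -> ["/"]
        parts = path.strip("/").split("/")
        dirs = ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
        return dirs + ["/"]

    def create(self, path):
        # New element starts at v1; every ancestor's listing is considered changed.
        self.versions[path] = 1
        for d in self._ancestors(path):
            self.versions[d] += 1

    def version_path(self, path):
        # Versions from the root down to the element, e.g. (v5, v3, v2, v1).
        chain = list(reversed(self._ancestors(path))) + [path]
        return tuple(self.versions[p] for p in chain)


tree = CascadingTree()
for p in ["/a", "/a/b", "/a/b/file", "/x"]:
    tree.create(p)

before = tree.version_path("/a/b/file")
tree.create("/x/new")            # unrelated change in another directory
after = tree.version_path("/a/b/file")
print(before, after)
```

The two tuples differ only in the root's entry, yet a comparison of the 
full version path would now treat /a/b/file as changed.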

I believe the following example illustrates why this cascade and healing 
are necessary.  Given a synchronized /a/b/c/file, run against server 1:

	$ cd /
	$ mv a z
	$ mkdir -p a/b/c
	$ echo whatever >a/b/c/file

Then, against server 2:

	$ cat /a/b/c/file

Server 2 would have to know to heal directory listings all the way up to 
the root directory listing to give the correct answer here.

I think the single, global version number I mentioned in the "Client 
side AFR race conditions" thread provides an interesting solution here. 
Consider the following commands and their corresponding file system 
states starting with an empty root.  In this model, changing the 
content/version number of any child element is considered to change the 
directory listing of the parent, and renames update the version number 
of all children of the renamed element:

/			v1

	$ mkdir /a
/			v2
/a			v2

	$ mkdir /b
/			v3
/a			v2
/b			v3

	$ echo whatever > /a/1
/			v4
/a			v4
/a/1			v4
/b			v3

	$ echo whatever > /a/2
/			v5
/a			v5
/a/1			v4
/a/2			v5
/b			v3

	$ mv /a /z
/			v6
/b			v3
/z			v6
/z/1			v6
/z/2			v6

	$ rm /z/2
/			v7
/b			v3
/z			v7
/z/1			v6
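
As a rough check of the walkthrough above, the model can be sketched in 
a few lines of Python.  The class and its method names are hypothetical, 
invented for illustration; locking and multiple servers are ignored:

```python
class GlobalVersionTree:
    """Toy model: one global counter stamps every element an operation touches."""

    def __init__(self):
        self.clock = 1             # the single, global version number
        self.versions = {"/": 1}   # path -> version of that file or directory

    @staticmethod
    def _ancestors(path):
        # "/a/1" -> ["/a", "/"]; "/a" -> ["/"]
        parts = path.strip("/").split("/")
        dirs = ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
        return dirs + ["/"]

    def _stamp_parents(self, path):
        # A change to a child counts as a change to each parent's listing.
        for d in self._ancestors(path):
            self.versions[d] = self.clock

    def write(self, path):
        """mkdir or file write: the element and its parents get the new version."""
        self.clock += 1
        self.versions[path] = self.clock
        self._stamp_parents(path)

    def rename(self, old, new):
        """Renames restamp the renamed element and all of its children."""
        self.clock += 1
        moved = [p for p in self.versions if p == old or p.startswith(old + "/")]
        for p in moved:
            del self.versions[p]
            self.versions[new + p[len(old):]] = self.clock
        self._stamp_parents(old)
        self._stamp_parents(new)

    def remove(self, path):
        """Removal restamps only the parents; siblings keep their versions."""
        self.clock += 1
        gone = [p for p in self.versions if p == path or p.startswith(path + "/")]
        for p in gone:
            del self.versions[p]
        self._stamp_parents(path)


tree = GlobalVersionTree()
tree.write("/a")
tree.write("/b")
tree.write("/a/1")
tree.write("/a/2")
tree.rename("/a", "/z")
tree.remove("/z/2")
for path in sorted(tree.versions):
    print(path, "v%d" % tree.versions[path])
```

Note that /b keeps v3 throughout: only the elements an operation touches, 
plus their parents, are restamped.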

This glosses over the locking issues we were discussing in the other 
thread, but in this model, a client can quickly determine whether its 
copy of any directory listing or file is up to date based solely on that 
file or directory's own version number (locally and on the server), and 
giving a parent directory a new version number does not invalidate the 
data of all its children.
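
The freshness check this implies could look roughly like the following 
Python sketch.  The function name and the two dictionaries are 
hypothetical stand-ins for whatever the client and server actually 
store; the version numbers are taken from the states listed earlier:

```python
def is_up_to_date(local, server, path):
    """Compare only this path's own version; parents and children are ignored."""
    return path in local and local[path] == server.get(path)


# Hypothetical snapshots: the client cached its view after `mkdir /b` (v3);
# the server has since seen both writes into /a (v5).
client = {"/": 3, "/a": 2, "/b": 3}
server = {"/": 5, "/a": 5, "/a/1": 4, "/a/2": 5, "/b": 3}

stale = [p for p in sorted(server) if not is_up_to_date(client, server, p)]
print(stale)
```

Here /b still counts as current even though the root's version moved 
from v3 to v5, so only / and the /a subtree would need to be refetched.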

Regards,

Derek
-- 
Derek R. Price
Solutions Architect
Ximbiot, LLC <http://ximbiot.com>
Get CVS and Subversion Support from Ximbiot!

v: +1 248.835.1260
f: +1 248.246.1176




