[Gluster-devel] trusted.glusterfs.version xattr
Martin Fick
mogulguy at yahoo.com
Thu May 8 18:40:25 UTC 2008
--- Derek Price <derek at ximbiot.com> wrote:
> If you increment directory version numbers on all
> directory listing changes, I still see a major
> problem:
>
> 1. Adding, renaming, or removing a file or
> directory in ANY directory now cascades the version
> number change up to the root directory,
No, there is no need to cascade a version change
up the chain. What purpose would that serve?
I was not suggesting this, only that when
assigning a version # to a directory/file
we should be sure to include all the version #s
of the parents, so that we can be sure we are
talking about the same version of the element
as when it was created/edited.
In fact, I now realize that this constraint
could even be relaxed to simply ensure that
the creation version includes every parent's
version at the time of creation. When
the file is updated, there is no need
to update the version # to any of the current
parents' versions (naturally the parents
must be healed though); simply bump the
tail of the version # (the file portion).
The rest of the version # can stay the
same and no longer needs to match what the
parents' versions are. This makes things
quicker for file modifications.
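To make the relaxed scheme concrete, here is a minimal Python sketch. It is
only an illustration: the class name Version and the fields origin and rev
are hypothetical and are not anything in the GlusterFS code. It assumes the
parents' portion is captured once at creation and only the tail is bumped
afterwards.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Version:
    """Hypothetical model of a 'full parent tree' version #."""
    origin: Tuple[int, ...]  # parents' versions at creation time; never updated
    rev: int                 # the element's own counter (the tail)

    @classmethod
    def create(cls, parent_versions):
        # A new element records its parents' versions and starts at 1.
        return cls(tuple(parent_versions), 1)

    def bump(self):
        # A modification only increments the tail; origin stays as it was.
        return Version(self.origin, self.rev + 1)

    def __str__(self):
        return "v" + "/".join(str(n) for n in self.origin + (self.rev,))


f = Version.create([2, 2, 2, 2])
print(f)         # version at creation
print(f.bump())  # version after one modification
```

Because bump() never reads the parents' current versions, a file write in
this sketch touches only the file's own counter.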
> effectively incrementing the version
> number of ALL files and marking them
> as dirty/needing update to all other
> servers.
No no, they are not dirty simply because
the parents' version #s have changed. This
was the false conclusion that I originally
made. You don't care if any of the
parents have changed as long as you
are talking about the same file, which
is identified by the parents' versions
at the time the file was created!
Think of the parents' portion of the
version # as just a unique ID chosen on
file creation. The parents can change
all they want, but if this unique ID
hasn't changed on either server, we are
talking about the same file. If only
the file portion changes, we just have
a different version of the same file
and it is a candidate for extent based
quick healing.
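The comparison rule described above could be sketched as follows in Python.
The function names split_version and compare, and the three result strings,
are made up for illustration; the version strings are assumed to look like
"v2/2/2/2/1", with everything before the last component being the parents'
portion.

```python
def split_version(v):
    """Split 'v2/2/2/2/1' into (parents' portion, file portion)."""
    parts = v.lstrip("v").split("/")
    return tuple(parts[:-1]), int(parts[-1])


def compare(local, remote):
    """Classify two servers' versions of the same path (hypothetical labels)."""
    local_id, local_rev = split_version(local)
    remote_id, remote_rev = split_version(remote)
    if local_id != remote_id:
        # Different creation ID: not the same file at all.
        return "different-file"
    if local_rev != remote_rev:
        # Same file, different revision: candidate for quick healing.
        return "same-file-different-revision"
    return "in-sync"


print(compare("v2/2/2/2/1", "v2/2/2/2/3"))
print(compare("v2/2/2/2/1", "v4/2/2/2/1"))
```

Note that the parents' current versions never enter the comparison, only
the ID that was frozen into each file's version # at creation.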
> I believe that this cascade and healing is necessary
> is illustrated in
> the following example: given a synchronized
> /a/b/c/file, against server 1:
OK, to get to this point, the version graph
I am suggesting would look like this on both
servers (minimal version #s, they could
naturally be higher if other events occurred):
/      ->  /a/     ->  /a/b/     ->  /a/b/c/     ->  /a/b/c/file
/:v1       /:v2        /:v2          /:v2            /:v2
           a:v2/1      a:v2/2        a:v2/2          a:v2/2
                       b:v2/2/1      b:v2/2/2        b:v2/2/1
                                     c:v2/2/2/1      c:v2/2/2/2
                                                     file:v2/2/2/2/1
So:
> $ cd /
> $ mv a z
/a/b/c/file      ->  /z/b/c/file
/:v2                 /:v3
a:v2/2               z:v2/2
b:v2/2/1             b:v2/2/1
c:v2/2/2/2           c:v2/2/2/2
file:v2/2/2/2/1      file:v2/2/2/2/1
> $ mkdir -p a/b/c
/      ->  /a/     ->  /a/b/     ->  /a/b/c/
/:v3       /:v4        /:v4          /:v4
           a:v4/1      a:v4/2        a:v4/2
                       b:v4/2/1      b:v4/2/2
                                     c:v4/2/2/1
> $ echo whatever >file
/a/b/c/       ->  /a/b/c/file
/:v4              /:v4
a:v4/2            a:v4/2
b:v4/2/2          b:v4/2/1
c:v4/2/2/1        c:v4/2/2/2
                  file:v4/2/2/2/1
> Then, against server 2:
>
> $ cat /a/b/c/file
OK, we need to start with the original
synchronized version #s here again, so
now on server 2 the version # of
/a/b/c/file is v2/2/2/2/1 while on
server 1 it is v4/2/2/2/1.
> Would have to know to heal directory listings all
> the way up to its root directory listing to give the
> correct answer here.
I agree, it would have to know this, but it does,
doesn't it? In order to read (cat) /a/b/c/file,
a lookup is first done on /, right? This would
cause / to be healed before it could even look up
a. This healing would cascade down until we
are ready to read /a/b/c/file. I see now that
directory healing does not require modified
file data to be healed; only file
adds/deletes/moves need to be recorded. The
file data can be healed when the file is
accessed. Added files can be added as empty
version 0 files, signifying that they need
to heal (perhaps this already happens?).
I admit, this probably assumes that moves
are recorded as moves, and not just as
adds/deletes, which might cause things to fail
or have the same performance problem that
I point out below in the "global version#"
solution.
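A rough Python sketch of this top-down healing on lookup, to show the order
of operations only. Everything here is an assumption for illustration: each
server is modeled as a dict from directory path to a (version, listing)
pair, the remote server is assumed to be the up-to-date one, every
directory on the path is assumed to exist there, and healing file data
(including the version 0 placeholders) is not modeled.

```python
def heal_dir(path, local, remote):
    """Copy the remote directory entry if the local one differs."""
    if local.get(path) == remote[path]:
        return False
    local[path] = remote[path]
    return True


def lookup(path, local, remote):
    """Heal each directory from / downwards before descending into it.

    Returns the list of directories that had to be healed.
    """
    healed = []
    current = "/"
    names = [n for n in path.split("/") if n]
    # Visit "/", then every intermediate directory; the last name is the file.
    for name in [None] + names[:-1]:
        if name is not None:
            current += name + "/"
        if heal_dir(current, local, remote):
            healed.append(current)
    return healed


# Hypothetical stale server and up-to-date server.
stale = {"/": ("v2", frozenset({"a"})), "/a/": ("v2/2", frozenset({"file"}))}
fresh = {"/": ("v4", frozenset({"a", "z"})), "/a/": ("v4/2", frozenset({"file"}))}
print(lookup("/a/file", stale, fresh))
```

Only the directory listings on the path are touched; nothing is copied for
siblings that were never looked up.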
> I think the single, global version number I
> mentioned in the "Client side AFR race conditions"
> provides an interesting solution here.
> Consider the following commands and their
> corresponding file system states starting with an
> empty root. In this model, changing the
> content/version number of any child element is
> considered to change the directory listing of the
> parent, and renames update the version number
> of all children of the renamed element:
>
> / v1
>
> $ mkdir /a
> / v2
> /a v2
>
> $ mkdir /b
> / v3
> /a v2
> /b v3
>
> $ echo whatever > /a/1
> / v4
> /a v4
> /a/1 v4
> /b v3
>
> $ echo whatever > /a/2
> / v5
> /a v5
> /a/1 v4
> /a/2 v5
> /b v3
>
> $ mv /a /z
> / v6
> /b v3
> /z v6
> /z/1 v6
> /z/2 v6
This would force an unneeded resync of
/z, /z/1 and /z/2, wouldn't it? That could
be very expensive since 1 and 2 could be
large files!
> $ rm /z/2
> / v7
> /b v3
> /a v7
> /a/1 v6
"a"s should be "z"s I assume here.
> This glosses over the locking issues we were
> discussing in the other thread, but in this
> model, a client can quickly determine whether
> its copy of any directory listing or file is
> up to date based on solely that file or
> directory's own version number (locally and
> on the server), and giving a parent directory
> a new version number does not invalidate the
> data of all its children.
This seems like it would mostly work, except that
directory renames would require the entire
subtree to be resynced needlessly! A
directory rename should normally (on Unix) be
a very small operation; this would bring us
back to the old DOS days, where, if I recall
correctly, it meant copying the entire
subtree. ;)
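To put a number on that cost, here is a small Python sketch of the rename
rule as I read it from the quoted example: the renamed directory, all of
its descendants, and its ancestors get the new global version #. The
function name rename_global and the dict layout are hypothetical.

```python
def rename_global(versions, old, new, new_version):
    """Apply a directory rename under the 'global version #' model.

    versions maps path -> version number. Returns the new mapping and the
    list of moved paths whose version # changed.
    """
    result = {}
    restamped = []
    for path, v in versions.items():
        if path == old or path.startswith(old + "/"):
            # The renamed directory and everything below it get the new number.
            moved = new + path[len(old):]
            result[moved] = new_version
            restamped.append(moved)
        elif old.startswith(path.rstrip("/") + "/"):
            # Ancestors of the renamed directory: their listing changed.
            result[path] = new_version
        else:
            result[path] = v
    return result, restamped


before = {"/": 5, "/a": 5, "/a/1": 4, "/a/2": 5, "/b": 3}
after, restamped = rename_global(before, "/a", "/z", 6)
print(after)
print(restamped)
```

The length of the restamped list grows with the size of the subtree, which
is the cost being objected to; in the "full parent tree version" scheme the
same rename changes no file's version #.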
If you think that there are still problems/holes
in the "full parent tree version" solution, perhaps
there is another minor tweak to your "global
version #" solution which would make it work more
efficiently on directory renames?
-Martin