[Gluster-users] glusterfs performance issues

Jeff Darcy jdarcy at redhat.com
Tue Jan 8 14:02:13 UTC 2013


On 1/8/13 8:07 AM, Stephan von Krawczynski wrote:

> This can only happen in a broken versioning. Obviously one would take (very
> rough explanation) at least a two-shot concept. You increase the version by
> one when starting the file modification process and again by one when the
> process is completed without error.
> You end up knowing that version nr 1,3,5,... are intermediate/incomplete
> versions and 2,4,6,... are files with completed operations.
> Now you can tell at any time throughout any stat comparison which file is
> truly actual and which one is in intermediate state. If you want that you can
> even await the completion of an ongoing modification before returning some
> result to your requesting app. Yes, this would result in immanent locking.
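
For illustration only, here is a minimal Python sketch of the quoted proposal as stated. It models one file's version number on each replica, plus the "larger version wins" comparison that the reply below objects to. The names (Replica, begin_write, end_write, naive_winner) are hypothetical and are not GlusterFS code.

```python
# Hypothetical model of the quoted "two-shot" versioning proposal.
# Not GlusterFS code; all names are illustrative.

class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 0  # even = idle/complete, odd = modification in progress

    def in_progress(self) -> bool:
        return self.version % 2 == 1

    def begin_write(self) -> None:
        # First increment: the file is now being modified.
        if self.in_progress():
            raise RuntimeError("write already in progress")
        self.version += 1

    def end_write(self) -> None:
        # Second increment: the modification completed without error.
        if not self.in_progress():
            raise RuntimeError("no write in progress")
        self.version += 1


def naive_winner(a: Replica, b: Replica) -> str:
    # Trust whichever replica has the larger version number.
    if a.version == b.version:
        return "equal"
    return a.name if a.version > b.version else b.name
```

If A and B each start a different write from version 2, both sit at 3 and naive_winner reports "equal" even though their contents differ. If A then completes and starts another write (5), naive_winner picks A and would silently discard B's write.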

In other (fewer) words, you need a bit to represent the presence of incomplete
operations.  Thus, odd(version) is equivalent to nonzero(changelog) and the
higher-order bits are almost irrelevant.  Consider the case where replica A has
reached version 3.  It cannot proceed to 4 until the write that got it to 3
has been propagated to other replicas.  It cannot proceed to 5 because that
might cause data loss.  Replica B might also be at 3 because of a different
write, and in that case allowing A's write to supersede B's just because 5>3
would be wrong.  Greater-than comparisons don't distinguish properly between
incomplete operations and split brain.  For that you need to compare those
low-order bits:

* Neither set: both copies "idle" and up to date.

* Only one set: need to propagate update(s) from odd to even.

* Both set: split brain.
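
The three cases above can be sketched as a comparison of only the low-order bit of each replica's version. This is a hypothetical Python illustration (classify and State are made-up names), not how GlusterFS actually stores or compares its state; note that the higher-order bits are deliberately ignored.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    IN_SYNC = "both idle and up to date"
    PROPAGATE = "propagate update from odd to even"
    SPLIT_BRAIN = "split brain"


def classify(version_a: int, version_b: int) -> Tuple[State, Optional[str]]:
    # Only the low-order bit matters: it marks an incomplete operation.
    pending_a = version_a & 1
    pending_b = version_b & 1
    if not pending_a and not pending_b:
        return State.IN_SYNC, None
    if pending_a and pending_b:
        return State.SPLIT_BRAIN, None
    # Exactly one bit set: the odd copy is the source of the update.
    return State.PROPAGATE, "A" if pending_a else "B"
```

With this check, versions 5 and 3 are classified as split brain rather than letting 5 win just because 5 > 3.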

The reason I say that the higher-order bits are *almost* irrelevant is that
they can provide additional information after split-brain that can aid manual
conflict resolution (specifically: degrees of separate progress).  This also
has an equivalent in the change log, which uses counters instead of single
bits to represent the number of operations.  In that case we count operations only
since the last common point instead of since creation, which is less
susceptible to overflow (another kind of complexity that theorists can ignore
but actual implementers have to consider).
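
A rough Python sketch of the counter idea follows, assuming a simplified model in which each replica counts the operations it has applied since the last point both copies agreed on. ChangeLog, record_op, reset and diagnose are hypothetical names, and the model does not reflect the actual layout of GlusterFS's change-log extended attributes. Because the counts restart at each common point, they stay small, and after split brain they show how far each side progressed independently.

```python
from dataclasses import dataclass


@dataclass
class ChangeLog:
    # Hypothetical simplified model: operations applied locally since the
    # last common point with the peer replica.
    pending: int = 0

    def record_op(self) -> None:
        self.pending += 1

    def reset(self) -> None:
        # Called once both copies agree again (after a heal or manual
        # resolution), establishing a new common point.
        self.pending = 0


def diagnose(a: ChangeLog, b: ChangeLog) -> str:
    if a.pending == 0 and b.pending == 0:
        return "in sync"
    if a.pending and b.pending:
        # The counts give the "degrees of separate progress".
        return f"split brain (A ahead by {a.pending} ops, B by {b.pending})"
    return "heal from A" if a.pending else "heal from B"
```

For example, two operations on A and one on B since the last common point yield "split brain (A ahead by 2 ops, B by 1)", information that can help with manual conflict resolution.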

Your idea as stated was still broken.  With its major flaws fixed (you're
welcome) it's almost exactly equivalent to the existing change-log
implementation.  If condescension is warranted here, it's surely not the
condescension you expressed toward a mechanism you hadn't even tried to understand.
