[Gluster-devel] Client side AFR race conditions?

Martin Fick mogulguy at yahoo.com
Sat May 3 04:05:40 UTC 2008


--- Anand Babu Periasamy <ab at gnu.org.in> wrote:

> If application doesn't use locking in a multi-user
> mode, data can be corrupted with or without AFR.
> With AFR in place, corruption can also result in 
> disparate set of data, other than losing the order
> of writes. No file system can guarantee integrity,
> if applications do not synchronize writes in
> multiuser mode.

No other (non-buggy) POSIX filesystem would ever
return two different results for the same read without
a write in between (and then potentially do the same
again without a write!).  It simply violates POSIX
(and most other filesystems') semantics.  This is not
a case of corruption.  I do not want to belabor the
point, but I am not sure that you are talking about
the same situation as I am, so I will repost the
details.  Please don't take this the wrong way, but
sometimes details are overlooked in these long
threads.
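To make the claim concrete, here is a toy Python
simulation (not GlusterFS code; it assumes reads are
load-balanced across the replica children, which may
not match AFR's actual read-child policy).  Once the
two copies have silently diverged, the same read
issued twice, with no write in between, returns two
different results:

    # Toy model: two replica copies of the "same" file have diverged.
    import itertools

    replicas = {
        "subvolume-A": b"data from client A",
        "subvolume-B": b"data from client B",
    }

    # Pretend the translator round-robins reads across its children.
    read_child = itertools.cycle(replicas)

    def read_file():
        return replicas[next(read_child)]

    first = read_file()
    second = read_file()      # no write happened in between
    assert first != second    # yet the data differs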

> In other words, what prevents conflicts when 
> client A & B both write to the same file?  Could 
> A's write to subvolume A succeed before B's write
> to subvolume A, and at the same time B's write to 
> subvolume B succeed before A's write to subvolume
> B? 

The answer I got was a 'yes'.  This means that on
subvolume A, version 73 of a file may be completely
different from version 73 of the same file on
subvolume B, without either of the nodes having
failed.  In fact, I imagine this is possible even when
running AFR on a single node, with both subvolumes on
the same node as AFR, if the glusterfsd daemon is
running multiple threads!  That may sound unlikely,
but it might in fact be more likely, since a thread
could block right after writing to the first
subvolume, giving the second thread plenty of room to
start a new write to both subvolumes.
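Here is a minimal, deterministic sketch of that
interleaving in Python (again a toy model, not
GlusterFS code; the per-subvolume "version" counter is
my own simplification of AFR's changelog attributes):

    # Both clients write the whole file, but their writes reach the
    # two subvolumes in opposite orders.  Nothing fails, no lock is
    # held, and both replicas end up at "version 73" with different
    # contents -- split brain without any hardware failure.
    subvol_A = {"version": 71, "data": b"old contents"}
    subvol_B = {"version": 71, "data": b"old contents"}

    def apply_write(subvol, data):
        subvol["data"] = data
        subvol["version"] += 1

    apply_write(subvol_A, b"client A's data")  # A's write lands on subvolume A first
    apply_write(subvol_B, b"client B's data")  # B's write lands on subvolume B first
    apply_write(subvol_A, b"client B's data")  # B's write reaches subvolume A second
    apply_write(subvol_B, b"client A's data")  # A's write reaches subvolume B second

    assert subvol_A["version"] == subvol_B["version"] == 73
    assert subvol_A["data"] != subvol_B["data"]  # same version, different data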

I think that many (but probably not enough) people
using AFR understand that split-brain situations are
possible when node subvolumes go down.  However, I
imagine that most people using AFR think that if they
have fancy resilient hardware with high uptime and
reliable, possibly even multi-path, networking devices
in use with glusterfs, they are not going to
experience a split-brain situation unless a node
and/or router/switch goes down.  What I am describing
is exactly that: split brain under ordinary,
non-hardware-failure conditions.  That is certainly
not POSIX behavior, and not something that could
happen with every other filesystem, as you claim.

> Even if we introduce atomic writes within AFR, 

Again, atomicity is not the issue.

> it still doesn't fix application's bugs. It will 
> only slow down writes for well behaved
> applications.

I understand that any solution for this is likely to
hurt performance, although I suggested a solution that
I believe might actually not.  I am curious whether
you think my "quick-heal" approach would hurt
performance.  And, of course, sacrificing certain
behaviors for performance is a common tradeoff that
many are willing to make, and should be able to make,
but who would sacrifice reliability if it can be kept
without hurting performance?

While I personally hope for a solution to this, I
certainly don't "expect" one.  I do, however, think it
is important that people are informed about and
understand this potential problem.

Cheers,

-Martin


