[Gluster-devel] Client side AFR race conditions?

Anand Babu Periasamy ab at gnu.org.in
Mon May 5 07:42:59 UTC 2008


Hi Martin, I will respond to this email later today after reading
the entire thread.

I really want to understand the issue and help you
out. We often have heated discussions in our own labs,
and we take it positively :) Your feedback is very
valuable to us.

Thanks and Regards,
--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]
Z RESEARCH Inc [http://www.zresearch.com]



Martin Fick wrote:
> --- Anand Babu Periasamy <ab at gnu.org.in> wrote:
> 
>> If an application doesn't use locking in
>> multi-user mode, data can be corrupted with or
>> without AFR. With AFR in place, corruption can also
>> result in a disparate set of data, in addition to
>> losing the order of writes. No file system can
>> guarantee integrity if applications do not
>> synchronize writes in multi-user mode.
> 
> No other (non-buggy) POSIX filesystem would ever
> return two different results for the same read without
> a write in between (and then potentially do the same
> again without a write!).  It simply violates POSIX
> (and most other filesystems') semantics.  This is not
> a case of corruption.  I do not want to belabor the
> point, but I am not sure that you are talking about
> the same situation as I am, so I will repost the
> details.  Please don't take this the wrong way, but
> sometimes details are overlooked in these long
> threads.
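> 
> To be concrete about the semantics I mean, here is a
> tiny sketch (hypothetical Python of my own, not
> GlusterFS code; the alternating read scheduling is an
> assumption for illustration).  Two replicas have
> already diverged, and successive reads happen to be
> served by different subvolumes:
> 
>   # Two subvolumes holding different bytes at the same version.
>   replicas = [b"client A's data", b"client B's data"]
> 
>   def read_file(call_no):
>       # Assumed read scheduling: successive reads are handed
>       # to different replicas.
>       return replicas[call_no % 2]
> 
>   first = read_file(0)
>   second = read_file(1)   # no write happened in between
>   assert first != second  # two reads, two answers: not POSIX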
> 
>> In other words, what prevents conflicts when
>> clients A and B both write to the same file?  Could
>> A's write to subvolume A succeed before B's write
>> to subvolume A, and at the same time B's write to
>> subvolume B succeed before A's write to subvolume
>> B?
> 
> The answer I got was 'yes'.  This means that version
> 73 of a file on subvolume A may now be completely
> different from version 73 of the same file on
> subvolume B, without either of the nodes having
> failed.  In fact, I imagine this is possible even when
> AFR runs on a single node with both subvolumes local
> to that node, if the glusterfsd daemon is running
> multiple threads!  That may sound unlikely, but it
> might in fact be more likely, since a thread could
> block right after writing to the first subvolume,
> giving the second thread plenty of room to start a new
> write to both subvolumes.
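> 
> To make the interleaving concrete, here is a minimal
> sketch (hypothetical Python of my own, not glusterfsd
> internals; the barrier just forces the unlucky arrival
> order I described).  Each "client" writes to both
> subvolumes, but client A's write lands on subvolume A
> first while client B's lands on subvolume B first:
> 
>   import threading
> 
>   # Contents and AFR-style version counter of the file
>   # on each subvolume.
>   replica = {"A": None, "B": None}
>   version = {"A": 0, "B": 0}
>   barrier = threading.Barrier(2)
> 
>   def client_write(data, first, second):
>       replica[first] = data; version[first] += 1
>       barrier.wait()  # block right after the first write
>       replica[second] = data; version[second] += 1
> 
>   a = threading.Thread(target=client_write,
>                        args=("client A's data", "A", "B"))
>   b = threading.Thread(target=client_write,
>                        args=("client B's data", "B", "A"))
>   a.start(); b.start(); a.join(); b.join()
> 
>   # Both subvolumes now claim the same version yet hold
>   # different bytes: split brain with no failure at all.
>   print(version)  # {'A': 2, 'B': 2}
>   print(replica)  # {'A': "client B's data", 'B': "client A's data"}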
> 
> I think that many (but probably not enough) people
> using AFR understand that split-brain situations are
> possible when node subvolumes go down.  However, I
> imagine that most people using AFR think that if they
> have fancy resilient hardware with high uptimes and
> reliable, possibly even multi-path, networking devices
> in use with glusterfs, they are not going to
> experience a split-brain situation unless a node
> and/or router/switch goes down.  What I am describing
> is exactly that: split brain under ordinary,
> non-hardware-failure conditions.  That is certainly
> not POSIX behavior, and not, as you claim, something
> that could happen with any other filesystem.
> 
>> Even if we introduce atomic writes within AFR, 
> 
> Again, atomicity is not the issue.
> 
>> it still doesn't fix application bugs. It will
>> only slow down writes for well-behaved
>> applications.
> 
> I understand that any solution for this is likely to
> hurt performance, although I suggested one that I
> believe might not.  I am curious whether you think my
> "quick-heal" approach would hurt performance.  And, of
> course, sacrificing certain behaviors for performance
> is a common tradeoff that many are willing to make,
> and should be able to make; but who would sacrifice
> reliability if it could be kept without hurting
> performance?
> 
> While I personally hope for a solution to this, I
> certainly don't "expect" one; I just really think it
> is important that people are informed about and
> understand this potential problem.
> 
> Cheers,
> 
> -Martin
> 





