[Gluster-devel] Client side AFR race conditions?

Anand Babu Periasamy ab at gnu.org.in
Sat May 3 00:04:45 UTC 2008


Let me explain more about this issue:

When multiple applications write to the same file, it is really
the applications' responsibility to handle coherency using
POSIX locks or some form of IPC/RPC mechanism.  Even without
AFR, file systems do not guarantee the ordering of concurrent
writes, and hence the integrity of the data. When AFR is
inserted, this corruption can additionally leave the replicas
with disparate sets of data, rather than mere overwrites.
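
A minimal sketch of that application-side coherency, assuming
fcntl record locks on a hypothetical shared file (the path and
region size below are made up for illustration):

/* Minimal sketch: an application serializing its own writes with a
 * POSIX record lock (fcntl) before touching a shared file.  The path
 * and region size are illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/mnt/glusterfs/shared.dat", O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct flock lk;
        memset(&lk, 0, sizeof(lk));
        lk.l_type   = F_WRLCK;     /* exclusive write lock           */
        lk.l_whence = SEEK_SET;
        lk.l_start  = 0;           /* region this writer will update */
        lk.l_len    = 512;

        if (fcntl(fd, F_SETLKW, &lk) < 0) {   /* block until granted */
                perror("fcntl(F_SETLKW)");
                close(fd);
                return 1;
        }

        pwrite(fd, "new record", 10, 0);      /* ordered w.r.t. other lockers */

        lk.l_type = F_UNLCK;                  /* release the record lock */
        fcntl(fd, F_SETLK, &lk);
        close(fd);
        return 0;
}

Two writers that both take the lock see one well-defined order
on every replica; a writer that skips the lock can still clobber
the other, and no AFR option can prevent that.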

It shouldn't be seen as an issue with AFR. If applications
handle coherency, AFR will work fine. It is possible to
introduce an atomic-write option (locked writes) in AFR, but
it would be of little use: it still cannot prevent the
corruption that results when one application overwrites the
data of another without holding a lock.

In summary, AFR doesn't have a race condition.
--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]
Z RESEARCH Inc [http://www.zresearch.com]



Martin Fick wrote:
> --- Krishna Srinivas <krishna at zresearch.com> wrote:
>>> I am curious, is client side AFR susceptible
>>> to race conditions on writes?  If not, how is this
>>> mitigated?
> 
>> This is a known issue with the client side AFR. 
> 
> Ah, OK.  Perhaps it is already documented somewhere,
> but I can't help but think that the AFR
> translator deserves a page dedicated to some of the
> design trade-offs made and the impact they have. 
> With enough thought, it is possible to deduce/guess at
> some of the potential problems such as split brain and
> race conditions, but for most of us this is still a
> guess until we ask on the list.  Perhaps with the help
> of others I will set up a wiki page for this.  This
> kind of documented info would probably help situations
> like the one with Garreth where he felt misled by the
> glusterfs documentation.
> 
>> We can solve this by locking but there will be a
>> performance hit. Of course if applications lock
>> themselves then all will be fine. I feel we can have
> 
>> it as an option to disable the locking
>> in case users are more concerned about performance.
>>
>> Do you have any suggestions?
> 
> I haven't given it a lot of thought, but how would
> the locking work?  Would you be doing:
> 
>   SubA          AFR      application     SubB
>     |            |            |            |
>     |            |<---write---|            |
>     |            |            |            |
>     |<---lock----|-----------lock--------->|
>     |---locked-->|<---------locked---------|
>     |            |            |            |
>     |<--write----|----------write--------->|
>     |--written-->|<--------written---------|
>     |            |            |            |
>     |<--unlock---|----------unlock-------->|
>     |--unlocked->|<--------unlocked--------|
>     |            |            |            |
>     |            |---written->|            |
> 
> 
> because that does seem to be a rather large three-roundtrip
> latency versus the current single roundtrip, not including
> all the lock contention performance hits!  This solution
> also has the problem of lock recovery if a client dies.
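> 
> To make the cost concrete, here is a rough sketch of that
> lock -> write -> unlock sequence, modelling the two
> subvolumes as plain file descriptors.  The helper names and
> the fcntl-based locking are purely hypothetical; this is not
> the actual AFR translator code.
> 
> #include <fcntl.h>
> #include <string.h>
> #include <unistd.h>
> 
> static int lock_region(int fd, short type, off_t off, off_t len)
> {
>         struct flock lk;
>         memset(&lk, 0, sizeof(lk));
>         lk.l_type   = type;      /* F_WRLCK to lock, F_UNLCK to release */
>         lk.l_whence = SEEK_SET;
>         lk.l_start  = off;
>         lk.l_len    = len;
>         return fcntl(fd, F_SETLKW, &lk);
> }
> 
> /* Three round trips per application write: lock both, write
>  * both, unlock both.  If the client dies between the first
>  * and last step, someone has to recover the locks. */
> static int afr_locked_write(int sub_a, int sub_b,
>                             const char *buf, size_t len, off_t off)
> {
>         int ret = 0;
> 
>         if (lock_region(sub_a, F_WRLCK, off, len) < 0)   /* round trip 1 */
>                 return -1;
>         if (lock_region(sub_b, F_WRLCK, off, len) < 0) {
>                 lock_region(sub_a, F_UNLCK, off, len);
>                 return -1;
>         }
> 
>         if (pwrite(sub_a, buf, len, off) < 0)            /* round trip 2 */
>                 ret = -1;
>         if (pwrite(sub_b, buf, len, off) < 0)
>                 ret = -1;
> 
>         lock_region(sub_a, F_UNLCK, off, len);           /* round trip 3 */
>         lock_region(sub_b, F_UNLCK, off, len);
> 
>         return ret;
> }
> 
> Even in this toy form, the two extra locking round trips and
> the cleanup-on-failure bookkeeping are what the diagram above
> is pricing in.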
> 
> If instead, a rank (which could be configurable or
> random) were given to each subvolume on startup, one
> alternative would be to always write to the highest
> ranking subvolume first:
> 
>    (A is a higher rank than B)
> 
>   SubA         AFR         Application        SubB
>     |           |               |               |
>     |           |<----write-----|               |
>     |<--write---|               |               |
>     |--version->|               |               |
>     |           |----written--->|               |
>     |           |               |               |
>     |           |----------(quick)heal--------->|
>     |           |<------------healed------------|
> 
> The quick heal would essentially be the same write, but
> carrying/enforcing the version # returned from the SubA
> write.  Since all clients would always have to write
> to SubA first, SubA's ordering would be reflected
> on every subvolume. While this solution leaves a
> potentially larger window during which SubB is unsynced,
> it should maintain the single-roundtrip latency from an
> application's standpoint and avoid any lock contention
> performance hits.  If a client dies in this scenario,
> any other client could always heal SubB from SubA, so
> there are no lock recovery problems.
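> 
> As a hedged sketch of what that might look like in code: the
> version-returning write and the versioned heal primitive
> below are assumptions of this proposal, not anything AFR
> provides today, and the names are invented for illustration.
> 
> #include <stddef.h>
> #include <sys/types.h>
> 
> struct subvol;   /* opaque handle for one replica */
> 
> /* Assumed primitive: a write that returns the new version
>  * number of the region, so replicas can agree on ordering. */
> extern long sub_write_versioned(struct subvol *s, const char *buf,
>                                 size_t len, off_t off);
> 
> /* Assumed primitive: replay a write on a lower-ranked replica,
>  * enforcing the version handed back by the higher rank. */
> extern int sub_heal_versioned(struct subvol *s, const char *buf,
>                               size_t len, off_t off, long version);
> 
> int afr_ranked_write(struct subvol *high, struct subvol *low,
>                      const char *buf, size_t len, off_t off)
> {
>         /* The application waits for a single round trip... */
>         long version = sub_write_versioned(high, buf, len, off);
>         if (version < 0)
>                 return -1;
> 
>         /* ...while the ordered replay to the lower-ranked
>          * replica can finish in the background; SubA's
>          * ordering is what every replica ends up with. */
>         sub_heal_versioned(low, buf, len, off, version);
>         return 0;
> }
> 
> The whole trick is that the version handed back by SubA is
> what imposes a single global order, without any locks.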
> 
> 
> Both of these solutions could probably be greatly
> enhanced with a write-ahead log translator or some
> form of buffering above each subvolume, this would
> decrease the latency by allowing the write data to be
> transferred before/while the lock/ordering info is
> synchronized.  But this may be rather complicated? 
> However, as is, they both seem like fairly simple
> solutions without too much of a design change?
> 
> 
> The non locking approach seems a little odd at first
> and may be more of a change to the current AFR method
> conceptually, but the more I think about it, the more
> it seems appealing.  Perhaps it would not actually
> even be a big coding change?  I can't help but think
> that this method could also potentially be useful to
> eliminate more split-brain situations, but I haven't
> worked that out yet.  
> 
> There is a somewhat subtle reason why it makes sense
> that the locking solution is slower: locking
> enforces serialization across all the writes.  That
> serialization is not really what is needed; we only
> need to ensure that the (potentially unserialized)
> ordering is the same on both subvolumes.
> 
> Thoughts?
> 
> -Martin
> 
> 
> P.S. Simple ascii diagrams generated with:
> http://www.theficks.name/test/Content/pmwiki.php?n=Sdml.HomePage
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel