[Gluster-devel] Client side AFR race conditions?
Martin Fick
mogulguy at yahoo.com
Fri May 2 18:17:24 UTC 2008
--- Krishna Srinivas <krishna at zresearch.com> wrote:
> > I am curious, is client side AFR susceptible
> > to race conditions on writes? If not, how is this
> > mitigated?
> This is a known issue with the client side AFR.
Ah, OK. Perhaps it is already documented somewhere,
but I can't help thinking that the AFR translator
deserves a page dedicated to the design trade-offs
that were made and the impact they have. With enough
thought it is possible to deduce or guess at some of
the potential problems, such as split-brain and race
conditions, but for most of us it remains a guess
until we ask on the list. Perhaps with the help of
others I will set up a wiki page for this. That kind
of documented info would probably help avoid
situations like the one with Garreth, where he felt
misled by the glusterfs documentation.
> We can solve this by locking but there will be
> performance hit. Of course if applications lock
> themselves then all will be fine. I feel we can have
> it as an option to disable the locking
> in case users are more concerned about performance.
>
> Do you have any suggestions?
I haven't given it a lot of thought, but how would
the locking work? Would you be doing:
SubA           AFR         application          SubB
 |              |               |                 |
 |              |<----write-----|                 |
 |              |               |                 |
 |<-----lock----|--------------lock-------------->|
 |----locked--->|<-------------locked-------------|
 |              |               |                 |
 |<----write----|--------------write------------->|
 |---written--->|<------------written-------------|
 |              |               |                 |
 |<---unlock----|-------------unlock------------->|
 |--unlocked--->|<-----------unlocked-------------|
 |              |               |                 |
 |              |----written--->|                 |
because that does seem to be a rather large
three-roundtrip latency versus the current single
roundtrip, not including all the lock-contention
performance hits! This solution also has the problem
of lock recovery if a client dies.
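For what it's worth, here is roughly what
"applications lock themselves" would look like from
the application side. This is just a minimal sketch
using POSIX advisory locks (fcntl) over a
hypothetical mount path, not anything AFR-specific,
and it assumes the fuse client passes the locks
through end to end:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Take an exclusive advisory lock on the byte range, write, unlock.
 * Every cooperating client must do the same for this to serialize
 * the writes. */
int locked_write(int fd, const char *buf, size_t len, off_t off)
{
    struct flock lk = {
        .l_type   = F_WRLCK,    /* exclusive write lock         */
        .l_whence = SEEK_SET,
        .l_start  = off,        /* lock only the range we touch */
        .l_len    = (off_t)len,
    };

    if (fcntl(fd, F_SETLKW, &lk) == -1)   /* block until granted */
        return -1;

    ssize_t n = pwrite(fd, buf, len, off);

    lk.l_type = F_UNLCK;                  /* release the range */
    fcntl(fd, F_SETLK, &lk);

    return (n == (ssize_t)len) ? 0 : -1;
}

int main(void)
{
    /* hypothetical glusterfs mount point */
    int fd = open("/mnt/glusterfs/shared.dat", O_RDWR | O_CREAT, 0644);
    if (fd == -1) { perror("open"); return 1; }

    const char *msg = "hello";
    if (locked_write(fd, msg, strlen(msg), 0) == -1)
        perror("locked_write");

    close(fd);
    return 0;
}

If AFR did the same internally on the user's behalf,
you would get the three roundtrips in the diagram
above.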
If instead a rank (which could be configurable or
random) were given to each subvolume on startup, one
alternative would be to always write to the
highest-ranking subvolume first:
(A is a higher rank than B)
SubA           AFR         Application          SubB
 |              |               |                 |
 |              |<----write-----|                 |
 |<---write-----|               |                 |
 |---version--->|               |                 |
 |              |----written--->|                 |
 |              |               |                 |
 |              |-----------(quick)heal---------->|
 |              |<-------------healed-------------|
The quick heal would essentially be the write itself,
but knowing/enforcing the version number returned
from the SubA write. Since all clients would always
have to write to SubA first, SubA's ordering would be
reflected on every subvolume. While this solution
leaves a potentially larger window during which SubB
is unsynced, it should maintain the single-roundtrip
latency from an application's standpoint and avoid
any lock-contention performance hits? If a client
dies in this scenario, any other client could always
heal SubB from SubA, so there are no lock-recovery
problems.
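To make the idea concrete, here is a toy sketch of
the version-enforced ordering. subvol_write() and
subvol_heal() are invented stand-ins for whatever the
AFR fops would actually be, not real glusterfs code:

#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *name;
    uint64_t    version;   /* last version applied on this subvolume */
} subvol_t;

/* Write on the highest-ranking subvolume: it assigns the next
 * version number, which defines the canonical ordering for everyone. */
static uint64_t subvol_write(subvol_t *sv, const char *data)
{
    sv->version++;
    printf("%s: applied \"%s\" as v%llu\n",
           sv->name, data, (unsigned long long)sv->version);
    return sv->version;
}

/* Quick heal on a lower-ranking subvolume: replay the write, but
 * only in the exact version order that SubA established. */
static int subvol_heal(subvol_t *sv, const char *data, uint64_t version)
{
    if (version != sv->version + 1)
        return -1;           /* out of order: defer and re-heal later */
    sv->version = version;
    printf("%s: healed \"%s\" to v%llu\n",
           sv->name, data, (unsigned long long)version);
    return 0;
}

int main(void)
{
    subvol_t suba = { "SubA", 0 }, subb = { "SubB", 0 };

    /* The application's write returns once SubA acknowledges... */
    uint64_t v = subvol_write(&suba, "block-1");

    /* ...and the heal carries SubA's version number so that SubB
     * applies writes in the same order, without any locks. */
    if (subvol_heal(&subb, "block-1", v) != 0)
        fprintf(stderr, "SubB is behind; retry heals in version order\n");

    return 0;
}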
Both of these solutions could probably be greatly
enhanced with a write-ahead-log translator or some
form of buffering above each subvolume; this would
decrease the latency by allowing the write data to be
transferred before/while the lock/ordering info is
synchronized. But this may be rather complicated?
As is, however, they both seem like fairly simple
solutions without too much of a design change?
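If it helps, the buffering idea might look
conceptually like this two-phase sketch; the
staging/commit split and the txn ids are purely
illustrative assumptions on my part:

#include <stdint.h>
#include <stdio.h>

#define STAGE_MAX 4

typedef struct {
    uint64_t txn;            /* invented transaction id */
    char     data[32];
    int      used;
} staged_t;

static staged_t stage[STAGE_MAX];

/* Phase 1: ship the bytes to the subvolume as soon as possible,
 * before the lock/version handshake has decided the ordering. */
static void stage_write(uint64_t txn, const char *data)
{
    for (int i = 0; i < STAGE_MAX; i++) {
        if (!stage[i].used) {
            stage[i].txn  = txn;
            snprintf(stage[i].data, sizeof stage[i].data, "%s", data);
            stage[i].used = 1;
            return;
        }
    }
}

/* Phase 2: once the ordering is known, commit the staged payloads
 * in that order; the data transfer has already happened. */
static void commit_write(uint64_t txn)
{
    for (int i = 0; i < STAGE_MAX; i++) {
        if (stage[i].used && stage[i].txn == txn) {
            printf("committed txn %llu: %s\n",
                   (unsigned long long)txn, stage[i].data);
            stage[i].used = 0;
            return;
        }
    }
}

int main(void)
{
    stage_write(1, "payload-A");   /* in flight during the handshake */
    stage_write(2, "payload-B");
    commit_write(1);               /* ordering decided: 1 then 2 */
    commit_write(2);
    return 0;
}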
The non-locking approach seems a little odd at first
and may be more of a conceptual change to the current
AFR method, but the more I think about it, the more
appealing it seems. Perhaps it would not actually
even be a big coding change? I can't help thinking
that this method could also potentially be useful for
eliminating more split-brain situations, but I
haven't worked that out yet.
There is a somewhat subtle reason why the locking
solution is slower: locking enforces serialization
across all the writes. That serialization is not
really what is needed; we only need to ensure that
the (potentially unserialized) ordering is the same
on both subvolumes.
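A toy illustration of that distinction: the two
clients below never block each other, yet both
replicas end up with the identical order, because
SubB replays SubA's arrival order rather than racing
it (all the names here are illustrative):

#include <stdio.h>

#define LOG_MAX   8
#define ENTRY_LEN 16

/* A subvolume is modeled as just an ordered log of applied writes. */
typedef struct {
    char entries[LOG_MAX][ENTRY_LEN];
    int  n;
} log_t;

static void log_append(log_t *l, const char *w)
{
    snprintf(l->entries[l->n++], ENTRY_LEN, "%s", w);
}

int main(void)
{
    log_t suba = {0}, subb = {0};

    /* Two clients issue concurrent writes; neither blocks on the
     * other, so their relative order is decided only by arrival
     * at the highest-ranking subvolume. */
    log_append(&suba, "c2:w1");    /* c2 happened to arrive first */
    log_append(&suba, "c1:w1");

    /* SubB replays SubA's log verbatim: identical order, no locks. */
    for (int i = 0; i < suba.n; i++)
        log_append(&subb, suba.entries[i]);

    for (int i = 0; i < suba.n; i++)
        printf("slot %d: SubA=%s  SubB=%s\n",
               i, suba.entries[i], subb.entries[i]);
    return 0;
}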
Thoughts?
-Martin
P.S. Simple ascii diagrams generated with:
http://www.theficks.name/test/Content/pmwiki.php?n=Sdml.HomePage