[Gluster-devel] Split-brain present and future in afr

Jeff Darcy jdarcy at redhat.com
Fri May 23 12:17:29 UTC 2014


> > Constantly filtering requests to use either N or N+1 bricks is going to be
> > complicated and hard to debug.  Every data-structure allocation or loop
> > based on replica count will have to be examined, and many will have to be
> > modified.  That's a *lot* of places.  This also overlaps significantly
> > with functionality that can be achieved with data classification (i.e.
> > supporting multiple replica levels within the same volume).  What use case
> > requires that it be implemented within AFR instead of more generally and
> > flexibly?
> 
> 1) It still wouldn't bring in an arbiter for replica 2.

It's functionally the same, just implemented in a more modular
fashion.  Either way, most of the data that was previously
replicated twice would still be replicated twice, but some subset
would be replicated three times.  The "policy filter" is just
implemented in a translator dedicated to that purpose, instead of
within AFR.  In addition to being simpler, this keeps the user
experience for setting this policy consistent with setting other
kinds of policies.
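
To make that concrete, here is a rough volfile-style sketch of such
a graph.  The volume/type/subvolumes syntax and the
cluster/replicate type are what we already have; the
cluster/dataclass translator and its routing option are purely
hypothetical placeholders for whatever the data classification
project settles on:

    volume repl2-set
        type cluster/replicate        # ordinary two-way AFR
        subvolumes client-0 client-1
    end-volume

    volume repl3-set
        type cluster/replicate        # ordinary three-way AFR
        subvolumes client-2 client-3 client-4
    end-volume

    volume policy-filter
        type cluster/dataclass        # hypothetical filter translator
        # hypothetical rule: files matching the pattern go to the
        # replica-3 set, everything else to the replica-2 set
        option route-rule *.critical:repl3-set
        subvolumes repl2-set repl3-set
    end-volume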

> 2) That would need more bricks, more processes, more ports.

Fewer, actually.  Either approach requires that we split bricks (as
the user sees them).  One way, we turn N user bricks into N regular
bricks plus N/2 arbiter bricks.  The other way, we turn N user
bricks into N bricks for the replica-2 part and another N for the
replica-3 part.  That seems like slightly more, but (a) it's the
same user view, and (b) the number of processes and ports will
actually be fewer.  Since data classification is likely to involve
splitting bricks many times, and multi-tenancy likewise, the data
classification project is already scoped to include "multiplexing"
multiple bricks into one process on one port (as HekaFS used to
do).  Thus the total number of ports and processes for an N-brick
volume will go back down to N, even with the equivalent of arbiter
functionality.
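
To put numbers on that, take N = 6 user bricks (purely
illustrative, and assuming no multiplexing in the arbiter case):

    arbiter approach:     6 data bricks + 3 arbiter bricks
                          = 9 bricks, 9 processes and ports
    data classification:  6 replica-2 bricks + 6 replica-3 bricks
                          = 12 bricks, multiplexed back into
                          6 processes on 6 ports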

Doing "replica 2.5" as part of data classification instead of within AFR
also has other advantages.  For example, it naturally gives us
support for overlapping replica sets - an often-requested feature
for spreading load more evenly after a failure.  Perhaps most
importantly, it doesn't
require separate implementations or debugging for AFRv1, AFRv2, and NSR.
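
To illustrate the overlapping point with a hypothetical three-brick
layout: replica sets {A,B}, {B,C}, and {C,A} mean that if A fails,
the resulting read and repair load is split between B and C,
instead of landing entirely on A's lone partner as it would with
disjoint pairs.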

Let's for once put our effort where it will do us the most good,
instead of succumbing to the "streetlight effect"[1] yet again and
hacking on the components that are most familiar.

[1] http://en.wikipedia.org/wiki/Streetlight_effect
