[Gluster-users] AFR going away?
Keith Freedman
freedman at FreeFormIT.com
Mon Jan 5 12:55:20 UTC 2009
At 02:30 AM 1/5/2009, Anand Babu Periasamy wrote:
>Christopher, the main issue with self-heal is its complexity. Handling
>self-healing logic in a non-blocking asynchronous code path is difficult.
>Replicating a missing file sounds simple, but holding off a lookup call,
>initiating a new series of calls to heal the file, and then resuming
>normal operation is tricky. Many of the bugs we faced in 1.3 are related
>to self-heal. We have handled most of these cases over a period of time.
>Self-healing is decent now, but not good enough. We feel that it has only
>complicated the code base. It is hard to test and maintain this part of
>the code base.
>
>The plan is to drop the self-heal code altogether once the active healing
>tool is ready. Unlike self-healing, this active healing can be run by the
>user on a mounted file system (online) at any time. By moving the code out
>of the file system and into a tool (that is synchronous and linear), we can
>implement sophisticated healing techniques.
>
>The code is not in the repository yet. Hopefully it will be ready for use
>in a month. You can simply turn off self-heal and run this utility while
>the file system is mounted.
I realize this is perhaps a bit premature, but am I to understand that
you'll be doing away with automatic self-healing in replicate?
This seems to eliminate much of the value of Gluster's AFR component.
If we have to heal manually with some tool, there's always a risk of
a data integrity problem while the healing process is being executed
after a server interruption.
If it's going to be optional to turn on and off, that's fine, I suppose,
but please, if you're considering removing this feature altogether,
reconsider, unless this active healing tool is something that would
be run automatically any time there's a disconnect between AFR servers.
While I certainly do realize that the self-heal code is a HUGE
performance issue as it's currently written (at least that's what I'm
noticing on my servers), its function is necessary to make AFR
useful.