[Gluster-users] AFR going away?

Keith Freedman freedman at FreeFormIT.com
Mon Jan 5 12:55:20 UTC 2009


At 02:30 AM 1/5/2009, Anand Babu Periasamy wrote:
>Christopher, the main issue with self-heal is its complexity. Handling
>self-healing logic in a non-blocking asynchronous code path is difficult.
>Replicating a missing file sounds simple, but holding off a lookup call,
>initiating a new series of calls to heal the file, and then resuming
>normal operation is tricky. Many of the bugs we faced in 1.3 were related
>to self-heal. We have handled most of these cases over time. Self-healing
>is decent now, but not good enough. We feel that it has only complicated
>the code base; this part of the code base is hard to test and maintain.
>
>The plan is to drop the self-heal code altogether once the active healing
>tool is ready. Unlike self-healing, active healing can be run by the user
>on a mounted file system (online) at any time. By moving the code out of
>the file system and into a tool (one that is synchronous and linear), we
>can implement sophisticated healing techniques.
>
>The code is not in the repository yet. Hopefully it will be ready for use
>in a month. You can simply turn off self-heal and run this utility while
>the file system is mounted.

I realize this is perhaps a bit premature, but am I to understand that
you'll be doing away with automatic self-healing in replicate?
This seems to eliminate much of the value of Gluster's AFR component.
If we have to heal manually with some tool, there's always a risk of
a data integrity problem while the healing process is being executed
after a server interruption.

If it's going to be optional to turn on or off, that's fine, I suppose,
but if you're considering removing this feature altogether, please
reconsider, unless this active healing tool would be run automatically
any time there's a disconnect between AFR servers.

While I certainly realize that the self-heal code is a HUGE
performance issue as currently written (at least that's what I'm
noticing on my servers), its function is necessary to make AFR
useful.