[Gluster-users] AFR going away?
Anand Babu Periasamy
ab at zresearch.com
Mon Jan 5 13:25:58 UTC 2009
Nope, we are just implementing a better approach to healing. BTW,
"afr" will be renamed to "replicate" (and will still be aliased as
AFR for backward compatibility).
--
Anand Babu
Keith Freedman wrote:
> At 02:30 AM 1/5/2009, Anand Babu Periasamy wrote:
> >> Christopher, the main issue with self-heal is its complexity. Handling
> >> self-healing logic in a non-blocking, asynchronous code path is
> >> difficult. Replicating a missing file sounds simple, but holding off a
> >> lookup call, initiating a new series of calls to heal the file, and then
> >> resuming normal operation is tricky. Many of the bugs we faced in 1.3
> >> were related to self-heal. We have handled most of these cases over
> >> time. Self-healing is decent now, but not good enough. We feel that it
> >> has only complicated the code base; this part of the code is hard to
> >> test and maintain.
>>
> >> The plan is to drop the self-heal code altogether once the active
> >> healing tool is ready. Unlike self-healing, active healing can be run by
> >> the user on a mounted file system (online) at any time. By moving the
> >> code out of the file system and into a tool that is synchronous and
> >> linear, we can implement sophisticated healing techniques.
>>
> >> The code is not in the repository yet; hopefully it will be ready for
> >> use in a month. You can simply turn off self-heal and run this utility
> >> while the file system is mounted.
>
> I realize this is perhaps a bit premature, but am I to understand that
> you'll be doing away with automatic self-healing in replicate? This
> seems to eliminate much of the value of Gluster's AFR component. If we
> have to heal manually with some tool, there's always a risk of a data
> integrity problem while the healing process is being executed after a
> server interruption.
>
> If it's going to be optional to turn on/off, that's fine, I suppose, but
> please, if you're considering removing this feature altogether,
> reconsider, unless this active healing tool is something that would be
> run automatically any time there's a disconnect between AFR servers.
>
> While I certainly realize that the self-heal code is a HUGE performance
> issue as it's currently written (at least that's what I'm noticing on my
> servers), its function is necessary to make AFR useful.
>
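The contrast drawn in this thread, between self-heal that runs inside the file system's asynchronous lookup path and a separate tool that is "synchronous and linear", can be illustrated with a small Python sketch. It is not the GlusterFS active healing utility, which per the thread was not yet in the repository. It assumes two replicas that are locally accessible as plain directory trees, and the function name `heal_missing_files` is hypothetical. Because it is an ordinary loop, there is no lookup call to hold off and later resume.

```python
import os
import shutil
from typing import List


def heal_missing_files(replica_a: str, replica_b: str) -> List[str]:
    """Hypothetical offline healer: copy files present on only one replica.

    Runs as a plain synchronous loop, one file at a time, so there is no
    in-flight lookup to suspend and resume. Files that exist on both sides
    are left untouched, even if their contents differ.
    Returns the relative paths of the files that were copied.
    """
    healed = []
    # Two passes: A -> B fills gaps on B, then B -> A fills gaps on A.
    for src_root, dst_root in ((replica_a, replica_b), (replica_b, replica_a)):
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel_dir = os.path.relpath(dirpath, src_root)
            for name in filenames:
                rel_path = os.path.normpath(os.path.join(rel_dir, name))
                target = os.path.join(dst_root, rel_path)
                if not os.path.exists(target):
                    # Recreate the parent directory if it is missing too.
                    os.makedirs(os.path.dirname(target), exist_ok=True)
                    # copy2 also preserves timestamps and permission bits.
                    shutil.copy2(os.path.join(dirpath, name), target)
                    healed.append(rel_path)
    return sorted(healed)
```

The sketch only fills in files missing from one side. It does not detect files whose contents differ between replicas, which a real healing tool would also have to resolve.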