[Gluster-users] AFR going away?

Anand Babu Periasamy ab at zresearch.com
Mon Jan 5 13:25:58 UTC 2009


Nope, we are just implementing a better approach to healing. BTW,
"afr" will be renamed to "replicate" (and will still be aliased as
AFR for backward compatibility).
--
Anand Babu

Keith Freedman wrote:
> At 02:30 AM 1/5/2009, Anand Babu Periasamy wrote:
>> Christopher, the main issue with self-heal is its complexity. Handling
>> self-healing logic in a non-blocking asynchronous code path is
>> difficult. Replicating a missing file sounds simple, but holding off a
>> lookup call, initiating a new series of calls to heal the file, and then
>> resuming normal operation is tricky. Many of the bugs we faced in 1.3
>> were related to self-heal. We have handled most of these cases over
>> time. Self-healing is decent now, but not good enough. We feel that it
>> has only complicated the code base. This part of the code base is hard
>> to test and maintain.
>>
>> The plan is to drop the self-heal code altogether once the active
>> healing tool is ready. Unlike self-healing, active healing can be run
>> by the user on a mounted file system (online) at any time. By moving
>> the code out of the file system and into a tool (one that is
>> synchronous and linear), we can implement sophisticated healing
>> techniques.
>>
>> The code is not in the repository yet. Hopefully it will be ready for
>> use in a month. You can simply turn off self-heal and run this utility
>> while the file system is mounted.
> 
> I realize this is perhaps a bit premature, but am I to understand you'll
> be doing away with auto self-healing in replicate?
> This seems to eliminate much of the value of Gluster's AFR component.
> If we have to heal manually with some tool, there's always a risk of a
> data integrity problem while this healing process is being executed after
> a server interruption.
> 
> If it's going to be optional to turn on/off, that's fine, I suppose, but
> please, if you're considering removing this feature altogether,
> reconsider, unless this active healing tool is something that would be
> run automatically anytime there's a disconnect between AFR servers.
> 
> While I certainly do realize that the self-heal code is a HUGE
> performance issue as it's currently written (at least that's what I'm
> noticing on my servers), its function is necessary to make AFR
> useful.
> 




