[Gluster-devel] Re: [List-hacking] [bug #25207] an rm of a file should not cause that file to be replicated with afr self-heal.
Gordan Bobic
gordan at bobich.net
Mon Jan 5 17:55:42 UTC 2009
Maybe I'm missing something here, but if you take self-healing out of
AFR, then surely that makes the system completely useless and no better
than running rsync every 5 minutes. Since that can't be right, what am I
missing?
Gordan
Anand Babu Periasamy wrote:
> Christopher, the main issue with self-heal is its complexity. Handling
> self-healing logic in a non-blocking asynchronous code path is difficult.
> Replicating a missing file sounds simple, but holding off a lookup call,
> initiating a new series of calls to heal the file, and then resuming
> normal operation is tricky. Many of the bugs we faced in 1.3 are related
> to self-heal. We have handled most of these cases over a period of time.
> Self-healing is decent now, but not good enough. We feel that it has only
> complicated the code base. It is hard to test and maintain this part of
> the code base.
>
> The plan is to drop the self-heal code altogether once the active healing
> tool is ready. Unlike self-healing, this active healing can be run by the
> user on a mounted file system (online) at any time. By moving the code out
> of the file system and into a tool (that is synchronous and linear), we
> can implement sophisticated healing techniques.
>
> The code is not in the repository yet. Hopefully it will be ready for use
> in a month. You can simply turn off self-heal and run this utility while
> the file system is mounted.
>
> List-hacking is an internal company list, mostly junk :). We don't
> discuss technical / architectural stuff there; those discussions mostly
> happen over the phone and in in-person meetings. We do want to actively
> involve the community right from the design phase. A mailing list is
> cumbersome and slow for interactively brainstorming design. We can
> organize IRC sessions once in a while for this purpose.
>
> --
> Anand Babu
>
> Swank iest wrote:
>> Well,
>>
>> I guess this is getting outside of the bug. I suppose you are going
>> to mark it as not going to fix?
>>
>> I'm trying to put gluster into production right now, so may I ask:
>>
>> 1) What are the current issues with self-heal that require a full
>> re-write? Is there a place in the Wiki or elsewhere where it's being
>> documented?
>> 2) May I see the new code? I must not be looking in the correct place
>> in TLA?
>> 3) If it's not written yet, may I be included in the design
>> discussion? (As I haven't put gluster into production yet, now would
>> be a good time to know if it's not going to work in the near future.)
>> 4) May I be placed on the list-hacking at zresearch.com mailing list,
>> please?
>>
>> Christopher.
>>
>> > Date: Mon, 5 Jan 2009 01:36:14 -0800
>> > From: ab at zresearch.com
>> > To: krishna at zresearch.com
>> > CC: swankier at msn.com; list-hacking at zresearch.com
>> > Subject: Re: [List-hacking] [bug #25207] an rm of a file should not
>> cause that file to be replicated with afr self-heal.
>> >
>> > Krishna, leave it as is. Once self-heal ensures that the volumes are
>> > intact, rm will remove both copies anyway. It is inefficient, but
>> > optimizing it in the current framework would be hacky.
>> >
>> > Swaniker, we are replacing the current self-healing framework with an
>> > active healing tool. We can take care of it then.
>> >
>> >
>> > Krishna Srinivas wrote:
>> >> The current self-heal logic is built into the lookup of a file, and a
>> >> lookup is issued just before any file operation on that file. So the
>> >> lookup call does not know whether an open or an rm is going to be done
>> >> on the file. We will get back to you if we can do anything about this,
>> >> i.e. avoid the redundant copy of the file when it is going to be rm'ed.
>> >>
>> >> Krishna
>> >>
>> >> On Mon, Jan 5, 2009 at 12:19 PM, swankier <INVALID.NOREPLY at gnu.org> wrote:
>> >>> Follow-up Comment #2, bug #25207 (project gluster):
>> >>>
>> >>> I am:
>> >>>
>> >>> 1) deleting the file from the posix file system beneath afr on one side
>> >>> 2) running rm on the gluster file system
>> >>>
>> >>> The file is then replicated, followed by deletion.
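
The sequence reported in the bug can be illustrated with a minimal sketch in Python. This is a toy in-memory model, not GlusterFS code: `Replica`, `AfrVolume`, and the `log` list are hypothetical names introduced only for illustration. It assumes, as described in the thread, that healing is triggered from lookup and that lookup runs before every operation, so it cannot tell that an rm is coming.

```python
# Toy model only: each replica is a dict mapping path -> file contents.
# None of these names exist in GlusterFS; they are assumptions for this sketch.
class Replica:
    def __init__(self, files=None):
        self.files = dict(files or {})


class AfrVolume:
    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # records the order of heal/unlink actions

    def lookup(self, path):
        # lookup does not know which operation follows, so it heals
        # any replica that is missing the file before returning.
        holders = [r for r in self.replicas if path in r.files]
        if not holders:
            return False
        source = holders[0]
        for replica in self.replicas:
            if path not in replica.files:
                replica.files[path] = source.files[path]
                self.log.append(f"heal {path}")
        return True

    def unlink(self, path):
        # As with any file operation, a lookup is issued first.
        if not self.lookup(path):
            raise FileNotFoundError(path)
        for replica in self.replicas:
            del replica.files[path]
        self.log.append(f"unlink {path}")


a = Replica({"f": b"data"})
b = Replica({"f": b"data"})
vol = AfrVolume([a, b])

del b.files["f"]   # step 1: delete the file on one backend, beneath afr
vol.unlink("f")    # step 2: rm through the replicated volume
print(vol.log)     # ['heal f', 'unlink f']
```

In this model the file is copied back to the second replica only to be removed immediately afterwards, which is the redundant work the bug report describes.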