[Gluster-devel] DHT: Making Dir rename op's crash consistent
Anand Avati
anand.avati at gmail.com
Wed Jan 23 08:13:52 UTC 2013
On Wed, Jan 23, 2013 at 12:12 AM, Anand Avati <anand.avati at gmail.com> wrote:
>
>
> On Sun, Oct 28, 2012 at 11:30 PM, Shishir Gowda <sgowda at redhat.com> wrote:
>
>> Hi All,
>>
>> This is a proposed enhancement to DHT directory rename operations to make
>> them recoverable in case of crashes.
>>
>> Please feel free to review/comment on the design. There are also 2 open
>> issues which need to be tackled (see the recovery logic below).
>>
>> We propose to add 2 new on-disk xattrs SRC(<key to be
>> decided>:destination path) and DST(<key to be decided>:src path/gfid).
>>
>> Consider these scenarios
>>
>> case1. Only source directory exists
>> case2. Both source, and destination directories exist.
>>
>> The tasks for rename would be as follows:
>>
>> 1. Set SRC key on all source directories
>> 2. If step 1 fails, remove xattrs, and fail rename
>> 3. If case2, set xattrs on destination directories
>> 4. If failure in case2, ignore
>> 5. Rename directories (opendir on dst, readdir (ENOTEMPTY error), rename on
>> the dst_hashed subvol first, and then the rest)
>> 6. If step 5 fails with any error other than ENOTCONN, fail rename, and
>> remove xattrs
>> 7. If failure is because of ENOTCONN, proceed with rename and return a
>> success.
>>
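The rename tasks above (steps 1-7) can be sketched roughly as follows. This is only a toy in-memory model, not GlusterFS code: the `Subvol` class, the `rename_dir` and `clear_markers` helpers, and both xattr names are hypothetical (the proposal leaves the keys undecided). Removing the SRC marker right after a successful per-subvolume rename is also an assumption, since the steps do not say when the marker is cleared on the success path.

```python
import errno

# Hypothetical xattr names; the proposal leaves the actual keys undecided.
SRC_KEY = "trusted.example.rename.src"
DST_KEY = "trusted.example.rename.dst"


class Subvol:
    """Toy in-memory stand-in for one DHT subvolume; not a GlusterFS API."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.dirs = {}  # path -> {"xattrs": {...}, "entries": set()}

    def _require_up(self):
        if not self.up:
            raise OSError(errno.ENOTCONN, "subvolume is down")

    def mkdir(self, path):
        self._require_up()
        self.dirs[path] = {"xattrs": {}, "entries": set()}

    def setxattr(self, path, key, value):
        self._require_up()
        if path not in self.dirs:
            raise OSError(errno.ENOENT, "no such directory")
        self.dirs[path]["xattrs"][key] = value

    def removexattr(self, path, key):
        self._require_up()
        if path in self.dirs:
            self.dirs[path]["xattrs"].pop(key, None)

    def rename(self, src, dst):
        self._require_up()
        if src not in self.dirs:
            raise OSError(errno.ENOENT, "no such directory")
        if dst in self.dirs and self.dirs[dst]["entries"]:
            raise OSError(errno.ENOTEMPTY, "destination is not empty")
        self.dirs[dst] = self.dirs.pop(src)


def clear_markers(subvols, src, dst):
    """Best-effort removal of both markers from every reachable subvolume."""
    for sv in subvols:
        try:
            sv.removexattr(src, SRC_KEY)
            sv.removexattr(dst, DST_KEY)
        except OSError:
            pass  # an unreachable brick keeps its marker for later recovery


def rename_dir(subvols, src, dst, dst_hashed):
    """Return True if the rename would be reported as successful."""
    # Steps 1-2: mark every source directory; any failure aborts the rename.
    try:
        for sv in subvols:
            sv.setxattr(src, SRC_KEY, dst)
    except OSError:
        clear_markers(subvols, src, dst)
        return False

    # Steps 3-4: if the destination exists (case 2), mark it; ignore failures.
    for sv in subvols:
        try:
            if dst in sv.dirs:
                sv.setxattr(dst, DST_KEY, src)
        except OSError:
            pass

    # Step 5: rename on the dst_hashed subvolume first, then on the rest.
    ordered = [dst_hashed] + [sv for sv in subvols if sv is not dst_hashed]
    for sv in ordered:
        try:
            sv.rename(src, dst)
            sv.removexattr(dst, SRC_KEY)  # assumption: clear marker once renamed
        except OSError as err:
            if err.errno != errno.ENOTCONN:
                # Step 6: any other error fails the rename and removes markers.
                clear_markers(subvols, src, dst)
                return False
            # Step 7: a disconnected brick is skipped and healed on recovery.
    return True
```

Note that in this model a step 6 failure on a subvolume other than dst_hashed leaves the earlier renames in place, which is the same partial state raised as an open question in the recovery steps.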
>> Recovery steps (once the brick comes up):
>>
>> 1. On lookup/readdir (NFS requirement?) query for the SRC and DST keys.
>>
>
> When was DST key set? None of the steps above (1-7) seem to set it?
>
>
Sorry! Misread step 3. Does that mean, if it is case 2, SRC should not be
set?
Avati
> Avati
>
>
>> 2. If SRC key is found , validate:
>> a. If mtime is within 5 seconds of the lookup request, then do not
>> heal, as a rename might be in progress (Can we make this more foolproof?)
>> b. If dst does not exist, proceed
>> c. If dst exists, check its key and see if they match. If they mismatch, do
>> not rename, as it might lead to a gfid mismatch.
>>
>> 3. Proceed with the checks and rename of directories (similar to step 5 of
>> the rename tasks above).
>> 4. If successful, remove xattrs, return success.
>> 5. If it fails, what needs to be done? (Other renames might have
>> succeeded; this one might fail due to ENOTEMPTY, even due to a race.)
>>
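The validation in recovery step 2 can be sketched as a small decision function. This is a minimal sketch under stated assumptions: `DirState`, `should_heal` and `GRACE_SECONDS` are hypothetical names, the mtime is taken to be that of the directory carrying the SRC key, and "match" is read as the DST value on the destination being equal to the source path (the proposal also allows a gfid there).

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 5.0  # the 5-second window from step 2a


@dataclass
class DirState:
    """Hypothetical snapshot of one directory as seen by lookup."""
    path: str
    mtime: float
    src_marker: Optional[str] = None  # SRC key value: the destination path
    dst_marker: Optional[str] = None  # DST key value: the source path


def should_heal(src: DirState, dst: Optional[DirState], lookup_time: float) -> bool:
    """Return True if recovery should go ahead and redo the rename."""
    if src.src_marker is None:
        return False  # no SRC key, so there is nothing to recover
    # 2a: a very recent mtime suggests the rename may still be in progress.
    if lookup_time - src.mtime < GRACE_SECONDS:
        return False
    # 2b: no destination directory, so it is safe to proceed.
    if dst is None:
        return True
    # 2c: the destination must point back at this source; otherwise renaming
    # could lead to a gfid mismatch.
    return dst.dst_marker == src.path
```

For example, a source marked long before the lookup heals when the destination is absent or points back at it, and is left alone when the destination carries a different DST value.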
>>
>> As for a subvol being down, we can't guarantee recovery in scenarios where
>> a brick goes down after stage 1 (setxattr).
>>
>> Brick going down before start of subvolume: We do not allow rename to
>> progress anywhere.
>>
>> If a brick goes down after setxattr, and it has files, or files are
>> created after it is back up (possible race), then we can't recover.
>>
>>
>> With regards,
>> Shishir
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>>
>
>