[Gluster-devel] DHT: Making Dir rename op's crash consistent

John Mark Walker johnmark at redhat.com
Mon Oct 29 15:36:57 UTC 2012


Thanks, Shishir. Please make sure to also include your proposed changes on the wiki, either here:

http://www.gluster.org/community/documentation/index.php/Features

or here:

http://www.gluster.org/community/documentation/index.php/Planning34#Contributed_Feature_Ideas


If we're pretty confident that this will go into the 3.4 planning cycle, then I prefer the latter.


-JM



----- Original Message -----
> Hi All,
> 
> This is a proposed enhancement to DHT directory rename operations to
> make them recoverable in case of crashes.
> 
> Please feel free to review/comment on the design. There are also 2 open
> issues which need to be tackled (see the recovery logic below).
> 
> We propose to add 2 new on-disk xattrs: SRC (<key to be
> decided>: destination path) and DST (<key to be decided>: source
> path/gfid).
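One possible encoding of the two markers, as a minimal sketch. The key names below are hypothetical placeholders (the proposal leaves them to be decided), and the NUL separator between source path and gfid in the DST value is an assumption, chosen because a path can never contain a NUL byte.

```python
from dataclasses import dataclass

# Hypothetical key names: the proposal leaves the real keys "to be decided".
SRC_KEY = "trusted.dht.rename-src"  # on source dirs; value = destination path
DST_KEY = "trusted.dht.rename-dst"  # on destination dirs; value = source path + gfid


@dataclass(frozen=True)
class DstMarker:
    """Decoded DST value: which source directory claims this destination."""
    src_path: str
    gfid: str


def encode_src(dst_path: str) -> bytes:
    return dst_path.encode("utf-8")


def decode_src(value: bytes) -> str:
    return value.decode("utf-8")


def encode_dst(src_path: str, gfid: str) -> bytes:
    # NUL cannot appear in a path, so it is a safe separator
    # (an assumption of this sketch, not part of the proposal).
    return (src_path + "\0" + gfid).encode("utf-8")


def decode_dst(value: bytes) -> DstMarker:
    src_path, gfid = value.decode("utf-8").split("\0", 1)
    return DstMarker(src_path, gfid)
```

For example, `decode_dst(encode_dst("/a/old", gfid))` gives back `DstMarker("/a/old", gfid)`, which is what a recovery-time comparison against the source directory would need.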
> 
> Consider these scenarios:
> 
> case1. Only the source directory exists.
> case2. Both source and destination directories exist.
> 
> The tasks for rename would be as follows:
> 
> 1. Set the SRC key on all source directories.
> 2. If step 1 fails, remove the xattrs and fail the rename.
> 3. In case2, set the DST key on the destination directories.
> 4. If step 3 fails in case2, ignore the failure.
> 5. Rename the directories (opendir on dst and readdir it, failing with
> ENOTEMPTY if it has entries; rename on the dst_hashed subvol first,
> and then on the rest).
> 6. If step 5 fails with any error other than ENOTCONN, fail the rename
> and remove the xattrs.
> 7. If the failure is because of ENOTCONN, proceed with the rename and
> return success.
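The rename steps above can be sketched as a toy, in-memory simulation. Everything here is hypothetical: `Subvol`, `make_dir`, `dht_dir_rename` and the key names are illustrative stand-ins, not GlusterFS APIs, and an ENOTCONN reply is modeled as an `OSError` raised by a subvolume marked down. Clearing the markers after a fully successful rename is an assumption borrowed from recovery step 4; the step list itself does not say so.

```python
import errno
from dataclasses import dataclass, field

# Hypothetical key names; the proposal leaves them "to be decided".
SRC_KEY = "trusted.dht.rename-src"
DST_KEY = "trusted.dht.rename-dst"


def make_dir(gfid, entries=()):
    """In-memory stand-in for one directory replica on a brick."""
    return {"gfid": gfid, "xattrs": {}, "entries": set(entries)}


@dataclass
class Subvol:
    """Toy DHT subvolume; every call fails with ENOTCONN while it is down."""
    name: str
    dirs: dict = field(default_factory=dict)
    up: bool = True

    def _check(self):
        if not self.up:
            raise OSError(errno.ENOTCONN, "subvolume down")

    def setxattr(self, path, key, value):
        self._check()
        self.dirs[path]["xattrs"][key] = value

    def removexattr(self, path, key):
        self._check()
        self.dirs[path]["xattrs"].pop(key, None)

    def rename(self, src, dst):
        self._check()
        if dst in self.dirs and self.dirs[dst]["entries"]:
            raise OSError(errno.ENOTEMPTY, "destination not empty")
        self.dirs[dst] = self.dirs.pop(src)  # xattrs travel with the dir


def _try(fn, *args):
    """Best-effort call: errors are ignored (cleanup and step 4)."""
    try:
        fn(*args)
    except OSError:
        pass


def _clear_markers(subvols, *paths):
    for sv in subvols:
        for p in paths:
            if p in sv.dirs:
                _try(sv.removexattr, p, SRC_KEY)
                _try(sv.removexattr, p, DST_KEY)


def dht_dir_rename(subvols, dst_hashed, src, dst):
    """Return 0 on success or a negative errno, following steps 1-7."""
    # Steps 1-2: mark every source dir; any failure (even ENOTCONN) aborts.
    for sv in subvols:
        try:
            sv.setxattr(src, SRC_KEY, dst)
        except OSError as e:
            _clear_markers(subvols, src)
            return -e.errno

    # Steps 3-4 (case2): mark existing destination dirs; failures ignored.
    gfid = dst_hashed.dirs[src]["gfid"]
    for sv in subvols:
        if dst in sv.dirs:
            _try(sv.setxattr, dst, DST_KEY, (src, gfid))

    # Step 5: readdir on dst must find it empty, else ENOTEMPTY.
    if any(sv.up and dst in sv.dirs and sv.dirs[dst]["entries"] for sv in subvols):
        _clear_markers(subvols, src, dst)
        return -errno.ENOTEMPTY

    # Step 5 (cont.): rename on the dst_hashed subvol first, then the rest.
    saw_enotconn = False
    for sv in [dst_hashed] + [s for s in subvols if s is not dst_hashed]:
        try:
            sv.rename(src, dst)
        except OSError as e:
            if e.errno == errno.ENOTCONN:
                saw_enotconn = True  # step 7: carry on, keep markers for recovery
                continue
            _clear_markers(subvols, src, dst)  # step 6
            return -e.errno

    # Assumption: drop the markers when every subvol renamed cleanly.
    if not saw_enotconn:
        _clear_markers(subvols, dst)
    return 0
```

In this model a brick that is already down makes step 1 fail, so nothing is renamed anywhere; only a brick lost during step 5 leaves markers behind for the recovery path.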
> 
> Recovery steps (once the brick comes up):
> 
> 1. On lookup/readdir (NFS requirement?), query for the SRC and DST
> keys.
> 2. If the SRC key is found, validate:
>    a. If the mtime is within 5 seconds of the lookup request, do not
>    heal, as the rename might still be in progress. (Can we make this
>    more foolproof?)
>    b. If dst does not exist, proceed.
>    c. If dst exists, check its key and see if it matches. On a
>    mismatch, do not rename, as that might lead to a gfid mismatch.
> 
> 3. Proceed with the checks and the rename of the directories (similar
> to step 5 of the rename steps above).
> 4. If successful, remove the xattrs and return success.
> 5. On failure, what needs to be done? (Other renames might have
> succeeded, and this one might fail with ENOTEMPTY, even due to a race.)
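Recovery steps 2a to 2c amount to a small decision function. This is a sketch under assumptions: `heal_decision`, `Decision` and `HEAL_GRACE_SECONDS` are hypothetical names, the caller is assumed to have already decoded the DST key into a `(source path, gfid)` pair, and step 2c is read as "the destination's DST key must name this exact source path and gfid". What to do when the rename in step 3 then fails (step 5) is still an open issue, so it is not modeled.

```python
from enum import Enum
from typing import Optional, Tuple

# Step 2a: a marker younger than this may belong to an in-flight rename.
HEAL_GRACE_SECONDS = 5.0


class Decision(Enum):
    WAIT = "rename may still be in progress; do not heal yet"
    HEAL = "finish the rename on this brick"
    SKIP = "destination key mismatch; healing could cause a gfid mismatch"


def heal_decision(src_path: str, src_gfid: str, src_mtime: float, now: float,
                  dst_exists: bool,
                  dst_marker: Optional[Tuple[str, str]]) -> Decision:
    """Steps 2a-2c for a directory whose SRC key was found on lookup/readdir.

    dst_marker is the decoded DST key on the destination, i.e.
    (source path, gfid), or None if the destination has no DST key.
    """
    # 2a: too recent, the original rename might not have finished.
    if now - src_mtime < HEAL_GRACE_SECONDS:
        return Decision.WAIT
    # 2b: destination absent, safe to proceed.
    if not dst_exists:
        return Decision.HEAL
    # 2c: destination present, its DST key must name this very source.
    if dst_marker == (src_path, src_gfid):
        return Decision.HEAL
    return Decision.SKIP
```

A fixed time window is only a heuristic, which is exactly the weakness the "(Can we make this more foolproof?)" question in step 2a points at: a rename stalled for longer than the window would still be healed concurrently.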
> 
> 
> As for a subvolume going down, we can't guarantee crash consistency in
> scenarios where a brick goes down after stage 1 (setxattr).
> 
> Brick going down before the start of the rename: we do not allow the
> rename to progress anywhere.
> 
> If a brick goes down after setxattr and it has files, or files are
> created on it after it comes back up (a possible race), then we can't
> recover.
> 
> 
> With regards,
> Shishir
> 
> 
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> 



