[Gluster-devel] DHT: Making Dir rename op's crash consistent

Shishir Gowda sgowda at redhat.com
Thu Jan 24 06:38:17 UTC 2013


Waiting for further comments. 


Will resend the design after review is done. 


With regards, 
Shishir 

----- Original Message -----

From: "Anand Avati" <anand.avati at gmail.com> 
To: "Shishir Gowda" <sgowda at redhat.com> 
Cc: gluster-devel at nongnu.org 
Sent: Wednesday, January 23, 2013 1:51:23 PM 
Subject: Re: [Gluster-devel] DHT: Making Dir rename op's crash consistent 




On Sun, Oct 28, 2012 at 11:30 PM, Shishir Gowda < sgowda at redhat.com > wrote: 


Hi All, 

This is a proposed enhancement to DHT directory rename operations to make them recoverable in case of crashes. 

Please feel free to review/comment on the design. There are also 2 open issues which need to be tackled (see the recovery logic below). 

We propose to add 2 new on-disk xattrs SRC(<key to be decided>:destination path) and DST(<key to be decided>:src path/gfid). 

Consider these scenarios 

case1. Only source directory exists 
case2. Both source, and destination directories exist. 

The tasks for rename would be as follows: 

1. Set SRC key on all source directories 
2. If step 1 fails, remove xattrs, and fail rename 
3. If case2, set xattrs on destination directories 
4. If failure in case2, ignore 
5. Rename directories (opendir on dst, readdir (ENOTEMPTY error), rename on the dst_hashed subvol first, and then on the rest) 





We probably need to guarantee empty dst dirs on _all_ dst servers before progressing to step 6 (else, fail with ENOTEMPTY)? 
I agree, we can add this additional stage. 

<blockquote>
6. If step 5 fails with any error other than ENOTCONN, fail rename, and remove xattrs 
7. If failure is because of ENOTCONN, proceed with rename and return a success. 
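
The rename tasks above (steps 1 to 7, plus the extra emptiness check agreed on in the review) can be sketched roughly as follows. This is only an illustration under stated assumptions, not GlusterFS code: `Brick` is a hypothetical in-memory stand-in for a subvolume, `rename_dir` is a hypothetical helper, and `SRC`/`DST` are placeholder keys because the real xattr keys are still to be decided. The proposal does not say when the xattrs are cleared after a successful rename, so the sketch leaves them in place.

```python
import errno

SRC_KEY = "SRC"  # placeholder: the real xattr key is still to be decided
DST_KEY = "DST"  # placeholder: the real xattr key is still to be decided


class Brick:
    """Hypothetical in-memory stand-in for one DHT subvolume."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.dirs = {}  # path -> {"xattrs": {...}, "entries": [...]}

    def mkdir(self, path, entries=()):
        self.dirs[path] = {"xattrs": {}, "entries": list(entries)}

    def _check_up(self):
        if not self.up:
            raise OSError(errno.ENOTCONN, "brick down")

    def setxattr(self, path, key, value):
        self._check_up()
        self.dirs[path]["xattrs"][key] = value

    def removexattr(self, path, key):
        # Best effort: silently skip bricks that are down or lack the path.
        if self.up and path in self.dirs:
            self.dirs[path]["xattrs"].pop(key, None)

    def rename(self, src, dst):
        self._check_up()
        if self.dirs.get(dst, {}).get("entries"):
            raise OSError(errno.ENOTEMPTY, "dst not empty")
        self.dirs[dst] = self.dirs.pop(src)


def rename_dir(bricks, hashed, src, dst):
    """Return 0 on success, or a positive errno value on failure."""

    def cleanup():
        for b in bricks:
            for path in (src, dst):
                for key in (SRC_KEY, DST_KEY):
                    b.removexattr(path, key)

    # Steps 1-2: mark every source directory; on failure undo and fail.
    try:
        for b in bricks:
            b.setxattr(src, SRC_KEY, dst)
    except OSError as e:
        cleanup()
        return e.errno

    # Steps 3-4: if dst exists, mark it as well; failures here are ignored.
    for b in bricks:
        if dst in b.dirs:
            try:
                b.setxattr(dst, DST_KEY, src)
            except OSError:
                pass

    # Extra stage from the review: dst must be empty on all bricks.
    if any(b.dirs.get(dst, {}).get("entries") for b in bricks):
        cleanup()
        return errno.ENOTEMPTY

    # Steps 5-7: rename on the dst-hashed brick first, then on the rest.
    ordered = [hashed] + [b for b in bricks if b is not hashed]
    for b in ordered:
        try:
            b.rename(src, dst)
        except OSError as e:
            if e.errno != errno.ENOTCONN:
                cleanup()
                return e.errno
            # ENOTCONN: keep going; the SRC marker on that brick is
            # what the recovery logic would later look for.
    return 0
```

In this model a brick that is already down when step 1 runs makes the whole rename fail with ENOTCONN, while a brick that drops out during step 5 is skipped and the call still returns success.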

Recovery steps (once the brick comes up): 

1. On lookup/readdir (NFS requirement?), query for the SRC and DST keys. 
2. If the SRC key is found, validate: 
a. If mtime is within 5 seconds of the lookup request, then do not heal, as the rename might still be in progress (Can we make this more foolproof?) 
b. If dst does not exist, proceed 

</blockquote>



This is a bit confusing. Wasn't DST xattr set only on the destination directory in step 3? 
dst == directory, DST == xattr key 

<blockquote>
c. If dst exists, check its key and see if they match. If they do not match, do not rename, as it might lead to a gfid mismatch. 
</blockquote>

<blockquote>

3. Proceed with the checks and the rename of directories (similar to step 5 of the rename tasks above). 
4. If successful, remove xattrs, return success. 
5. If it fails, what needs to be done? (Other renames might have succeeded, and this one might fail due to ENOTEMPTY, even due to a race.) 
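
The validation in recovery step 2 could look roughly like the sketch below. It is an assumption-laden illustration, not the actual heal code: `should_heal` and `HEAL_GRACE_SECONDS` are hypothetical names, `SRC`/`DST` are placeholder keys since the real keys are undecided, the xattrs are modelled as plain dicts, and the DST value is compared against the source path only (the proposal also allows a gfid).

```python
SRC_KEY = "SRC"  # placeholder: the real xattr key is still to be decided
DST_KEY = "DST"  # placeholder: the real xattr key is still to be decided
HEAL_GRACE_SECONDS = 5  # the 5 second window from step 2a


def should_heal(src_path, src_xattrs, src_mtime, dst_xattrs, now):
    """Decide whether a lookup may replay a pending directory rename.

    src_xattrs and dst_xattrs are plain dicts of xattr key -> value.
    dst_xattrs is None when the destination directory does not exist.
    src_mtime and now are timestamps in seconds.
    """
    if SRC_KEY not in src_xattrs:
        return False  # no pending rename is recorded on this directory
    # 2a: a recent mtime may mean the rename is still in progress.
    if now - src_mtime < HEAL_GRACE_SECONDS:
        return False
    # 2b: the destination does not exist, so it is safe to proceed.
    if dst_xattrs is None:
        return True
    # 2c: the destination exists; its DST marker must point back at the
    # source, otherwise renaming could lead to a gfid mismatch.
    return dst_xattrs.get(DST_KEY) == src_path
```

For example, `should_heal("/src", {"SRC": "/dst"}, 100, {"DST": "/other"}, 110)` returns `False`, because the destination's marker does not refer to this source.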


As for a subvol being down, we can't guarantee recovery in scenarios where a brick goes down after stage 1 (setxattr). 

Brick going down before the start of the rename: we do not allow the rename to progress anywhere. 

If a brick goes down after setxattr and it has files, or files are created after it is back up (possible race), then we can't recover. 


With regards, 
Shishir 




_______________________________________________ 
Gluster-devel mailing list 
Gluster-devel at nongnu.org 
https://lists.nongnu.org/mailman/listinfo/gluster-devel 

</blockquote>

