[Gluster-users] Fencing FOPs on data-split-brained files

Ravishankar N ravishankar at redhat.com
Wed Jan 8 10:04:42 UTC 2014


Hi,

Based on an internal discussion we had, I am putting forward some points 
on the proposed changes:


*lookup*: For files in data split-brain (DSB), allow lookup to succeed 
and return the inode attributes (struct iatt) from the copy which has 
the bigger size.

For files in metadata split-brain (MSB), allow lookup to succeed and 
use the resolution below:

*Mismatching attribute*        *Resolution*
time (a_time/m_time/c_time)    return the times from the copy which has
                               the newest m_time
uid/gid                        return the uid/gid as root:root so that
                               further FOPs will fail due to lack of
                               permission
nlink                          return the bigger of the values
file permission (st_mode)      return the AND of the file permissions

For files in entry split-brain (ESB), lookup has to fail.
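
To make the lookup rules above concrete, here is a minimal Python sketch. 
It is not AFR code: the Iatt class and the function names are hypothetical 
stand-ins for struct iatt and the per-brick lookup replies. It also assumes 
that a mismatch in either uid or gid returns root:root for both, and that 
fields not listed in the table are taken from the copy with the newest 
m_time; the proposal does not spell out either point.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Iatt:
    """Hypothetical stand-in for the fields of struct iatt used here."""
    size: int
    atime: int
    mtime: int
    ctime: int
    uid: int
    gid: int
    nlink: int
    mode: int  # permission bits only


def resolve_dsb(replies: List[Iatt]) -> Iatt:
    """Data split-brain: return the iatt of the copy with the bigger size."""
    return max(replies, key=lambda ia: ia.size)


def resolve_msb(replies: List[Iatt]) -> Iatt:
    """Metadata split-brain: merge the replies attribute by attribute."""
    # Times: use the copy with the newest m_time (assumption: all three
    # timestamps come from that same copy).
    newest = max(replies, key=lambda ia: ia.mtime)

    # uid/gid: on any mismatch report root:root, so that later FOPs from
    # unprivileged users fail on permission checks.
    owner_mismatch = len({(ia.uid, ia.gid) for ia in replies}) > 1

    # Permissions: bitwise AND of all copies.
    mode = replies[0].mode
    for ia in replies[1:]:
        mode &= ia.mode

    return replace(
        newest,
        uid=0 if owner_mismatch else newest.uid,
        gid=0 if owner_mismatch else newest.gid,
        nlink=max(ia.nlink for ia in replies),
        mode=mode,
    )
```

For example, with one copy at mode 777 and the other at 744, the merged 
mode is 744, i.e. only the permission bits both bricks agree on survive.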


Note that if lookup gets called before the other FOPs, then the above is 
the expected behaviour. If it doesn't (due to caching, or the 
split-brain occurring after the lookup happens, etc.), then we need to 
define what happens on each FOP:

*stat*: If the file is in split-brain, send stat to all subvolumes and 
perform the same steps as done in lookup (i.e. perform the same checks 
as above).

*write*: Allow writes to go through irrespective of the type of 
split-brain. This is a marked departure from the current behaviour, 
where we disallow writes to DSB files. The rationale is that the write 
could include a truncate to zero, which is a valid use case for 
resolving the split-brained file if the user wishes to do so.

*read*: Do not allow reads irrespective of the type of split-brain. This 
would serve as an indication to the user that the file is in split-brain.

*get(f)attr*: For DSB, allow it.
              For MSB, do not allow it.

*set(f)attr*: For DSB and MSB, allow it.

*touch (create), hardlink, softlink, rename, chown, chmod, unlink*: Allow 
the operation for all types of split-brain.
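
The per-FOP rules above can be summarised as a small policy table. The 
following Python sketch is only an illustration, not AFR code: the 
SplitBrain enum, the POLICY dictionary and is_allowed() are hypothetical 
names, and the FOP labels are plain strings chosen for readability. Cases 
the proposal does not cover (get(f)attr and set(f)attr on ESB files) are 
deliberately left undefined and come back as None.

```python
from enum import Enum
from typing import Dict, Optional


class SplitBrain(Enum):
    DATA = "DSB"
    METADATA = "MSB"
    ENTRY = "ESB"


def _all(value: bool) -> Dict[SplitBrain, bool]:
    """Same verdict for every type of split-brain."""
    return {sb: value for sb in SplitBrain}


# lookup (and stat, which follows the lookup rules): allowed for DSB and
# MSB with the attribute resolution described earlier, fails for ESB.
_LOOKUP = {
    SplitBrain.DATA: True,
    SplitBrain.METADATA: True,
    SplitBrain.ENTRY: False,
}

POLICY: Dict[str, Dict[SplitBrain, bool]] = {
    "lookup": _LOOKUP,
    "stat": _LOOKUP,
    "write": _all(True),   # allowed so that truncate-to-zero can resolve
    "read": _all(False),   # a failed read signals the split-brain
    # ESB is not specified in the proposal for the two entries below.
    "getfattr": {SplitBrain.DATA: True, SplitBrain.METADATA: False},
    "setfattr": {SplitBrain.DATA: True, SplitBrain.METADATA: True},
}
for _fop in ("create", "link", "symlink", "rename", "chown", "chmod", "unlink"):
    POLICY[_fop] = _all(True)


def is_allowed(fop: str, sb: SplitBrain) -> Optional[bool]:
    """True/False per the proposal; None where the proposal is silent."""
    return POLICY[fop].get(sb)
```

A caller would treat False as "fail the FOP" (today AFR returns EIO in 
such cases) and None as a case that still needs a decision.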

Forcing lookups to occur for readdirp: If a directory is in split-brain 
and a *readdirp* is issued, then after getting the entries AFR needs to 
check them for split-brain. For those entries which are in split-brain, 
it needs to set the inode to null before unwinding the reply to the 
parent xlator. What we are essentially doing here is downgrading a 
readdirp to a readdir for those entries, thereby ensuring that a lookup 
is always triggered if that file is accessed again.
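
As a rough illustration of the readdirp handling described above, here 
is a short Python sketch. DirEntry, downgrade_split_brain_entries() and 
the in_split_brain callback are hypothetical; in AFR the check would be 
made against the replies from the subvolumes, which is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DirEntry:
    """Hypothetical stand-in for one entry of a readdirp reply."""
    name: str
    inode: Optional[object]  # None means "no inode attached"


def downgrade_split_brain_entries(
    entries: List[DirEntry],
    in_split_brain: Callable[[str], bool],
) -> List[DirEntry]:
    """Drop the inode from every entry that is in split-brain.

    Entries left without an inode look like plain readdir results to the
    parent xlator, so a later access has to go through lookup.
    """
    for entry in entries:
        if entry.inode is not None and in_split_brain(entry.name):
            entry.inode = None
    return entries
```

Healthy entries keep their inode, so the cost of the extra lookup is 
paid only for the files that are actually in split-brain.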

Thanks,
Ravi




On 12/27/2013 04:40 PM, Ravishankar N wrote:
>
>
>
> -------- Original Message --------
> Subject: 	Re: [Gluster-users] Fencing FOPs on data-split-brained files
> Date: 	Tue, 19 Nov 2013 16:03:14 +0530
> From: 	Ravishankar N <ravishankar at redhat.com>
> To: 	Anand Avati <avati at gluster.org>
> CC: 	Gluster Devel <gluster-devel at nongnu.org>, 
> "gluster-users at gluster.org" <gluster-users at gluster.org>
>
>
>
> On 11/16/2013 01:42 AM, Anand Avati wrote:
>> Ravi,
>> We should not mix up data and entry operation domains. If a file is 
>> in data split-brain, that should not stop a user from 
>> rename/link/unlink operations on the file.
>>
>> Regarding your concern about complications while healing - we should 
>> change our "manual fixing" instructions to:
>>
>> - go to backend, access through gfid path or normal path
>> - rmxattr the afr changelogs
>> - truncate the file to 0 bytes (like "> filename")
>>
>> Accessing the path through gfid and truncating to 0 bytes addresses 
>> your concerns about hardlinks/renames.
>>
>> Avati
>>
>
>
> (Resending the mail as there was no response. -Ravi)
>
> All,
>
> I have tabulated which operations must/mustn't be permitted in case of 
> different split-brains. Some of the cells are '?' as I am not sure 
> what the expected behaviour should be. Could we have this validated?
>
>
> *File operation permitted, by type of split-brain*
>
> *Operation*         *Data SB*  *Metadata SB*       *Entry SB (same entry,*  *Entry SB (different*
>                                                    *gfid mismatch)*         *entries)*
> write               No         Yes (currently no)  No                       Yes
> read                No         Yes (currently no)  No                       Yes
> getfattr            Yes        No                  No                       Yes
> lookup              ?          ?                   No                       Yes
> stat/fstat          ?          ?                   No                       Yes
> setfattr            Yes        No                  No                       Yes
> touch               Yes        Yes                 No                       Yes
> hard link creation  Yes        Yes                 No                       Yes
> soft link creation  Yes        Yes                 Yes                      Yes
> rename              Yes        Yes                 No                       Yes
> chown               Yes        Yes                 Currently No             Yes
> chmod               Yes        Yes                 Currently No             Yes
> unlink              Yes        Yes                 Currently No             Yes
> readdir             N/A        N/A                 ?                        ?
>
>
> - stat() also reports the file size. If a data split-brained file has 
> different sizes, should stat succeed?
> - Likewise, if metadata split-brain is due to different access 
> permissions, say one brick has the file chmod'ed with 777 and the other 
> brick has it with 744, should we allow read/write if the corresponding 
> permission bits are *not* conflicting? (As of today they aren't allowed.)
>
> Also, in the table above, entry split-brain has 2 cases:
> i) where the same entry has different gfids
> ii) where each brick has different entries for the same directory (which 
> can cause deleted files to appear in case of conservative merge).
> Should we allow readdir in either case?
>
> Thanks,
> Ravi
>
>> On Wed, Nov 13, 2013 at 3:01 AM, Ravishankar N 
>> <ravishankar at redhat.com <mailto:ravishankar at redhat.com>> wrote:
>>
>>     Hi,
>>
>>     Currently in glusterfs, when there is a data split-brain (only) on
>>     a file, we disallow the following operations from the mount-point
>>     by returning EIO to the application:
>>     - Writes to the file (truncate, dd, echo, cp etc)
>>     - Reads from the file (cat)
>>     - Reading extended attributes (getfattr) [1]
>>
>>     However we do permit the following operations:
>>     - creating hardlinks
>>     - creating symlinks
>>     - mv
>>     - setattr
>>     - chmod
>>     - chown
>>     - touch
>>     - ls
>>     - stat
>>
>>     While it makes sense to allow `ls` and `stat`, is it okay to add
>>     checks in the FOPs to disallow the other operations? Allowing
>>     creation of links and changing file attributes only seems to
>>     complicate things before the admin can go to the backend bricks
>>     and resolve the split-brain (by deleting all but the healthy copy
>>     of the file, including hardlinks). More so if the file is renamed
>>     before addressing the split-brain.
>>     Please share your thoughts.
>>
>>     Thanks,
>>     Ravi
>>
>>     [1] http://review.gluster.org/#/c/5988/
>>
>>
>
>
>
