[Gluster-users] Fencing FOPs on data-split-brained files
Ravishankar N
ravishankar at redhat.com
Wed Jan 8 10:04:42 UTC 2014
Hi,
Based on an internal discussion we had, I am putting forward some points
on the proposed changes:
*lookup*: For files in data split-brain (DSB), allow lookup to succeed
and return the inode attributes (struct iatt) from the file which has
the bigger size.
For files in metadata split-brain (MSB), allow lookup to
succeed and use the following resolution:

*Mismatching attribute*        *Resolution*
time (a_time/m_time/c_time)    return the times from the copy with the newest m_time
uid/gid                        return root:root so that further FOPs fail
                               due to lack of permission
nlink                          return the bigger of the values
file permission (st_mode)      return the AND of the file permissions
For files in entry split-brain (ESB), lookup has to fail.
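The lookup rules above can be sketched as a merge over the per-replica attributes. This is a minimal illustration, not AFR code: `struct sketch_iatt`, `enum sb_type` and `resolve_lookup_iatt()` are hypothetical stand-ins (the real `struct iatt` has many more fields), and the `mode` field is assumed to hold permission bits only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down stand-in for struct iatt. */
struct sketch_iatt {
    uint64_t size;
    int64_t  atime, mtime, ctime;
    uid_t    uid;
    gid_t    gid;
    uint32_t nlink;
    mode_t   mode;   /* permission bits only (assumption) */
};

enum sb_type { SB_NONE, SB_DATA, SB_METADATA, SB_ENTRY };

/* Merges the iatts returned by `count` replicas according to the proposed
 * rules. Returns 0 and fills *out, or -1 when lookup must fail (ESB). */
static int resolve_lookup_iatt(enum sb_type sb, const struct sketch_iatt *r,
                               int count, struct sketch_iatt *out)
{
    assert(count > 0);
    if (sb == SB_ENTRY)
        return -1;                       /* ESB: lookup fails */

    *out = r[0];
    bool owner_mismatch = false;

    for (int i = 1; i < count; i++) {
        if (sb == SB_DATA) {
            if (r[i].size > out->size)
                *out = r[i];             /* DSB: iatt of the bigger copy */
            continue;
        }
        if (sb != SB_METADATA)
            continue;
        if (r[i].mtime > out->mtime) {   /* MSB: times of the newest mtime */
            out->atime = r[i].atime;
            out->mtime = r[i].mtime;
            out->ctime = r[i].ctime;
        }
        if (r[i].uid != r[0].uid || r[i].gid != r[0].gid)
            owner_mismatch = true;
        if (r[i].nlink > out->nlink)
            out->nlink = r[i].nlink;     /* bigger link count wins */
        out->mode &= r[i].mode;          /* AND of the permission bits */
    }
    if (owner_mismatch) {
        out->uid = 0;                    /* root:root, so that non-root */
        out->gid = 0;                    /* callers are denied further FOPs */
    }
    return 0;
}
```

ANDing the permission bits means the merged mode only grants what every copy grants, so the more restrictive side always wins.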
Note that the above is the expected behaviour if lookup is called before
the other FOPs. If it is not (due to caching, or because the split-brain
occurs after the lookup, etc.), then we need to define what happens on
each FOP:
*stat*: If the file is in split-brain, send the stat to all subvolumes
and perform the same checks as in lookup above.
*write*: Allow writes to go through irrespective of the type of
split-brain. This is a marked difference from the current behaviour,
where we disallow writes to DSB files. The rationale is that the write
could include a truncate to zero, which is a valid way for the user to
resolve the split-brained file if they wish to do so.
*read*: Do not allow reads irrespective of the type of split-brain. This
serves as an indication to the user that the file is in split-brain.
*get(f)attr*: For DSB, allow it.
For MSB, don't allow it.
*set(f)attr*: For DSB and MSB, allow it.
*touch (create), hardlink, softlink, rename, chown, chmod, unlink*: Allow
the operation for all types of split-brain.
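The per-FOP rules above amount to a small decision table. The sketch below encodes them; the enum names and `fop_allowed()` are hypothetical, `get(f)attr`/`set(f)attr` are read here as the extended-attribute FOPs (an assumption), and the ESB cases the proposal leaves unspecified for those two are denied as a further assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum sb_type { SB_DATA, SB_METADATA, SB_ENTRY };

/* Hypothetical FOP labels; GETXATTR/SETXATTR also cover the (f) variants. */
enum fop {
    FOP_READ, FOP_WRITE, FOP_GETXATTR, FOP_SETXATTR,
    FOP_CREATE, FOP_LINK, FOP_SYMLINK, FOP_RENAME,
    FOP_CHOWN, FOP_CHMOD, FOP_UNLINK
};

/* Returns true if the proposal lets `op` through on a file whose
 * split-brain type is `sb`. */
static bool fop_allowed(enum fop op, enum sb_type sb)
{
    switch (op) {
    case FOP_READ:
        return false;                 /* fenced for every type */
    case FOP_WRITE:
        return true;                  /* e.g. truncate-to-zero to resolve */
    case FOP_GETXATTR:
        return sb == SB_DATA;         /* allowed for DSB, denied for MSB */
    case FOP_SETXATTR:
        return sb == SB_DATA || sb == SB_METADATA;
    case FOP_CREATE: case FOP_LINK: case FOP_SYMLINK: case FOP_RENAME:
    case FOP_CHOWN: case FOP_CHMOD: case FOP_UNLINK:
        return true;                  /* allowed for all types */
    }
    assert(!"unknown fop");
    return false;
}
```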
*Forcing lookups to occur for readdirp*: If a directory is in split-brain
and a *readdirp* is issued, then after getting the entries AFR needs to
check them for split-brain and, for those entries which are in
split-brain, set the inode to NULL before unwinding the reply to the
parent xlator. What we are essentially doing here is downgrading a
readdirp to a readdir for those entries, thereby ensuring that a lookup
is always triggered if that file is accessed again.
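The readdirp downgrade can be pictured as a single pass over the reply. `struct dirent_sketch`, `struct inode_sketch` and `downgrade_split_brain_entries()` are hypothetical simplifications of AFR's dirent list and inode; a real implementation would presumably also drop its reference on the inode rather than just clearing the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an inode and a readdirp dirent. */
struct inode_sketch { bool split_brain; };
struct dirent_sketch {
    const char *name;
    struct inode_sketch *inode;   /* NULL means "no inode": plain readdir */
};

/* Clears the inode of every split-brained entry before the reply is
 * unwound, so the next access to that name must go through lookup.
 * Returns the number of entries downgraded. */
static size_t downgrade_split_brain_entries(struct dirent_sketch *entries,
                                            size_t n)
{
    assert(entries != NULL || n == 0);
    size_t downgraded = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].inode != NULL && entries[i].inode->split_brain) {
            entries[i].inode = NULL;
            downgraded++;
        }
    }
    return downgraded;
}
```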
Thanks,
Ravi
On 12/27/2013 04:40 PM, Ravishankar N wrote:
>
>
>
> -------- Original Message --------
> Subject: Re: [Gluster-users] Fencing FOPs on data-split-brained files
> Date: Tue, 19 Nov 2013 16:03:14 +0530
> From: Ravishankar N <ravishankar at redhat.com>
> To: Anand Avati <avati at gluster.org>
> CC: Gluster Devel <gluster-devel at nongnu.org>,
> "gluster-users at gluster.org" <gluster-users at gluster.org>
>
>
>
> On 11/16/2013 01:42 AM, Anand Avati wrote:
>> Ravi,
>> We should not mix up data and entry operation domains, if a file is
>> in data split brain that should not stop a user from
>> rename/link/unlink operations on the file.
>>
>> Regarding your concern about complications while healing - we should
>> change our "manual fixing" instructions to:
>>
>> - go to backend, access through gfid path or normal path
>> - rmxattr the afr changelogs
>> - truncate the file to 0 bytes (like "> filename")
>>
>> Accessing the path through gfid and truncating to 0 bytes addresses
>> your concerns about hardlinks/renames.
>>
>> Avati
>>
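The manual-fix steps above can be sketched against a brick's backend. This is a hedged illustration: `gfid_backend_path()` and `reset_split_brain_copy()` are hypothetical helpers, the caller must supply the changelog xattr names (in this era they typically look like `trusted.afr.<volname>-client-<N>`, but treat that as an assumption and check with `getfattr -d -m . -e hex` on the brick), and removing `trusted.*` xattrs needs root. Because the `.glusterfs/<xx>/<yy>/<gfid>` entry is another hard link to the same backend inode, truncating through it affects every name the file has.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Builds <brick>/.glusterfs/<gfid[0..1]>/<gfid[2..3]>/<gfid>.
 * Returns 0 on success, -1 for a malformed gfid or a too-small buffer. */
static int gfid_backend_path(char *buf, size_t len, const char *brick,
                             const char *gfid)
{
    if (strlen(gfid) != 36)          /* canonical UUID string length */
        return -1;
    int n = snprintf(buf, len, "%s/.glusterfs/%.2s/%.2s/%s",
                     brick, gfid, gfid + 2, gfid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Removes the given AFR changelog xattrs (absent ones are ignored) and
 * then truncates the backend file to 0 bytes. */
static int reset_split_brain_copy(const char *path,
                                  const char *const *xattrs, size_t n)
{
    assert(path != NULL);
    for (size_t i = 0; i < n; i++) {
        if (removexattr(path, xattrs[i]) != 0 && errno != ENODATA)
            return -1;
    }
    return truncate(path, 0);
}
```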
>
>
> Resending the mail as there was no response.
> -Ravi
>
> All,
>
> I have tabulated which operations must/mustn't be permitted for the
> different types of split-brain. Some of the cells are '?' as I am not
> sure what the expected behaviour should be. Could we have this validated?
>
>
> Is the file operation permitted, by type of split-brain (SB)?
>
> *Operation*         *Data SB*  *Metadata SB*       *Entry SB*       *Entry SB*
>                                                    (same entry,     (different
>                                                    gfid mismatch)   entries)
> write               No         Yes (currently no)  No               Yes
> read                No         Yes (currently no)  No               Yes
> getfattr            Yes        No                  No               Yes
> lookup              ?          ?                   No               Yes
> stat/fstat          ?          ?                   No               Yes
> setfattr            Yes        No                  No               Yes
> touch               Yes        Yes                 No               Yes
> hard link creation  Yes        Yes                 No               Yes
> soft link creation  Yes        Yes                 Yes              Yes
> rename              Yes        Yes                 No               Yes
> chown               Yes        Yes                 Currently No     Yes
> chmod               Yes        Yes                 Currently No     Yes
> unlink              Yes        Yes                 Currently No     Yes
> readdir             N/A        N/A                 ?                ?
>
>
> - stat() also reports the file size. If a data-split-brained file has
> different sizes on the bricks, should stat succeed?
> - Likewise, if the metadata split-brain is due to different access
> permissions, say one brick has the file chmod'ed to 777 and the other
> brick has it at 744, should we allow read/write if the corresponding
> permission bits are *not* conflicting? (As of today they aren't allowed.)
>
> Also, in the table above, entry split-brain has 2 cases:
> i) the same entry has different gfids
> ii) each brick has different entries for the same directory (which
> can cause deleted files to reappear in the case of a conservative merge).
> Should we allow readdir in either case?
>
> Thanks,
> Ravi
>
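The two entry split-brain cases above can be told apart by comparing one directory's listing on two bricks. This is only an illustration: `struct entry_sketch`, `enum esb_kind` and `classify_entry_split_brain()` are hypothetical, and the sketch ignores the AFR changelog xattrs that actually decide whether a directory is in split-brain (a name present on only one brick may simply be pending heal).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one directory entry on one brick. */
struct entry_sketch {
    const char *name;
    const char *gfid;   /* canonical UUID string */
};

enum esb_kind { ESB_NONE, ESB_GFID_MISMATCH, ESB_DIFFERENT_ENTRIES };

/* Linear search by name; returns NULL if the name is absent. */
static const struct entry_sketch *
find_entry(const struct entry_sketch *v, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(v[i].name, name) == 0)
            return &v[i];
    return NULL;
}

/* Case i (same name, different gfid) is reported in preference to
 * case ii (names present on only one brick). */
static enum esb_kind
classify_entry_split_brain(const struct entry_sketch *a, size_t na,
                           const struct entry_sketch *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    bool different_entries = false;

    for (size_t i = 0; i < na; i++) {
        const struct entry_sketch *other = find_entry(b, nb, a[i].name);
        if (other == NULL)
            different_entries = true;
        else if (strcmp(other->gfid, a[i].gfid) != 0)
            return ESB_GFID_MISMATCH;
    }
    for (size_t i = 0; i < nb && !different_entries; i++)
        if (find_entry(a, na, b[i].name) == NULL)
            different_entries = true;

    return different_entries ? ESB_DIFFERENT_ENTRIES : ESB_NONE;
}
```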
>> On Wed, Nov 13, 2013 at 3:01 AM, Ravishankar N
>> <ravishankar at redhat.com <mailto:ravishankar at redhat.com>> wrote:
>>
>> Hi,
>>
>> Currently in glusterfs, when there is a data split-brain (only) on
>> a file, we disallow the following operations from the mount point
>> by returning EIO to the application:
>> - Writes to the file (truncate, dd, echo, cp etc.)
>> - Reads from the file (cat)
>> - Reading extended attributes (getfattr) [1]
>>
>> However, we do permit the following operations:
>> - creating hardlinks
>> - creating symlinks
>> - mv
>> - setattr
>> - chmod
>> - chown
>> - touch
>> - ls
>> - stat
>>
>> While it makes sense to allow `ls` and `stat`, is it okay to add
>> checks in the FOPs to disallow the other operations? Allowing the
>> creation of links and changes to file attributes only seems to
>> complicate things before the admin can go to the backend bricks
>> and resolve the split-brain (by deleting all but the healthy copy
>> of the file, including hardlinks), more so if the file is renamed
>> before the split-brain is addressed.
>> Please share your thoughts.
>>
>> Thanks,
>> Ravi
>>
>> [1] http://review.gluster.org/#/c/5988/