[Gluster-devel] dht mkdir preop check, afr and (non-)readable afr subvols
Pranith Kumar Karampuri
pkarampu at redhat.com
Tue May 31 07:32:49 UTC 2016
Just checked ec code. Looks okay. All entry fops are also updating metadata
and data part of the xattr.
On Tue, May 31, 2016 at 12:37 PM, Xavier Hernandez <xhernandez at datalab.es>
wrote:
> Hi,
>
> On 31/05/16 07:05, Raghavendra Gowdappa wrote:
>
>> +gluster-devel, +Xavi
>>
>> Hi all,
>>
>> The context is [1], where bricks do pre-operation checks before doing a
>> fop and proceed with fop only if pre-op check is successful.
>>
>> @Xavi,
>>
>> We need your inputs on behavior of EC subvolumes as well.
>>
>
> If I understand correctly, EC shouldn't have any problems here.
>
> EC sends the mkdir request to all subvolumes that are currently considered
> "good" and tries to combine the answers. Answers that match in return code,
> errno (if necessary) and xdata contents (except for some special xattrs
> that are ignored for combination purposes), are grouped.
>
> Then it takes the group with the most members/answers. If that group has a
> minimum size of #bricks - redundancy, it is considered the good answer.
> Otherwise EIO is returned because bricks are in an inconsistent state.
>
> If there's any answer in another group, it's considered bad and gets
> marked so that self-heal will repair it using the good information from the
> majority of bricks.
>
> xdata is combined and returned even if return code is -1.
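
The combining rule Xavi describes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual EC code (which is C): the names `combine`, `answer_key` and `IGNORED_XATTRS` are hypothetical, the ignored xattr key is a placeholder, and what happens to the minority answers when EIO is returned is not modelled.

```python
import errno
from collections import defaultdict

# Placeholder for the "special xattrs that are ignored for combination
# purposes"; the real key names are not given in this thread.
IGNORED_XATTRS = frozenset({"example.ignored-xattr"})


def answer_key(ret, err, xdata):
    """Build a hashable key; answers with equal keys fall into one group."""
    # errno is only compared when the call failed ("if necessary").
    err_part = err if ret < 0 else 0
    xdata_part = tuple(sorted(
        (k, v) for k, v in xdata.items() if k not in IGNORED_XATTRS))
    return (ret, err_part, xdata_part)


def combine(answers, bricks, redundancy):
    """Combine per-brick answers for one fop.

    answers: {brick_id: (ret, err, xdata)} from the subvolumes that were
    considered good when the request was sent.
    Returns (ret, err, bad_bricks).
    """
    groups = defaultdict(list)
    for brick, (ret, err, xdata) in answers.items():
        groups[answer_key(ret, err, xdata)].append(brick)

    if not groups:
        return (-1, errno.EIO, [])

    # Take the group with the most members.
    best_key = max(groups, key=lambda k: len(groups[k]))

    # The group must have at least #bricks - redundancy members.
    if len(groups[best_key]) < bricks - redundancy:
        return (-1, errno.EIO, [])

    # Every answer outside the winning group is bad and needs self-heal.
    bad = sorted(b for key, members in groups.items()
                 if key != best_key for b in members)
    ret, err, _ = best_key
    return (ret, err, bad)
```

For example, with 3 bricks and redundancy 1, two matching successes and one EEXIST give a successful result with the third brick marked for heal. With 6 bricks and redundancy 2, a 3/3 split gives EIO because no group reaches 4 members.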
>
> Is that enough to cover the needed behavior?
>
> Xavi
>
>
>
>> [1] http://review.gluster.org/13885
>>
>> regards,
>> Raghavendra
>>
>> ----- Original Message -----
>>
>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>> Cc: "team-quine-afr" <team-quine-afr at redhat.com>, "rhs-zteam" <
>>> rhs-zteam at redhat.com>
>>> Sent: Tuesday, May 31, 2016 10:22:49 AM
>>> Subject: Re: dht mkdir preop check, afr and (non-)readable afr subvols
>>>
>>> I think you should start a discussion on gluster-devel so that Xavi gets a
>>> chance to respond to the mails as well.
>>>
>>> On Tue, May 31, 2016 at 10:21 AM, Raghavendra Gowdappa <
>>> rgowdapp at redhat.com>
>>> wrote:
>>>
>>>> Also note that we have plans to extend this pre-op check to all dentry
>>>> operations which also depend on the parent layout. So, the discussion needs
>>>> to cover all dentry operations like:
>>>>
>>>> 1. create
>>>> 2. mkdir
>>>> 3. rmdir
>>>> 4. mknod
>>>> 5. symlink
>>>> 6. unlink
>>>> 7. rename
>>>>
>>>> We also plan to have similar checks in lock codepath for directories too
>>>> (planning to use hashed-subvolume as lock-subvolume for directories).
>>>> So, more fops :)
>>>> 8. lk (posix locks)
>>>> 9. inodelk
>>>> 10. entrylk
>>>>
>>>> regards,
>>>> Raghavendra
>>>>
>>>> ----- Original Message -----
>>>>
>>>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>>>> To: "team-quine-afr" <team-quine-afr at redhat.com>
>>>>> Cc: "rhs-zteam" <rhs-zteam at redhat.com>
>>>>> Sent: Tuesday, May 31, 2016 10:15:04 AM
>>>>> Subject: dht mkdir preop check, afr and (non-)readable afr subvols
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have some queries related to the behavior of afr_mkdir with respect to
>>>>> readable subvols.
>>>>>
>>>>> 1. While winding mkdir to subvols, does afr check whether the subvolume is
>>>>> good/readable? Or does it wind to all subvols irrespective of whether a
>>>>> subvol is good/bad? In the latter case, what if
>>>>> a. mkdir succeeds on non-readable subvolume
>>>>> b. fails on readable subvolume
>>>>>
>>>>> What is the result reported to higher layers in the above scenario? If
>>>>> mkdir is reported as failed, is the directory cleaned up on the
>>>>> non-readable subvolume where it succeeded?
>>>>>
>>>>> I am interested in this case as dht-preop check relies on layout xattrs
>>>>> and I assume layout xattrs in particular (and all xattrs in general) are
>>>>> guaranteed to be correct only on a readable subvolume of afr. So, in
>>>>> essence we shouldn't be winding down mkdir on non-readable subvols as
>>>>> whatever decision the brick makes as part of pre-op check is inherently
>>>>> flawed.
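
To make the concern above concrete, here is a toy model of scenario (a)/(b). It is a sketch under assumptions, not the patch in [1]: the `Brick` class, the xattr name `example.layout`, the layout strings and the errno used for a failed pre-op check are all hypothetical, and the check is assumed to be a plain equality test between the layout the client expects and the layout stored on the brick.

```python
import errno

LAYOUT_KEY = "example.layout"   # hypothetical xattr name
PREOP_FAILED = errno.ESTALE     # placeholder errno, not taken from the patch


class Brick:
    """Toy brick holding one parent directory's xattrs and entries."""

    def __init__(self, layout):
        self.xattrs = {LAYOUT_KEY: layout}
        self.entries = set()

    def mkdir(self, name, expected_layout):
        # Pre-op check: proceed with the fop only if the stored layout
        # matches what the client believes the layout to be.
        if self.xattrs[LAYOUT_KEY] != expected_layout:
            return (-1, PREOP_FAILED)
        if name in self.entries:
            return (-1, errno.EEXIST)
        self.entries.add(name)
        return (0, 0)


# The readable subvol has the current layout; the non-readable one still
# carries a stale layout xattr that has not been healed yet.
readable = Brick(layout="layout-v2")
non_readable = Brick(layout="layout-v1")

# A client with a stale view of the layout winds mkdir to both subvols.
results = [b.mkdir("dir", expected_layout="layout-v1")
           for b in (readable, non_readable)]
```

In this model the readable brick rejects the mkdir while the non-readable brick accepts it and creates the entry, which is the case where the brick's decision is based on an xattr that cannot be trusted.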
>>>>>
>>>>> regards,
>>>>> Raghavendra
>>>>>
>>>> --
>>> Pranith
>>>
>>>
--
Pranith