[Gluster-devel] dht mkdir preop check, afr and (non-)readable afr subvols

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Jun 1 04:51:11 UTC 2016


Xavi,
        But if we keep winding only to good subvolumes, there is a case
where bad subvolumes will never catch up, right? For example, if we keep
creating files in the same directory, then every time self-heal completes
there will be more entries that mounts have created on the good subvolumes
alone. If this is the current behavior, I must have missed it in the
reviews. It was not like this in earlier releases, right?

Pranith

On Tue, May 31, 2016 at 2:17 PM, Raghavendra G <raghavendra at gluster.com>
wrote:

>
>
> On Tue, May 31, 2016 at 12:37 PM, Xavier Hernandez <xhernandez at datalab.es>
> wrote:
>
>> Hi,
>>
>> On 31/05/16 07:05, Raghavendra Gowdappa wrote:
>>
>>> +gluster-devel, +Xavi
>>>
>>> Hi all,
>>>
>>> The context is [1], where bricks do a pre-operation (pre-op) check before
>>> doing a fop and proceed with the fop only if the pre-op check succeeds.
>>>
>>> @Xavi,
>>>
>>> We need your input on the behavior of EC subvolumes as well.
>>>
>>
>> If I understand correctly, EC shouldn't have any problems here.
>>
>> EC sends the mkdir request to all subvolumes that are currently
>> considered "good" and tries to combine the answers. Answers that match in
>> return code, errno (if necessary) and xdata contents (except for some
>> special xattrs that are ignored for combination purposes), are grouped.
>>
>> Then it takes the group with the most members/answers. If that group has
>> at least #bricks - redundancy members, it is considered the good answer.
>> Otherwise EIO is returned because the bricks are in an inconsistent state.
>>
>> If there's any answer in another group, it's considered bad and gets
>> marked so that self-heal will repair it using the good information from the
>> majority of bricks.
>>
>> xdata is combined and returned even if return code is -1.
>>
>> Is that enough to cover the needed behavior ?
>>
>
> Thanks Xavi. That's sufficient for the feature in question. One of the
> main cases I was interested in was what the behaviour would be if mkdir
> succeeds on a "bad" subvolume and fails on a "good" subvolume. Since you
> never wind mkdir to "bad" subvolume(s), this situation never arises.
>
>
>
>>
>> Xavi
>>
>>
>>
>>> [1] http://review.gluster.org/13885
>>>
>>> regards,
>>> Raghavendra
>>>
>>> ----- Original Message -----
>>>
>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>>> Cc: "team-quine-afr" <team-quine-afr at redhat.com>, "rhs-zteam" <
>>>> rhs-zteam at redhat.com>
>>>> Sent: Tuesday, May 31, 2016 10:22:49 AM
>>>> Subject: Re: dht mkdir preop check, afr and (non-)readable afr subvols
>>>>
>>>> I think you should start a discussion on gluster-devel so that Xavi
>>>> gets a chance to respond to the mails as well.
>>>>
>>>> On Tue, May 31, 2016 at 10:21 AM, Raghavendra Gowdappa <
>>>> rgowdapp at redhat.com>
>>>> wrote:
>>>>
>>>>> Also note that we plan to extend this pre-op check to all dentry
>>>>> operations which also depend on the parent layout. So, the discussion
>>>>> needs to cover all dentry operations like:
>>>>>
>>>>> 1. create
>>>>> 2. mkdir
>>>>> 3. rmdir
>>>>> 4. mknod
>>>>> 5. symlink
>>>>> 6. unlink
>>>>> 7. rename
>>>>>
>>>>> We also plan to have similar checks in lock codepath for directories
>>>>> too (planning to use hashed-subvolume as lock-subvolume for
>>>>> directories). So, more fops :)
>>>>> 8. lk (posix locks)
>>>>> 9. inodelk
>>>>> 10. entrylk
>>>>>
>>>>> regards,
>>>>> Raghavendra
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>>>>> To: "team-quine-afr" <team-quine-afr at redhat.com>
>>>>>> Cc: "rhs-zteam" <rhs-zteam at redhat.com>
>>>>>> Sent: Tuesday, May 31, 2016 10:15:04 AM
>>>>>> Subject: dht mkdir preop check, afr and (non-)readable afr subvols
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have some queries related to the behavior of afr_mkdir with respect
>>>>>> to readable subvols.
>>>>>>
>>>>>> 1. While winding mkdir to subvols, does afr check whether the
>>>>>>    subvolume is good/readable? Or does it wind to all subvols
>>>>>>    irrespective of whether a subvol is good/bad? In the latter case,
>>>>>>    what if mkdir
>>>>>>    a. succeeds on a non-readable subvolume
>>>>>>    b. fails on a readable subvolume
>>>>>>
>>>>>>    What is the result reported to higher layers in the above scenario?
>>>>>>    If mkdir is reported as failed, is it cleaned up on the non-readable
>>>>>>    subvolume where it failed?
>>>>>>
>>>>>> I am interested in this case as the dht-preop check relies on layout
>>>>>> xattrs, and I assume layout xattrs in particular (and all xattrs in
>>>>>> general) are guaranteed to be correct only on a readable subvolume of
>>>>>> afr. So, in essence, we shouldn't be winding mkdir down to
>>>>>> non-readable subvols, as whatever decision the brick makes as part of
>>>>>> the pre-op check is inherently flawed.
>>>>>>
>>>>>> regards,
>>>>>> Raghavendra
>>>>>>
>>>> --
>>>> Pranith
>>>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Raghavendra G
>



-- 
Pranith
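
As a rough illustration of the answer-combining logic Xavi describes for EC
in the quoted mail, here is a minimal Python sketch. It is not GlusterFS
code: the names Answer, answer_key, combine and IGNORED_XATTRS are
hypothetical, the ignored xattr key is a made-up placeholder, and returning
the winning group's shared xdata is only an assumption about what "combined"
means in practice.

```python
import errno
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical placeholder: the real set of xattrs EC ignores is not given
# in the thread.
IGNORED_XATTRS = frozenset({"example.ignored.xattr"})


@dataclass
class Answer:
    brick: int                                  # index of the answering subvolume
    ret: int                                    # 0 on success, -1 on failure
    err: int = 0                                # errno, meaningful only if ret == -1
    xdata: dict = field(default_factory=dict)   # xattr name -> value


def answer_key(a):
    """Answers match when ret, errno (if failed) and non-ignored xdata agree."""
    xdata = tuple(sorted((k, v) for k, v in a.xdata.items()
                         if k not in IGNORED_XATTRS))
    return (a.ret, a.err if a.ret == -1 else 0, xdata)


def combine(answers, bricks, redundancy):
    """Group matching answers and pick the largest group, or fail with EIO."""
    groups = defaultdict(list)
    for a in answers:
        groups[answer_key(a)].append(a)

    best_key = max(groups, key=lambda k: len(groups[k]), default=None)
    if best_key is None or len(groups[best_key]) < bricks - redundancy:
        # No group is large enough: bricks are in an inconsistent state.
        return {"ret": -1, "errno": errno.EIO, "xdata": {}, "bad": []}

    ret, err, xdata = best_key
    # Bricks outside the winning group are the ones to mark for self-heal.
    bad = sorted(a.brick for a in answers if answer_key(a) != best_key)
    # xdata is returned even when ret == -1.
    return {"ret": ret, "errno": err, "xdata": dict(xdata), "bad": bad}


if __name__ == "__main__":
    # 6 bricks, redundancy 2: five bricks succeed, one fails with ENOSPC.
    answers = [Answer(b, 0) for b in range(5)] + [Answer(5, -1, errno.ENOSPC)]
    print(combine(answers, bricks=6, redundancy=2))
```

In this sketch an even split (for example 3 successes and 3 failures with 6
bricks and redundancy 2) yields EIO, because no group reaches
#bricks - redundancy = 4 members.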