[Gluster-devel] Query regarding xattr heal in dht

Raghavendra G raghavendra at gluster.com
Thu Sep 15 09:31:13 UTC 2016


On Thu, Sep 15, 2016 at 12:02 PM, Nithya Balachandran <nbalacha at redhat.com>
wrote:

>
>
> On 8 September 2016 at 12:02, Mohit Agrawal <moagrawa at redhat.com> wrote:
>
>> Hi All,
>>
>>    I have another solution to heal user xattrs, but before implementing
>> it I would like to discuss it with you.
>>
>>    Can I call the function dht_dir_xattr_heal (internally it calls
>> syncop_setxattr) to heal xattrs at the end of dht_getxattr_cbk, after
>>    making sure we have a valid xattr?
>>    In dht_dir_xattr_heal it would blindly copy all user xattrs onto all
>> subvolumes; alternatively, I can compare each subvolume's xattrs with the
>> valid xattrs and call syncop_setxattr only if there is a mismatch, as
>> sketched below.
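>>
>>    A rough sketch of the comparison I have in mind is below; xattr_t is
>> a simplified stand-in for a dict_t entry, and the real code would walk
>> the dicts with dict_foreach() before deciding to call syncop_setxattr():
>>
>>    #include <stdbool.h>
>>    #include <string.h>
>>
>>    /* Simplified stand-in for one dict_t entry. */
>>    typedef struct {
>>        const char *key;
>>        const char *value;
>>    } xattr_t;
>>
>>    /* Return true if 'subvol' is missing any user xattr from 'valid',
>>     * or holds a different value for it - i.e. a heal is needed. */
>>    static bool
>>    user_xattrs_mismatch(const xattr_t *valid, int nvalid,
>>                         const xattr_t *subvol, int nsubvol)
>>    {
>>        for (int i = 0; i < nvalid; i++) {
>>            if (strncmp(valid[i].key, "user.", 5) != 0)
>>                continue;                /* heal only user xattrs */
>>            bool found = false;
>>            for (int j = 0; j < nsubvol; j++) {
>>                if (strcmp(valid[i].key, subvol[j].key) != 0)
>>                    continue;
>>                found = true;
>>                if (strcmp(valid[i].value, subvol[j].value) != 0)
>>                    return true;         /* value differs */
>>                break;
>>            }
>>            if (!found)
>>                return true;             /* key missing on subvol */
>>        }
>>        return false;                    /* in sync: skip the setxattr */
>>    }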
>>
>
>
> This can be problematic if a particular xattr is being removed - it might
> still exist on some subvols. IIUC, the heal would go and reset it again?
>
> One option is to use the hashed subvol for the dir as the source - so
> perform the xattr op on the hashed subvol first, and on the others only
> if it succeeds on the hashed one (see the sketch below). This does have
> the problem of being unable to set xattrs if the hashed subvol is
> unavailable. That might not be a big deal for distributed-replicate or
> distributed-disperse volumes, but it will affect pure distribute.
> However, this way we can at least be reasonably certain of correctness
> (leaving rebalance out of the picture).
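>
> A toy model of that ordering is below; do_setxattr() stands in for
> winding the real setxattr fop to one subvol:
>
>     #include <stdbool.h>
>     #include <stdio.h>
>
>     /* Toy model: a "subvol" is an id plus an up/down flag. */
>     typedef struct { int id; bool up; } subvol_t;
>
>     static int do_setxattr(subvol_t *sv) { return sv->up ? 0 : -1; }
>
>     static int
>     setxattr_hashed_first(subvol_t *subvols, int n, int hashed)
>     {
>         /* The hashed subvol is the source of truth: fail the whole
>          * op if it cannot take the write. */
>         if (do_setxattr(&subvols[hashed]) != 0)
>             return -1;
>
>         /* Fan out to the rest only after the hashed copy is set, so
>          * a later heal can always trust the hashed subvol. */
>         for (int i = 0; i < n; i++)
>             if (i != hashed)
>                 do_setxattr(&subvols[i]);   /* best effort */
>         return 0;
>     }
>
>     int main(void)
>     {
>         subvol_t v[3] = { {0, true}, {1, false}, {2, true} };
>         printf("hashed up:   %d\n", setxattr_hashed_first(v, 3, 0));
>         v[0].up = false;
>         printf("hashed down: %d\n", setxattr_hashed_first(v, 3, 0));
>         return 0;
>     }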
>

* What is the behavior of getxattr when the hashed subvol is down? Should
we succeed with values from the non-hashed subvols, or should we fail
getxattr? With the hashed subvol as the source of truth, it's difficult to
determine the correctness of xattrs and their values while it is down.

* setxattr is an inode operation (as opposed to an entry operation). So we
cannot calculate the hashed subvol, since in (get/set)xattr the parent
layout and "basename" are not available. This forces us to store the
hashed subvol in the inode-ctx. Now, when the hashed subvol changes, we
need to update these inode-ctxs too.

What do you think about a quorum-based solution to this problem (sketched
below)?

1. setxattr succeeds only if it is successful on at least (n/2 + 1)
subvols.
2. getxattr succeeds only if it is successful, and the values match, on at
least (n/2 + 1) subvols.
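
A minimal sketch of the getxattr side of rule 2 - values[i] stands in for
subvol i's reply for one key (NULL when that subvol failed), and a value
is returned only when it is present and identical on at least n/2 + 1
subvols; the setxattr side is just quorum_met() over the per-subvol
success count:

    #include <stddef.h>
    #include <string.h>

    /* Quorum: an op counts as successful only with n/2 + 1 wins. */
    static int
    quorum_met(int count, int n)
    {
        return count >= n / 2 + 1;
    }

    /* Returns the value to hand back to the client, or NULL when
     * no single value reaches quorum (getxattr then fails). */
    static const char *
    getxattr_quorum_value(const char *values[], int n)
    {
        for (int i = 0; i < n; i++) {
            if (values[i] == NULL)
                continue;
            int matches = 0;
            for (int j = 0; j < n; j++)
                if (values[j] && strcmp(values[i], values[j]) == 0)
                    matches++;
            if (quorum_met(matches, n))
                return values[i];
        }
        return NULL;
    }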

The flip side of this solution is that we increase the probability of
(get/set)xattr operations failing, as opposed to the hashed-subvol-as-
source-of-truth solution. Or do we - how does the probability of the
hashed subvol going down compare with the probability of (n/2 + 1) subvols
going down simultaneously? Is it 1/n vs (1/n)^(n/2 + 1)? And is 1/n even
the correct probability for _a specific subvol (the hashed subvol)_ going
down, as opposed to _any one subvol_ going down?
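
A back-of-the-envelope comparison, under the simplifying assumption that
each subvol is down independently with probability p (in which case the
chance that the specific hashed subvol is down is p, not 1/n):

    /* build: cc quorum_prob.c -lm */
    #include <math.h>
    #include <stdio.h>

    /* C(n, k) via lgamma to avoid overflow. */
    static double choose(int n, int k)
    {
        return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1));
    }

    /* P(at least m of n subvols down), each down independently
     * with probability p. */
    static double p_at_least(int n, int m, double p)
    {
        double sum = 0.0;
        for (int k = m; k <= n; k++)
            sum += choose(n, k) * pow(p, k) * pow(1.0 - p, n - k);
        return sum;
    }

    int main(void)
    {
        int n = 8;          /* subvol count */
        double p = 0.01;    /* per-subvol downtime probability */

        /* Hashed-subvol scheme fails iff that one subvol is down. */
        printf("hashed-subvol failure: %g\n", p);

        /* Quorum scheme fails once n/2 + 1 subvols are down at
         * the same time. */
        printf("quorum failure:        %g\n",
               p_at_least(n, n / 2 + 1, p));
        return 0;
    }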



>
>
>>
>>    Let me know if this approach is suitable.
>>
>>
>>
>> Regards
>> Mohit Agrawal
>>
>> On Wed, Sep 7, 2016 at 10:27 PM, Pranith Kumar Karampuri <
>> pkarampu at redhat.com> wrote:
>>
>>>
>>>
>>> On Wed, Sep 7, 2016 at 9:46 PM, Mohit Agrawal <moagrawa at redhat.com>
>>> wrote:
>>>
>>>> Hi Pranith,
>>>>
>>>>
>>>> In the current approach I get the list of xattrs from the first up
>>>> subvol and update the user attributes from that list on
>>>> all the other subvols.
>>>>
>>>> I have assumed the first up subvol is the source and the rest of them
>>>> are sinks, as we do the same in dht_dir_attr_heal.
>>>>
>>>
>>> I think the first up subvol is different for different mounts, as per
>>> my understanding. I could be wrong.
>>>
>>>
>>>>
>>>> Regards
>>>> Mohit Agrawal
>>>>
>>>> On Wed, Sep 7, 2016 at 9:34 PM, Pranith Kumar Karampuri <
>>>> pkarampu at redhat.com> wrote:
>>>>
>>>>> hi Mohit,
>>>>>        How does dht find which subvolume has the correct list of
>>>>> xattrs? i.e. how does it determine which subvolume is source and which is
>>>>> sink?
>>>>>
>>>>> On Wed, Sep 7, 2016 at 2:35 PM, Mohit Agrawal <moagrawa at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>   I am trying to find a solution to a problem in dht specific to
>>>>>> user xattr healing.
>>>>>>   I tried to correct it the same way we do when healing dir
>>>>>> attributes, but I feel it is not the best solution.
>>>>>>
>>>>>>   To find the right way to heal xattrs I want to discuss it with you,
>>>>>> in case anyone has a better solution.
>>>>>>
>>>>>>   Problem:
>>>>>>    In a distributed volume, the custom extended attribute value for a
>>>>>> directory does not show the correct value after stopping/starting a
>>>>>> brick. If an extended attribute is set on a directory while a brick
>>>>>> is stopped, the attribute value is not updated on that brick after it
>>>>>> is started again.
>>>>>>
>>>>>>   Current approach:
>>>>>>     1) The function set_user_xattr stores the user extended
>>>>>> attributes in a dictionary.
>>>>>>     2) The function dht_dir_xattr_heal calls syncop_setxattr to
>>>>>> update the attributes on all subvols.
>>>>>>     3) dht_dir_xattr_heal is called for every directory lookup in
>>>>>> dht_lookup_revalidate_cbk.
>>>>>>
>>>>>>   Pseudocode for the function dht_dir_xattr_heal is like below (a
>>>>>> rough C sketch follows the steps):
>>>>>>
>>>>>>    1) First it fetches the attributes from the first up subvol and
>>>>>> stores them into xattr.
>>>>>>    2) It loops over all subvols and fetches the existing attributes
>>>>>> from each one.
>>>>>>    3) It replaces the user attributes among the current attributes
>>>>>> with the user attributes from xattr.
>>>>>>    4) It sets the latest extended attributes (current + old user
>>>>>> attributes) onto the subvol.
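>>>>>>
>>>>>>    A toy version of that loop is below (one string per subvol in
>>>>>> place of a dict_t, plain copies in place of syncop_getxattr and
>>>>>> syncop_setxattr); note that whatever the first up subvol holds
>>>>>> simply wins:
>>>>>>
>>>>>>    #include <stdio.h>
>>>>>>    #include <string.h>
>>>>>>
>>>>>>    #define NSUBVOLS 3
>>>>>>
>>>>>>    static char xattrs[NSUBVOLS][64]; /* per-subvol "user.foo" */
>>>>>>
>>>>>>    static void heal_user_xattr(int first_up)
>>>>>>    {
>>>>>>        char src[64];
>>>>>>
>>>>>>        /* 1) Take the first up subvol's value as the source. */
>>>>>>        strncpy(src, xattrs[first_up], sizeof(src) - 1);
>>>>>>        src[sizeof(src) - 1] = '\0';
>>>>>>
>>>>>>        /* 2)-4) Overwrite every subvol's value with the source's. */
>>>>>>        for (int i = 0; i < NSUBVOLS; i++)
>>>>>>            strncpy(xattrs[i], src, sizeof(xattrs[i]) - 1);
>>>>>>    }
>>>>>>
>>>>>>    int main(void)
>>>>>>    {
>>>>>>        strcpy(xattrs[0], "old"); /* was down during the set */
>>>>>>        strcpy(xattrs[1], "new");
>>>>>>        strcpy(xattrs[2], "new");
>>>>>>
>>>>>>        heal_user_xattr(0);       /* subvol 0 is "first up"  */
>>>>>>
>>>>>>        for (int i = 0; i < NSUBVOLS; i++)
>>>>>>            printf("subvol %d: %s\n", i, xattrs[i]);
>>>>>>        return 0;
>>>>>>    }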
>>>>>>
>>>>>>
>>>>>>    The problems with this current approach are:
>>>>>>
>>>>>>    1) It calls the heal function (dht_dir_xattr_heal) for every
>>>>>> directory lookup without comparing xattrs.
>>>>>>    2) The function internally calls syncop setxattr for every
>>>>>> subvol, which is an expensive operation.
>>>>>>
>>>>>>    I have another way to correct it, described below, but it depends
>>>>>> on time (I am not sure whether time is in sync across all bricks):
>>>>>>
>>>>>>    1) At the time of setting an extended attribute (setxattr), update
>>>>>> a change time in the metadata on the server side.
>>>>>>    2) Compare the change times before calling the heal function in
>>>>>> dht_revalidate_cbk, as in the sketch that follows.
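>>>>>>
>>>>>>    The check in step 2 could be as simple as the sketch below, where
>>>>>> ctimes[i] is the change time subvol i returned for the directory:
>>>>>>
>>>>>>    #include <stdbool.h>
>>>>>>    #include <stdint.h>
>>>>>>
>>>>>>    /* Heal only when the change times returned by the subvols
>>>>>>     * do not all match. */
>>>>>>    static bool needs_xattr_heal(const int64_t *ctimes, int n)
>>>>>>    {
>>>>>>        for (int i = 1; i < n; i++)
>>>>>>            if (ctimes[i] != ctimes[0])
>>>>>>                return true;  /* a subvol missed a setxattr */
>>>>>>        return false;         /* in sync: skip the heal */
>>>>>>    }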
>>>>>>
>>>>>>     Please share your input on this; it would be appreciated.
>>>>>>
>>>>>> Regards
>>>>>> Mohit Agrawal
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pranith
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>
>>
>>
>
>
>



-- 
Raghavendra G