[Bugs] [Bug 1740968] glustershd can not decide heald_sinks, and skip repair, so some entries lingering in volume heal info

bugzilla at redhat.com
Tue Aug 27 06:59:32 UTC 2019


--- Comment #10 from Karthik U S <ksubrahm at redhat.com> ---
(In reply to Hunang  Shujun from comment #9)
> healed_sinks is empty because afr_selfheal_find_direction() does not find
> any "sink". In that function, only a node that is accused by a source node
> can be marked as a sink; a node accused only by non-source nodes is not
> identified as a sink. Is this rule valid? What is the reason for it?
>         for (i = 0; i < priv->child_count; i++) {
>                 /* accusations made by a node that is not a source are
>                  * not taken into consideration */
>                 if (!sources[i])
>                         continue;
>                 if (self_accused[i])
>                         continue;
>                 for (j = 0; j < priv->child_count; j++) {
>                         if (matrix[i][j])
>                                 sinks[j] = 1;
>                 }
>         }
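
To make the question concrete, the quoted loop can be exercised in isolation.
The following is only a minimal sketch, not the GlusterFS implementation: the
names find_direction_sketch, example_sources_without_sinks and MAX_BRICKS are
hypothetical, and the way accused[] and sources[] are derived is an assumption
about the part of the function that is not quoted. It shows one hypothetical
state (brick 0 blames nobody, bricks 1 and 2 blame each other) in which a
source exists but no sink is found.

```c
#include <assert.h>
#include <string.h>

#define MAX_BRICKS 8 /* illustrative limit, not a GlusterFS constant */

/*
 * Simplified model of direction finding. matrix[i][j] != 0 means that
 * brick i blames brick j. Fills sources[] and sinks[] and returns the
 * number of sinks found.
 */
int
find_direction_sketch(int n, int matrix[][MAX_BRICKS],
                      unsigned char *sources, unsigned char *sinks)
{
    unsigned char self_accused[MAX_BRICKS] = {0};
    unsigned char accused[MAX_BRICKS] = {0};
    int i, j, sink_count = 0;

    assert(n > 0 && n <= MAX_BRICKS);

    for (i = 0; i < n; i++)
        self_accused[i] = (matrix[i][i] != 0);

    /* Assumption: a brick counts as accused when a brick that does not
     * blame itself blames it. */
    for (i = 0; i < n; i++) {
        if (self_accused[i])
            continue;
        for (j = 0; j < n; j++)
            if (matrix[i][j])
                accused[j] = 1;
    }

    /* Assumption: every brick that nobody accuses becomes a source. */
    for (i = 0; i < n; i++)
        sources[i] = !accused[i];

    /* Same rule as the quoted loop: only a source that does not blame
     * itself can turn another brick into a sink. */
    memset(sinks, 0, (size_t)n);
    for (i = 0; i < n; i++) {
        if (!sources[i])
            continue;
        if (self_accused[i])
            continue;
        for (j = 0; j < n; j++)
            if (matrix[i][j])
                sinks[j] = 1;
    }

    for (j = 0; j < n; j++)
        sink_count += sinks[j];
    return sink_count;
}

/* Hypothetical state: brick 0 blames nobody, bricks 1 and 2 blame each
 * other. Brick 0 ends up as the only source and no brick is a sink. */
int
example_sources_without_sinks(unsigned char *sources, unsigned char *sinks)
{
    int matrix[3][MAX_BRICKS] = {
        {0, 0, 0},
        {0, 0, 1},
        {0, 1, 0},
    };

    return find_direction_sketch(3, matrix, sources, sinks);
}
```

In this model bricks 1 and 2 are accused, so neither is a source, and their
accusations are skipped by the quoted loop. Brick 0 is a source but accuses
nobody, so sinks[] stays empty.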

This code is valid. Here we mark as sinks only those bricks which are blamed
by a source that does not accuse itself. Then in
__afr_selfheal_entry_prepare() we intersect locked_on and sinks to
populate healed_sinks. After this, __afr_selfheal_entry_finalize_source()
is called, which attempts to mark all the bricks which are not sources as
sinks:

    sources_count = AFR_COUNT(sources, priv->child_count);
    /* None of these conditions holds in this case, so the non-sources
     * are not marked as sinks. */
    if ((AFR_CMP(locked_on, healed_sinks, priv->child_count) == 0) ||
        !sources_count || afr_does_witness_exist(this, witness)) {
        memset(sources, 0, sizeof(*sources) * priv->child_count);
        afr_mark_active_sinks(this, sources, locked_on, healed_sinks);
        return -1;
    }

    source = afr_choose_source_by_policy(priv, sources, AFR_ENTRY_TRANSACTION);
    return source;
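
Why the guard is skipped can be checked with a small stand-alone model. This
is a hedged sketch only: count_set, takes_reset_branch and
example_guard_is_skipped are hypothetical names, count_set and memcmp() stand
in for AFR_COUNT and AFR_CMP under the assumption that these count non-zero
flags and compare the arrays element by element, and the witness check is
reduced to a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Count the non-zero flags in a per-brick array (stand-in for AFR_COUNT). */
int
count_set(const unsigned char *flags, int n)
{
    int i, count = 0;

    assert(n > 0);
    for (i = 0; i < n; i++)
        if (flags[i])
            count++;
    return count;
}

/*
 * Model of the guard in the snippet above. Returns true when the branch
 * that resets sources and marks the active sinks would be taken.
 */
bool
takes_reset_branch(const unsigned char *sources,
                   const unsigned char *locked_on,
                   const unsigned char *healed_sinks,
                   bool witness_exists, int n)
{
    /* Assumption: AFR_CMP() == 0 means the two arrays are identical. */
    bool all_locked_are_sinks = (memcmp(locked_on, healed_sinks,
                                        (size_t)n) == 0);
    bool no_sources = (count_set(sources, n) == 0);

    return all_locked_are_sinks || no_sources || witness_exists;
}

/* Hypothetical state for this bug: one source, all bricks locked, no
 * healed sinks, no witness. Returns true when the guard is skipped. */
bool
example_guard_is_skipped(void)
{
    const unsigned char sources[3] = {1, 0, 0};
    const unsigned char locked_on[3] = {1, 1, 1};
    const unsigned char healed_sinks[3] = {0, 0, 0};

    return !takes_reset_branch(sources, locked_on, healed_sinks, false, 3);
}
```

With one source set and healed_sinks empty, locked_on differs from
healed_sinks, sources_count is non-zero, and there is no witness, so the code
falls through to choosing a source even though there is nothing to heal to.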

We need to handle this case separately: a source is set, but no brick is
marked as a sink. Since this is happening for entry heal, we cannot simply
treat all the other bricks as sinks, because that might lead to data loss.
So the best way would be to do a conservative merge here. I will check whether
this also happens for data and metadata heal (ideally it should not) and
then send a patch to fix this.
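
As a rough illustration of why a conservative merge avoids data loss, here is
a toy model. It assumes that a conservative merge means taking the union of
the entries present on all bricks and deleting nothing; the entry_set type
and both function names are hypothetical, and each directory entry is reduced
to a single bit.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per directory entry name; a set bit means the brick has that
 * entry. Purely illustrative, not how AFR stores directory contents. */
typedef uint32_t entry_set;

/* Union of the entries found on all bricks: nothing is ever removed. */
entry_set
conservative_merge(const entry_set *bricks, int n)
{
    entry_set merged = 0;
    int i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        merged |= bricks[i];
    return merged;
}

/* Entries that would have to be created on one brick after the merge. */
entry_set
entries_to_create(entry_set merged, entry_set brick)
{
    return merged & ~brick;
}
```

In this model no brick is treated as authoritative, so an entry that exists
on only one brick is kept and recreated elsewhere instead of being removed.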
