[Bugs] [Bug 1216303] New: Fixes for data self-heal in ec

bugzilla at redhat.com
Wed Apr 29 05:11:13 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1216303

            Bug ID: 1216303
           Summary: Fixes for data self-heal in ec
           Product: GlusterFS
           Version: 3.7.0
         Component: disperse
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1215265



+++ This bug was initially created as a clone of Bug #1215265 +++

Description of problem:
As part of metadata self-heal, ec sets the versions of the sinks to those of
the sources so that ongoing transactions also succeed on the healing
subvolume. It then starts rebuilding the data onto the sinks. If this rebuild
fails for any reason, the files are left with matching versions even though
the data rebuild is incomplete, which can lead to data corruption.

Fix:
    If the version numbers do not match, writes are performed only on bricks
that share the same version, of which there must be at least N-R. But to heal
files that are constantly modified, writes must also be allowed on subvols that
are undergoing heal. Data healing will set the 62nd bit while the heal is in
progress. When a data transaction sees that this bit is set, it must perform
the fop on that subvol regardless of whether the versions match. The fop is
considered successful only if N-R non-healing bricks succeed.
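
The rule above can be modelled in a few lines. This is only an illustrative
Python sketch, not the ec translator's code (which is written in C). It
assumes that the "62nd bit" is zero-based bit index 62 of a per-brick version
value, and that the bricks sharing the same version are the largest group of
non-healing bricks that agree. The function names and the list-of-integers
input are hypothetical.

```python
from collections import Counter

# Assumption: "62nd bit" means zero-based bit index 62 of the version value.
SELFHEAL_BIT = 1 << 62


def is_healing(version):
    """True if the heal marker is set in a brick's version value."""
    return bool(version & SELFHEAL_BIT)


def pick_write_targets(versions):
    """Split brick indexes into (good, healing) lists for one write.

    good:    non-healing bricks that share the most common version.
    healing: bricks with the heal marker set; they receive the write
             whether or not their version matches.
    Non-healing bricks with any other version are left out of both lists.
    """
    healing = [i for i, v in enumerate(versions) if is_healing(v)]
    stable = [(i, v) for i, v in enumerate(versions) if not is_healing(v)]
    if not stable:
        return [], healing
    agreed, _ = Counter(v for _, v in stable).most_common(1)[0]
    good = [i for i, v in stable if v == agreed]
    return good, healing


def write_succeeded(good, results, required):
    """The write counts only if `required` (N-R) non-healing bricks succeeded.

    results maps brick index -> bool; successes on healing bricks are ignored.
    """
    return sum(1 for i in good if results.get(i, False)) >= required
```

For example, with six bricks and N-R = 4, versions
[7, 7, 7, 7, 2 | SELFHEAL_BIT, 6] give bricks 0-3 as the good set and brick 4
as healing. The write is sent to all five, brick 5 is skipped, and the result
is a success only if all four of bricks 0-3 succeed.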


Additional info:

--- Additional comment from Anand Avati on 2015-04-24 16:01:45 EDT ---

REVIEW: http://review.gluster.org/10372 (cluster/ec: Perform inode-write on
healing subvols) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Pranith Kumar K on 2015-04-24 16:03:22 EDT ---

Data self-heal needs to be changed so that it sets the SELFHEAL_BIT, i.e. the
62nd bit. I will be posting the new code for data self-heal to go with this
patch.

--- Additional comment from Anand Avati on 2015-04-25 01:39:26 EDT ---

COMMIT: http://review.gluster.org/10372 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit 7efa7e2116856b4cf37797218612a41bdd237e77
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Thu Apr 23 08:30:11 2015 +0530

    cluster/ec: Perform inode-write on healing subvols

    If the version numbers do not match, then writes are performed only on at
    least N-R bricks which have same version. But if we want to do healing of
    files which are constantly modified we need to allow writes on subvols that
    are undergoing heal. Data healing will mark 62nd bit while the heal is going
    on. When the data transaction sees that this bit is set it needs to perform
    the fop on that subvol irrespective of whether the versions match or do not
    match. Fop is considered successful only if N-R non-healing bricks succeed.

    Change-Id: I69a17582df397aaf6e8ca4b5e746c7ca802cbbde
    BUG: 1215265
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/10372
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>

--- Additional comment from Anand Avati on 2015-04-26 08:41:11 EDT ---

REVIEW: http://review.gluster.org/10382 (syncop: Implement syncop_fxattrop)
posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 08:41:13 EDT ---

REVIEW: http://review.gluster.org/10383 (storage/posix: prevent NULL
dereference) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 12:07:29 EDT ---

REVIEW: http://review.gluster.org/10382 (syncop: Implement syncop_fxattrop)
posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 12:07:33 EDT ---

REVIEW: http://review.gluster.org/10312 (Adding 64 bits in "version" key of
extended attributes. First 64 bits (Left) represents Data version. Last 64 bits
(right) represents Meta Data version.) posted (#2) for review on master by
Pranith Kumar Karampuri (pkarampu at redhat.com)
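
The layout named in this review title (first 64 bits data version, last 64
bits metadata version) could be decoded roughly as follows. This is a hedged
Python sketch, not the patch itself. It assumes the 16-byte value is stored in
big-endian (network) byte order, which the title does not state, and
split_version is a hypothetical helper name.

```python
import struct


def split_version(raw):
    """Return (data_version, metadata_version) from a 16-byte value.

    Assumption: both 64-bit halves are unsigned and big-endian.
    """
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    data_version, metadata_version = struct.unpack(">QQ", raw)
    return data_version, metadata_version
```

The left half comes out first, so a value packed from (5, 3) decodes to data
version 5 and metadata version 3.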

--- Additional comment from Anand Avati on 2015-04-26 12:07:36 EDT ---

REVIEW: http://review.gluster.org/10384 (data-heal) posted (#1) for review on
master by Pranith Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 12:07:38 EDT ---

REVIEW: http://review.gluster.org/10385 (cluster/ec: Change meaning of
trusted.ec.dirty) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 12:07:40 EDT ---

REVIEW: http://review.gluster.org/10386 (cluster/ec: Link new heal
implementation everywhere) posted (#1) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 16:43:03 EDT ---

REVIEW: http://review.gluster.org/10382 (syncop: Implement syncop_fxattrop)
posted (#3) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 16:43:06 EDT ---

REVIEW: http://review.gluster.org/10386 (cluster/ec: Link new heal
implementation everywhere) posted (#2) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 16:43:08 EDT ---

REVIEW: http://review.gluster.org/10384 (data-heal) posted (#2) for review on
master by Pranith Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 16:43:10 EDT ---

REVIEW: http://review.gluster.org/10385 (cluster/ec: Change meaning of
trusted.ec.dirty) posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 16:43:15 EDT ---

REVIEW: http://review.gluster.org/10312 (Adding 64 bits in "version" key of
extended attributes. First 64 bits (Left) represents Data version. Last 64 bits
(right) represents Meta Data version.) posted (#3) for review on master by
Pranith Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 16:43:17 EDT ---

REVIEW: http://review.gluster.org/10390 (cluster/ec: Handle unhandled states)
posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-26 16:43:19 EDT ---

REVIEW: http://review.gluster.org/10391 (libglusterfs: Fix cluster_entrylk
retry) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 01:20:10 EDT ---

COMMIT: http://review.gluster.org/10382 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit 585b1f0d9e485674268cb90bd8f3fdb143bab06b
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Sun Apr 26 10:40:18 2015 +0530

    syncop: Implement syncop_fxattrop

    Change-Id: Ifc7937ceb451f6e11e40a9513017226fd0f115b0
    BUG: 1215265
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/10382
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>

--- Additional comment from Anand Avati on 2015-04-27 02:36:56 EDT ---

REVIEW: http://review.gluster.org/10386 (cluster/ec: Link new heal
implementation everywhere) posted (#3) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 02:36:58 EDT ---

REVIEW: http://review.gluster.org/10384 (data-heal) posted (#3) for review on
master by Pranith Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 02:37:00 EDT ---

REVIEW: http://review.gluster.org/10385 (cluster/ec: Change meaning of
trusted.ec.dirty) posted (#3) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 02:37:04 EDT ---

REVIEW: http://review.gluster.org/10390 (cluster/ec: Handle unhandled states)
posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 02:37:06 EDT ---

REVIEW: http://review.gluster.org/10391 (libglusterfs: Fix cluster_entrylk
retry) posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 02:37:09 EDT ---

REVIEW: http://review.gluster.org/10312 (Adding 64 bits in "version" key of
extended attributes. First 64 bits (Left) represents Data version. Last 64 bits
(right) represents Meta Data version.) posted (#4) for review on master by
Pranith Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 08:04:17 EDT ---

COMMIT: http://review.gluster.org/10383 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit 472d5c67013913ca8646f32ece214a767a955ef9
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Sun Apr 26 17:59:49 2015 +0530

    storage/posix: prevent NULL dereference

    filler->fd is never set but used.

    Change-Id: Icf21c439b37c9faa3751658a9e63a74570ed153c
    BUG: 1215265
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/10383
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    Tested-by: NetBSD Build System
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>

--- Additional comment from Anand Avati on 2015-04-28 07:39:46 EDT ---

COMMIT: http://review.gluster.org/10390 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit 315364b78cd152835cf6d30e32fd145a942e1d7a
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Mon Apr 27 00:00:08 2015 +0530

    cluster/ec: Handle unhandled states

    This was leading to hangs when get_size_and_version fails

    Change-Id: Iad9408c2dacc9a74594b8d2f94c95f402533b0f1
    BUG: 1215265
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/10390
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Xavier Hernandez <xhernandez at datalab.es>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1215265
[Bug 1215265] Fixes for data self-heal in ec
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

