[Bugs] [Bug 1403121] New: Asynchronous Unsplit-brain still causes Input/Output Error on system calls

bugzilla at redhat.com bugzilla at redhat.com
Fri Dec 9 06:13:42 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1403121

            Bug ID: 1403121
           Summary: Asynchronous Unsplit-brain still causes Input/Output
                    Error on system calls
           Product: GlusterFS
           Version: 3.9
         Component: replicate
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: ravishankar at redhat.com
                CC: bugs at gluster.org, pkarampu at redhat.com,
                    sarumuga at redhat.com,
                    simon.turcotte-langevin at ubisoft.com
        Depends On: 1378547, 1386188, 1387501



+++ This bug was initially created as a clone of Bug #1386188 +++

+++ This bug was initially created as a clone of Bug #1378547 +++

Description of problem:

The unsplit-brain mechanism is triggered alongside the self-healing mechanism.
Since the self-healing mechanism is asynchronous, so is the unsplit-brain
mechanism. Therefore, even though the split-brain is eventually resolved, every
system call made before that happens fails with an Input/Output Error. This
pushes the responsibility back to the client application, which needs to retry
the system call, which in turn wastes resources.

The self-heal mechanism should still be asynchronous, but the right version, as
chosen by the favorite child policy, should be resolved synchronously to
prevent the Input/Output Error from occurring.
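
To illustrate the retry burden this puts on client applications, here is a
minimal sketch of the kind of wrapper a client ends up writing today. The
function name read_with_retry, its parameters and the injectable opener are
hypothetical and only for illustration; nothing here is part of GlusterFS.

```python
import errno
import time


def read_with_retry(path, attempts=3, delay=1.0, opener=open):
    """Read a file, retrying only when the read fails with EIO.

    Hypothetical client-side workaround: the first access to a
    split-brained file may fail with EIO until the heal has run.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(path, "rb") as handle:
                return handle.read()
        except OSError as exc:
            # Only EIO is treated as transient; any other error, or EIO on
            # the last attempt, is re-raised to the caller.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            time.sleep(delay)
```

Each retry costs an extra system call plus a delay, which is the waste of
resources mentioned above.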

Version-Release number of selected component (if applicable):
3.8.4-1

How reproducible:
Create a split-brained file and assert that the first read still always causes
an Input/Output Error.

Steps to Reproduce:
1. Set cluster.entry-self-heal to on, cluster.data-self-heal to on,
cluster.metadata-self-heal to on and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Cat the split-brained file -> Ensure that an Input/Output Error is raised
4. Cat the file again ~1sec later -> Ensure that the file was healed
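
For step 1, the options can be applied with "gluster volume set". The sketch
below only builds the command lines and does not run them. The helper name
volume_set_commands and the volume name "testvol" are assumptions for
illustration; substitute the real volume name.

```python
def volume_set_commands(volume):
    """Return the 'gluster volume set' commands for step 1 as argv lists."""
    options = {
        "cluster.entry-self-heal": "on",
        "cluster.data-self-heal": "on",
        "cluster.metadata-self-heal": "on",
        "cluster.favorite-child-policy": "mtime",
    }
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # "testvol" is a placeholder volume name.
    for command in volume_set_commands("testvol"):
        print(" ".join(command))
```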

Actual results:
[root at host vol]# cat test
cat: test: Input/output error
[root at host vol]# cat test
[root at host vol]#

Expected results:
[root at host vol]# cat test
[root at host vol]#


Additional info:

--- Additional comment from Worker Ant on 2016-10-18 07:34:57 EDT ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#1) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-02 03:34:58 EDT ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#2) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-02 21:35:09 EDT ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#3) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-07 11:36:48 EST ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#4) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-15 23:30:13 EST ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#5) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-26 07:42:09 EST ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#6) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-26 10:30:18 EST ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#7) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-26 11:02:55 EST ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#8) for review on master by
Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-11-27 22:48:02 EST ---

REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#9) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Worker Ant on 2016-11-28 02:52:02 EST ---

COMMIT: http://review.gluster.org/15673 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit a07ddd8fcc8dcdcf7ccfa61211d258f13b9f9229
Author: Ravishankar N <ravishankar at redhat.com>
Date:   Sat Nov 26 21:24:01 2016 +0530

    afr: allow I/O when favorite-child-policy is enabled

    Problem:
    Currently, I/O on a split-brained file fails even when the
    favorite-child-policy is set until the self-heal is complete.

    Fix:
    If a valid 'source' is found using the set favorite-child-policy, inspect
    and reset the afr pending xattrs on the 'sinks' (inside appropriate locks),
    refresh the inode and then proceed with the read or write transaction.

    The resetting itself happens in the self-heal code and hence can also
    happen in the client side background-heal or by the shd's index-heal in
    addition to the txn code path explained above. When it happens via
    heal, we also add checks in undo-pending to not reset the sink xattrs
    again.

    Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
    BUG: 1386188
    Signed-off-by: Ravishankar N <ravishankar at redhat.com>
    Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin at ubisoft.com>
    Reviewed-on: http://review.gluster.org/15673
    Tested-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
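
The idea in the fix can be pictured with a toy model. This is not the afr
code: the Replica class, the single "pending" counter standing in for the afr
pending xattrs, and the function resolve_by_mtime are all hypothetical, and
locking and inode refresh are left out. It assumes the mtime policy picks the
copy with the latest modification time as the source.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    mtime: float
    pending: int  # stand-in for the afr pending xattr counters on this copy


def resolve_by_mtime(replicas):
    """Pick the newest copy as source and clear pending markers on the sinks.

    Simplified model: on equal mtimes the first copy wins, which the real
    policy may handle differently.
    """
    if not replicas:
        return None
    source = max(replicas, key=lambda replica: replica.mtime)
    for replica in replicas:
        if replica is not source:
            # The sink no longer blames the source, so the split-brain is gone
            # and the read or write can proceed against the source.
            replica.pending = 0
    return source
```

Once the sinks stop accusing the source, the I/O can go ahead immediately and
the data itself can still be healed in the background.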


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1378547
[Bug 1378547] Asynchronous Unsplit-brain still causes Input/Output Error on
system calls
https://bugzilla.redhat.com/show_bug.cgi?id=1386188
[Bug 1386188] Asynchronous Unsplit-brain still causes Input/Output Error on
system calls
https://bugzilla.redhat.com/show_bug.cgi?id=1387501
[Bug 1387501] Asynchronous Unsplit-brain still causes Input/Output Error on
system calls

