[Bugs] [Bug 1403121] New: Asynchronous Unsplit-brain still causes Input/Output Error on system calls
bugzilla at redhat.com
Fri Dec 9 06:13:42 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1403121
Bug ID: 1403121
Summary: Asynchronous Unsplit-brain still causes Input/Output
Error on system calls
Product: GlusterFS
Version: 3.9
Component: replicate
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: ravishankar at redhat.com
CC: bugs at gluster.org, pkarampu at redhat.com,
sarumuga at redhat.com,
simon.turcotte-langevin at ubisoft.com
Depends On: 1378547, 1386188, 1387501
+++ This bug was initially created as a clone of Bug #1386188 +++
+++ This bug was initially created as a clone of Bug #1378547 +++
Description of problem:
The unsplit-brain mechanism is triggered along with the self-healing mechanism.
Since the self-healing mechanism is asynchronous, so is the unsplit-brain
mechanism. Therefore, even though the split-brain is eventually resolved, every
system call made before that point fails with an Input/Output Error (IOE). This
pushes the responsibility back to the client application, which must retry the
system call, which in turn wastes resources.
The self-heal mechanism should remain asynchronous, but the correct version of
the file according to the favorite-child policy should be resolved
synchronously, so that the Input/Output Error does not occur.
Version-Release number of selected component (if applicable):
3.8.4-1
How reproducible:
Create a split-brained file and assert that the first read still always causes
an Input/Output Error.
Steps to Reproduce:
1. Set cluster.entry-self-heal to on, cluster.data-self-heal to on,
cluster.metadata-self-heal to on and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Cat the split-brained file -> Ensure that an Input/Output Error is raised
4. Cat the file again ~1sec later -> Ensure that the file was healed
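Step 1 can be scripted. The sketch below only builds the `gluster volume set`
command lines for the four options named in step 1; it does not run them. The
volume name `testvol` and the helper `build_set_commands` are assumptions made
for illustration and do not come from the report.

```python
import shlex

# Options and values taken from step 1 of the reproduction steps.
REPRO_OPTIONS = {
    "cluster.entry-self-heal": "on",
    "cluster.data-self-heal": "on",
    "cluster.metadata-self-heal": "on",
    "cluster.favorite-child-policy": "mtime",
}


def build_set_commands(volume, options=REPRO_OPTIONS):
    """Return one `gluster volume set` argv list per option (hypothetical helper)."""
    return [["gluster", "volume", "set", volume, key, value]
            for key, value in options.items()]


if __name__ == "__main__":
    # "testvol" is an assumed volume name; substitute the real one.
    for argv in build_set_commands("testvol"):
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell on a node of the trusted storage
pool, or each argv list can be passed to `subprocess.run`.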
Actual results:
[root@host vol]# cat test
cat: test: Input/output error
[root@host vol]# cat test
[root@host vol]#
Expected results:
[root@host vol]# cat test
[root@host vol]#
Additional info:
--- Additional comment from Worker Ant on 2016-10-18 07:34:57 EDT ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#1) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-02 03:34:58 EDT ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#2) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-02 21:35:09 EDT ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#3) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-07 11:36:48 EST ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#4) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-15 23:30:13 EST ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#5) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-26 07:42:09 EST ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#6) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-26 10:30:18 EST ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#7) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-26 11:02:55 EST ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#8) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-11-27 22:48:02 EST ---
REVIEW: http://review.gluster.org/15673 (afr: allow I/O when
favorite-child-policy is enabled) posted (#9) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-11-28 02:52:02 EST ---
COMMIT: http://review.gluster.org/15673 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit a07ddd8fcc8dcdcf7ccfa61211d258f13b9f9229
Author: Ravishankar N <ravishankar at redhat.com>
Date: Sat Nov 26 21:24:01 2016 +0530
afr: allow I/O when favorite-child-policy is enabled
Problem:
Currently, I/O on a split-brained file fails even when the
favorite-child-policy is set until the self-heal is complete.
Fix:
If a valid 'source' is found using the set favorite-child-policy, inspect
and reset the afr pending xattrs on the 'sinks' (inside appropriate locks),
refresh the inode and then proceed with the read or write transaction.
The resetting itself happens in the self-heal code and hence can also
happen in the client side background-heal or by the shd's index-heal in
addition to the txn code path explained above. When it happens via
heal, we also add checks in undo-pending to not reset the sink xattrs
again.
Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
BUG: 1386188
Signed-off-by: Ravishankar N <ravishankar at redhat.com>
Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin at ubisoft.com>
Reviewed-on: http://review.gluster.org/15673
Tested-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
Smoke: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
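The read path described in the commit message above can be illustrated with a
toy model. This is a sketch under stated assumptions, not the AFR code: the
`Replica` class, the `pending` dictionary standing in for the afr pending
xattrs, and the function names are all hypothetical, and the locking and inode
refresh mentioned in the commit message are omitted.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one brick's copy of a file (not a GlusterFS type)."""
    name: str
    mtime: float
    data: bytes
    # pending[other] > 0 means this copy blames `other` for missed operations.
    pending: dict = field(default_factory=dict)


def is_blamed(replica, replicas):
    """True if any other copy holds a pending marker against `replica`."""
    return any(other.pending.get(replica.name, 0) > 0
               for other in replicas if other is not replica)


def read_file(replicas, favorite_child_policy=None):
    """Read from an unblamed copy, resolving split-brain by mtime if allowed."""
    unblamed = [r for r in replicas if not is_blamed(r, replicas)]
    if unblamed:
        return unblamed[0].data
    # Every copy is blamed by another one: the file is in split-brain.
    if favorite_child_policy != "mtime":
        raise OSError(errno.EIO, "Input/output error")
    # Pick the copy with the latest mtime as the source.
    source = max(replicas, key=lambda r: r.mtime)
    # Reset the markers the sinks hold against the source, then serve the read.
    for sink in replicas:
        if sink is not source:
            sink.pending[source.name] = 0
    return source.data
```

In this model the source keeps its own markers against the sinks, so a later
heal still knows which copies need to be overwritten.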
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1378547
[Bug 1378547] Asynchronous Unsplit-brain still causes Input/Output Error on
system calls
https://bugzilla.redhat.com/show_bug.cgi?id=1386188
[Bug 1386188] Asynchronous Unsplit-brain still causes Input/Output Error on
system calls
https://bugzilla.redhat.com/show_bug.cgi?id=1387501
[Bug 1387501] Asynchronous Unsplit-brain still causes Input/Output Error on
system calls