[Bugs] [Bug 1408171] New: VM pauses due to storage I/O error, when one of the data brick is down with arbiter/replica volume
bugzilla at redhat.com
Thu Dec 22 11:08:11 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1408171
Bug ID: 1408171
Summary: VM pauses due to storage I/O error, when one of the
data brick is down with arbiter/replica volume
Product: GlusterFS
Version: 3.9
Component: replicate
Assignee: bugs at gluster.org
Reporter: ravishankar at redhat.com
CC: bugs at gluster.org
Depends On: 1404982, 1406224
+++ This bug was initially created as a clone of Bug #1406224 +++
+++ This bug was initially created as a clone of Bug #1404982 +++
Description of problem:
In an arbiter volume, when one of the data bricks is killed while I/O is being
written, the VM goes to a paused state and the following is seen in the mount
logs.
[2016-12-15 09:47:16.357700] E [MSGID: 108008]
[afr-transaction.c:2557:afr_write_txn_refresh_done] 0-data-replicate-0: Failing
FXATTROP on gfid 883a5c0a-e16e-4937-83b5-5d90df1ec956: split-brain observed.
[2016-12-15 09:47:16.357724] E [MSGID: 133016]
[shard.c:631:shard_update_file_size_cbk] 0-data-shard: Update to file size
xattr failed on 883a5c0a-e16e-4937-83b5-5d90df1ec956 [Input/output error]
[2016-12-15 09:47:16.357998] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 15170: WRITE => -1 gfid=883a5c0a-e16e-4937-83b5-5d90df1ec956
fd=0x7fd0f000f0f8 (Input/output error)
Version-Release number of selected component (if applicable):
glusterfs-3.8.4-8.el7rhgs.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Install HC with three nodes.
2. Inside the VM, mount the disk and start writing I/O.
3. While I/O is going on, kill one of the data bricks.
Actual results:
The VM goes to a paused state with an Input/output error.
Expected results:
The VM should not go to a paused state, as only one of the data bricks is down.
Additional info:
Following is seen in the mount logs:
===========================================
[2016-12-15 09:47:16.357700] E [MSGID: 108008]
[afr-transaction.c:2557:afr_write_txn_refresh_done] 0-data-replicate-0: Failing
FXATTROP on gfid 883a5c0a-e16e-4937-83b5-5d90df1ec956: split-brain observed.
[2016-12-15 09:47:16.357724] E [MSGID: 133016]
[shard.c:631:shard_update_file_size_cbk] 0-data-shard: Update to file size
xattr failed on 883a5c0a-e16e-4937-83b5-5d90df1ec956 [Input/output error]
[2016-12-15 09:47:16.357998] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 15170: WRITE => -1 gfid=883a5c0a-e16e-4937-83b5-5d90df1ec956
fd=0x7fd0f000f0f8 (Input/output error)
--- Additional comment from Ravishankar N on 2016-12-19 11:25:20 EST ---
Thanks Kasturi for providing the setup for testing and thanks Satheesaran for
providing virsh based commands for re-creating the issue.
The issue is due to a race between inode_refresh_done() and
__afr_set_in_flight_sb_status() that occurs when I/O is going on and a brick is
brought down or up. When the brick goes down or comes up, an inode refresh is
triggered in the write transaction, and inode_refresh_done() sets the correct
data/metadata readable and event_generation. But before the transaction can
proceed to the write FOP, __afr_set_in_flight_sb_status() from another writev
cbk resets the event_generation. When the first write (the one that follows the
inode refresh) fetches the event_gen in afr_inode_get_readable(), it gets zero,
because of which it fails the write with EIO.
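The interleaving described above can be replayed deterministically in a small model. This is only a sketch: the struct and every model_* function are hypothetical stand-ins invented for illustration, not the actual AFR data structures or functions, and the real code runs these steps from concurrent callbacks rather than in sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the per-inode AFR state. */
struct model_inode_ctx {
    int  event_generation; /* 0 is treated as "stale, needs refresh" */
    bool readable;         /* a readable subvolume is known */
};

/* Models the role of inode_refresh_done(): record fresh state. */
static void model_refresh_done(struct model_inode_ctx *ctx, int current_gen)
{
    assert(current_gen > 0);
    ctx->readable = true;
    ctx->event_generation = current_gen;
}

/* Models the role of __afr_set_in_flight_sb_status() running from another
 * writev callback: it resets the event generation. */
static void model_in_flight_reset(struct model_inode_ctx *ctx)
{
    ctx->event_generation = 0;
}

/* Models the post-refresh check. With check_event_gen set, a zero
 * event_generation is treated like "no readable subvolume". */
static int model_post_refresh_check(const struct model_inode_ctx *ctx,
                                    bool check_event_gen)
{
    if (!ctx->readable)
        return -EIO;
    if (check_event_gen && ctx->event_generation == 0)
        return -EIO;
    return 0;
}

/* Replays the racy ordering: refresh completes, another write's callback
 * resets the generation, and only then is the check evaluated. */
static int model_racy_write(bool check_event_gen)
{
    struct model_inode_ctx ctx = { 0, false };

    model_refresh_done(&ctx, 5);   /* step 1: refresh sets gen and readable */
    model_in_flight_reset(&ctx);   /* step 2: concurrent cbk zeroes the gen */
    return model_post_refresh_check(&ctx, check_event_gen); /* step 3 */
}
```

In this model, model_racy_write(true) returns -EIO even though a readable subvolume exists, while model_racy_write(false) lets the write proceed.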
While ignoring event_generation seems to fix the issue
-----------------------------------------------------------
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index 60bae18..2f32e44 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -1089,7 +1089,7 @@ afr_txn_refresh_done (call_frame_t *frame, xlator_t *this, int err)
                                       &event_generation,
                                       local->transaction.type);
-        if (ret == -EIO || !event_generation) {
+        if (ret == -EIO){
                 /* No readable subvolume even after refresh ==> splitbrain.*/
                 if (!priv->fav_child_policy) {
                         err = -EIO;
-----------------------------------------------------------
I need to convince myself that ignoring event_gen in afr_txn_refresh_done()
for reads does not have any repercussions (there is no problem in ignoring it
for writes).
--- Additional comment from Ravishankar N on 2016-12-19 20:34:53 EST ---
Before http://review.gluster.org/#/c/15673/, after inode refresh, we failed
read txns in case of EIO or event_generation being zero. For write
transactions, the check was only for EIO. 15673 re-factored the code to fail
both read and write when event_generation=0. This seems to have caused a
regression as explained in the description above.
While we could restore the above behaviour, it seems we don't need to check
the event_gen value for read transactions either, because it could very well
happen that event_gen is set to zero after we checked (post inode refresh)
that it is non-zero but just before we did a stack wind for that read txn.
Sent a patch to see if this breaks any upstream regression test.
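The difference between the checks before and after change 15673, as described in this comment, can be summarised in a small decision function. This is an illustrative sketch only: the enums and model_fails_after_refresh() are hypothetical names, and the function models just the condition discussed here, not the rest of afr_txn_refresh_done().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the two kinds of transaction discussed here. */
enum model_txn_kind { MODEL_TXN_READ, MODEL_TXN_WRITE };

/* Hypothetical labels for the check as it was before and after 15673. */
enum model_policy { MODEL_BEFORE_15673, MODEL_AFTER_15673 };

/* Returns true when the transaction would be failed after inode refresh.
 * ret is the result of the readable lookup (0 or -EIO in this model). */
static bool model_fails_after_refresh(enum model_policy policy,
                                      enum model_txn_kind kind,
                                      int ret, int event_generation)
{
    assert(event_generation >= 0);

    if (ret == -EIO)
        return true;            /* both policies fail on EIO */
    if (event_generation != 0)
        return false;           /* fresh generation: nothing to fail on */

    /* event_generation == 0 from here on */
    if (policy == MODEL_AFTER_15673)
        return true;            /* refactor: reads and writes both fail */
    return kind == MODEL_TXN_READ; /* earlier code: only reads fail */
}
```

Under this model, the only input where the two policies disagree is a write transaction with ret == 0 and event_generation == 0, which is the case reported in this bug.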
--- Additional comment from Worker Ant on 2016-12-19 20:36:14 EST ---
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks
post inode refresh) posted (#1) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-12-21 09:01:46 EST ---
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks
post inode refresh) posted (#2) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-12-22 01:03:12 EST ---
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks
post inode refresh) posted (#3) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-12-22 01:13:43 EST ---
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks
post inode refresh for write txns) posted (#4) for review on master by
Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Worker Ant on 2016-12-22 06:06:35 EST ---
COMMIT: http://review.gluster.org/16205 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit 7ee998b9041d594d93a4e2ef369892c185e80def
Author: Ravishankar N <ravishankar at redhat.com>
Date: Tue Dec 20 07:05:02 2016 +0530
afr: Ignore event_generation checks post inode refresh for write txns
Before http://review.gluster.org/#/c/15673/, after inode refresh, we
failed read txns in case of EIO or event_generation being zero. For
write transactions, the check was only for EIO. 15673 re-factored the
code to fail both read and write when event_generation=0. This seems to
have caused a regression as explained in the BZ.
This patch restores that behaviour in afr_txn_refresh_done().
Change-Id: Ib8e116506badce6f58b55827dbe403d95069d744
BUG: 1406224
Signed-off-by: Ravishankar N <ravishankar at redhat.com>
Reviewed-on: http://review.gluster.org/16205
Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
Smoke: Gluster Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1404982
[Bug 1404982] VM pauses due to storage I/O error, when one of the data
brick is down with arbiter volume/replica volume
https://bugzilla.redhat.com/show_bug.cgi?id=1406224
[Bug 1406224] VM pauses due to storage I/O error, when one of the data
brick is down with arbiter/replica volume