[Bugs] [Bug 1347489] New: IO ERROR when multiple graph switches
bugzilla at redhat.com
Fri Jun 17 04:34:48 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1347489
Bug ID: 1347489
Summary: IO ERROR when multiple graph switches
Product: GlusterFS
Version: 3.8.0
Component: libgfapi
Assignee: bugs at gluster.org
Reporter: pgurusid at redhat.com
QA Contact: sdharane at redhat.com
CC: bugs at gluster.org, sdharane at redhat.com
Depends On: 1343038
+++ This bug was initially created as a clone of Bug #1343038 +++
Description of problem:
While IO is in progress on a client, 2 or more graph switches one after the
other can lead to an IO error on the client.
Version-Release number of selected component (if applicable):
How reproducible:
Attached is the reproducer.
It is also seen when qemu (libgfapi based) has a VM on gluster storage and a
replace brick (add brick followed by remove brick) is executed.
Steps to Reproduce:
1. gcc tests/bugs/libgfapi/glfs_vol_set_IO_ERR.c -o
tests/bugs/libgfapi/glfs_vol_set_IO_ERR -lgfapi
2. ./tests/bugs/libgfapi/glfs_vol_set_IO_ERR <volname> <log file>
Actual results:
It exits with an IO error
Expected results:
It should pass
Additional info:
--- Additional comment from Vijay Bellur on 2016-06-06 07:48:36 EDT ---
REVIEW: http://review.gluster.org/14656 (gfapi: Fix IO error caused when there
is consecutive graph switches) posted (#1) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Vijay Bellur on 2016-06-14 02:01:54 EDT ---
REVIEW: http://review.gluster.org/14722 (gfapi: Fix IO error caused when there
is consecutive graph switches) posted (#1) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Vijay Bellur on 2016-06-16 02:57:18 EDT ---
REVIEW: http://review.gluster.org/14656 (gfapi: Fix IO error caused when there
is consecutive graph switches) posted (#2) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Vijay Bellur on 2016-06-16 07:57:47 EDT ---
COMMIT: http://review.gluster.org/14656 committed in master by Jeff Darcy
(jdarcy at redhat.com)
------
commit b8ac20e888fbacad9d90cd8f1c6ff8579a5cefe9
Author: Poornima G <pgurusid at redhat.com>
Date: Mon Jun 6 06:29:40 2016 -0400
gfapi: Fix IO error caused when there is consecutive graph switches
Issue:
Consider a simple situation where glfs_init() is done, i.e. the initial
graph is up. Now perform 2 volume sets that result in 2 client-side
graph changes. After this, perform some IO; the IO fails with ENOTCONN.
The only way to recover this client is, I guess, another graph switch
or a restart.
What actually is happening from code perspective:
Initial graph, let's say A, followed by 2 consecutive graph switches
to B and C without any IO between those two switches.
- graph_setup (A) as a result of GF_EVENT_CHILD_UP, and
fs->next_subvol = A
- glfs_init() results in fs->active_subvol = A, fs->next_subvol = NULL
- graph_setup (B) as a result of GF_EVENT_CHILD_UP, and
fs->next_subvol = B
- graph_setup (C) as a result of GF_EVENT_CHILD_UP, and
fs->next_subvol = C. It also sees that the previous graph B was never
set as fs->active_subvol, i.e. no IO or anything happened on B, so it
can safely send GF_EVENT_PARENT_DOWN (by calling glfs_subvol_done(B)).
This parent down on B results in child_down(B), which is fine.
But child_down also triggers graph_setup(B).
- graph_setup(B) as a result of GF_EVENT_CHILD_DOWN, and
fs->next_subvol = B, and GF_EVENT_PARENT_DOWN on C as explained
above. This again leads to GF_EVENT_CHILD_DOWN on C.
- graph_setup(C) as a result of GF_EVENT_CHILD_DOWN, and
fs->next_subvol = C, and GF_EVENT_PARENT_DOWN on B as explained
above.
Thus both graphs B and C are disconnected, hence the ENOTCONN.
Solution:
Remove the call to graph_setup() when the event is GF_EVENT_CHILD_DOWN.
I don't see any reason why graph_setup should be called when there is a
child_down. Not sure what the original reason was for having graph_setup
in child_down; git history shows the first patch itself had this call.
Change-Id: I9de86555f66cc94a05649ac863b40ed3426ffd4b
BUG: 1343038
Signed-off-by: Poornima G <pgurusid at redhat.com>
Reviewed-on: http://review.gluster.org/14656
Smoke: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1343038
[Bug 1343038] IO ERROR when multiple graph switches
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.