[Bugs] [Bug 1220057] New: glusterd crashes when brick option validation fails
bugzilla at redhat.com
Sat May 9 15:48:18 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1220057
Bug ID: 1220057
Summary: glusterd crashes when brick option validation fails
Product: GlusterFS
Version: 3.7.0
Component: glusterd
Keywords: Triaged
Severity: urgent
Assignee: bugs at gluster.org
Reporter: vbellur at redhat.com
CC: bugs at gluster.org, gluster-bugs at redhat.com,
jdarcy at redhat.com
Depends On: 1211749
Blocks: 1219026
+++ This bug was initially created as a clone of Bug #1211749 +++
While working on server-side AFR/NSR support, I added the following test to
tests/basic/afr/read-subvol-data.t:
TEST ! set_read_subvol $V0 no-such-xlator
The result was a glusterd crash, like this:
(gdb) bt
#0 0x00007f0e5d5920b0 in pthread_spin_lock () from /lib64/libpthread.so.0
#1 0x00007f0e5e46dd69 in __gf_free (...)
at mem-pool.c:303
#2 0x00007f0e59e251b4 in gd_sync_task_begin (...)
at glusterd-syncop.c:1767
#3 0x00007f0e59e25260 in glusterd_op_begin_synctask (...)
at glusterd-syncop.c:1787
#4 0x00007f0e59d770b2 in __glusterd_handle_set_volume (...)
at glusterd-handler.c:1871
The __gf_free in question frees op_errstr, the string that explains the nature
of the validation error. Here's a relevant comment from the patch I'll be
posting momentarily, at the point where op_errstr is set, describing the real
problem and a fix/workaround.
* In the validation-error code path, the graph is freed
* before op_errstr is. Therefore, if the memory block for
* op_errstr still contains a reference to a translator within
* that graph, we'll crash. Make sure the reference is to a
* translator that's not going away instead.
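The hazard described in that comment can be illustrated with a minimal sketch. This is not the GlusterFS code: the struct layout and the names owner, blk_hdr, acct_alloc, acct_free and make_errstr are hypothetical, chosen only to show why a block whose header points back at its owner must be charged to an owner that outlives it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a translator that tracks its own allocations. */
struct owner {
    const char *name;
    size_t      live_bytes;
};

/* Header stored in front of every block; it points back at the owner. */
struct blk_hdr {
    struct owner *owner;
    size_t        size;
};

static void *acct_alloc(struct owner *o, size_t size)
{
    struct blk_hdr *h = malloc(sizeof(*h) + size);
    if (!h)
        return NULL;
    h->owner = o;
    h->size = size;
    o->live_bytes += size;
    return h + 1;               /* caller sees only the payload */
}

static void acct_free(void *p)
{
    if (!p)
        return;
    struct blk_hdr *h = (struct blk_hdr *)p - 1;
    /* This dereference is the crash site if the owner is already gone. */
    h->owner->live_bytes -= h->size;
    free(h);
}

/* Workaround: charge the error string to an owner that outlives the graph. */
static char *make_errstr(struct owner *long_lived, const char *msg)
{
    size_t n = strlen(msg) + 1;
    char *s = acct_alloc(long_lived, n);
    if (s)
        memcpy(s, msg, n);
    return s;
}
```

With the string charged to a long-lived owner, the order in which the graph and the string are released no longer matters; charging it to a translator inside the graph would make acct_free touch freed memory.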
--- Additional comment from Anand Avati on 2015-04-14 15:35:44 EDT ---
REVIEW: http://review.gluster.org/10238 (core/glusterd: avoid crash when option
validation fails) posted (#1) for review on master by Jeff Darcy
(jdarcy at redhat.com)
--- Additional comment from Anand Avati on 2015-04-14 16:05:21 EDT ---
REVIEW: http://review.gluster.org/10238 (core/glusterd: avoid crash when option
validation fails) posted (#2) for review on master by Jeff Darcy
(jdarcy at redhat.com)
--- Additional comment from Anand Avati on 2015-04-15 12:36:03 EDT ---
REVIEW: http://review.gluster.org/10238 (core/glusterd: avoid crash when option
validation fails) posted (#3) for review on master by Jeff Darcy
(jdarcy at redhat.com)
--- Additional comment from Anand Avati on 2015-04-28 05:05:51 EDT ---
REVIEW: http://review.gluster.org/10417 (core: use reference counting for
mem_acct structures) posted (#1) for review on master by Jeff Darcy
(jdarcy at redhat.com)
--- Additional comment from Anand Avati on 2015-04-28 07:44:15 EDT ---
REVIEW: http://review.gluster.org/10417 (core: use reference counting for
mem_acct structures) posted (#2) for review on master by Jeff Darcy
(jdarcy at redhat.com)
--- Additional comment from Anand Avati on 2015-05-04 10:17:33 EDT ---
REVIEW: http://review.gluster.org/10417 (core: use reference counting for
mem_acct structures) posted (#3) for review on master by Jeff Darcy
(jdarcy at redhat.com)
--- Additional comment from Anand Avati on 2015-05-09 09:28:37 EDT ---
COMMIT: http://review.gluster.org/10417 committed in master by Vijay Bellur
(vbellur at redhat.com)
------
commit c085871e3919df2b309b76633e75d5449899437a
Author: Jeff Darcy <jdarcy at redhat.com>
Date: Tue Apr 28 04:40:00 2015 -0400
core: use reference counting for mem_acct structures
When freeing memory, our memory-accounting code expects to be able to
dereference from the (previously) allocated block to its owning
translator. However, as we have already found once in option
validation and twice in logging, that translator might itself have
been freed and the dereference attempt causes one of our daemons to
crash with SIGSEGV. This patch attempts to fix that as follows:
* We no longer embed a struct mem_acct directly in a struct xlator,
but instead allocate it separately.
* Allocated memory blocks now contain a pointer to the mem_acct
instead of the xlator.
* The mem_acct structure contains a reference count, manipulated in
both the normal and translator allocate/free code using atomic
increments and decrements.
* Because it's now a separate structure, we can defer freeing the
mem_acct until its reference count reaches zero (either way).
* Some unit tests were disabled, because they embedded their own
copies of the implementation for what they were supposedly testing.
Life's too short to spend time fixing tests that seem designed to
impede progress by requiring a certain implementation as well as
behavior.
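The scheme in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the committed patch: the names acct, xl, hdr, xl_new, xl_destroy, blk_alloc and blk_free are hypothetical, C11 atomics stand in for whatever atomic primitives the real code uses, and accts_destroyed is instrumentation added only so the behavior can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical accounting record, allocated separately from its translator. */
struct acct {
    atomic_int    refcnt;      /* one for the translator, one per live block */
    atomic_size_t live_bytes;
};

struct xl  { struct acct *acct; };               /* translator stand-in */
struct hdr { struct acct *acct; size_t size; };  /* per-block header    */

static int accts_destroyed;    /* instrumentation for the sketch only */

static void acct_unref(struct acct *a)
{
    /* fetch_sub returns the old value: 1 means we dropped the last ref. */
    if (atomic_fetch_sub(&a->refcnt, 1) == 1) {
        accts_destroyed++;
        free(a);
    }
}

static struct xl *xl_new(void)
{
    struct xl *x = malloc(sizeof(*x));
    struct acct *a = malloc(sizeof(*a));
    if (!x || !a) {
        free(x);
        free(a);
        return NULL;
    }
    atomic_init(&a->refcnt, 1);
    atomic_init(&a->live_bytes, 0);
    x->acct = a;
    return x;
}

static void xl_destroy(struct xl *x)
{
    struct acct *a = x->acct;
    free(x);
    acct_unref(a);             /* record survives while blocks still exist */
}

static void *blk_alloc(struct xl *x, size_t size)
{
    struct hdr *h = malloc(sizeof(*h) + size);
    if (!h)
        return NULL;
    h->acct = x->acct;         /* point at the record, not the translator */
    h->size = size;
    atomic_fetch_add(&h->acct->refcnt, 1);
    atomic_fetch_add(&h->acct->live_bytes, size);
    return h + 1;
}

static void blk_free(void *p)
{
    if (!p)
        return;
    struct hdr *h = (struct hdr *)p - 1;
    struct acct *a = h->acct;
    atomic_fetch_sub(&a->live_bytes, h->size);
    free(h);
    acct_unref(a);
}
```

Because each block holds its own reference, the accounting record is released by whichever of xl_destroy or the final blk_free runs last, so freeing a block after its translator is gone no longer touches freed memory.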
Change-Id: Id929b11387927136f78626901729296b6c0d0fd7
BUG: 1211749
Signed-off-by: Jeff Darcy <jdarcy at redhat.com>
Reviewed-on: http://review.gluster.org/10417
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas at redhat.com>
Reviewed-by: Niels de Vos <ndevos at redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
--- Additional comment from Anand Avati on 2015-05-09 09:38:00 EDT ---
REVIEW: http://review.gluster.org/10723 (core: use reference counting for
mem_acct structures) posted (#1) for review on release-3.7 by Vijay Bellur
(vbellur at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1211749
[Bug 1211749] glusterd crashes when brick option validation fails
https://bugzilla.redhat.com/show_bug.cgi?id=1219026
[Bug 1219026] glusterd crashes when brick option validation fails