[Bugs] [Bug 1214220] New: Crashes in logging code
bugzilla at redhat.com
Wed Apr 22 09:30:53 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1214220
Bug ID: 1214220
Summary: Crashes in logging code
Product: GlusterFS
Version: 3.7.0
Component: core
Keywords: Triaged
Severity: urgent
Assignee: bugs at gluster.org
Reporter: vbellur at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
gluster-bugs at redhat.com, jclift at redhat.com,
jdarcy at redhat.com, vbellur at redhat.com
Depends On: 1212660
Blocks: 1199352 (glusterfs-3.7.0)
+++ This bug was initially created as a clone of Bug #1212660 +++
I looked at seven core dumps from five recently failed regression tests.
Here's a summary.
http://build.gluster.org/job/rackspace-regression-2GB-triggered/7052/console
generated by: tests/geo-rep/georep-rsync-hybrid.t
crash details: in python (gsyncd)
http://build.gluster.org/job/rackspace-regression-2GB-triggered/7038/console
generated by: tests/basic/cdc.t
crash details: in glusterfsd
pthread_spin_lock
__gf_free
log_buf_destroy
_gf_msg_internal
_gf_msg "accepted client from %s (version: %s)"
server_setvolume
http://build.gluster.org/job/rackspace-regression-2GB-triggered/7035/console
generated by: tests/basic/mgmt_v3-locks.t
crash details: in glusterfs
log_buf_destroy
gf_log_flush_list
gf_log_flush_extra_msgs
gf_log_set_log_buf_size
gf_log_disable_suppression_before_exit
cleanup_and_exit
glusterfs_process_volfp
http://build.gluster.org/job/rackspace-regression-2GB-triggered/7030/console
generated by: tests/basic/cdc.t
crash details: in glusterfsd
same as previous server_setvolume crash
http://build.gluster.org/job/rackspace-regression-2GB-triggered/7029/console
generated by: tests/basic/volume-snapshot-clone.t (three core files)
crash details: in glusterfs
all three same as previous glusterfs_process_volfp crash
That's six out of seven going through log_buf_destroy - different tests,
different daemons, different code paths, but all converging there.
Could it be a coincidence that this is the same logging infrastructure
we've recently started using more heavily? That seems unlikely. It's
entirely possible that log_buf_destroy is the victim (of heap
corruption) rather than a culprit, but chances are that the bug is
somewhere in related code.
--- Additional comment from Justin Clift on 2015-04-17 06:54:00 EDT ---
Cool, keep going. Let's nail this sucker! :)
--- Additional comment from Jeff Darcy on 2015-04-21 11:26:43 EDT ---
This turns out to be a relative of both bug 1211749 and bug 1211473 - a memory
object allocated in a translator has persisted past the lifetime of that
translator. The translator pointer in that memory object's header is therefore
no longer valid, and when the memory tracking code tries to dereference through
that pointer . . . BOOM.
In those other cases, the problem had to do with a temporary graph created for
option validation. In this case it has to do with the list we use to detect
and coalesce duplicate log messages. While the log_buf objects themselves are
allocated from a pool, various elements are copied via gf_strdup, using THIS
from the current context as the owning translator. The solution is going to be
rather similar to that for 1211749:
http://review.gluster.org/#/c/10238/
It's hacky, but it gets us past having our daemons blow up effectively at
random.
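To make the failure mode concrete, here is a minimal sketch in C of an allocator that records an owner in each allocation's header, plus the kind of workaround described above. It is an illustration under stated assumptions, not GlusterFS code: struct owner, struct mem_header, tracked_strdup, tracked_free, strdup_as_global, current_owner and global_owner are all hypothetical names, with current_owner standing in for THIS and global_owner standing in for a long-lived translator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a translator that does memory accounting. */
struct owner {
    const char *name;
    size_t bytes_in_use;
};

/* Header stored in front of every tracked allocation. */
struct mem_header {
    struct owner *owner; /* dangles if the owner is destroyed first */
    size_t size;
};

/* Long-lived owner (assumption: never destroyed while the process runs). */
static struct owner global_owner = { "global", 0 };

/* Owner charged for new allocations; plays the role of THIS. */
static struct owner *current_owner = &global_owner;

/* Duplicate a string and charge it to whichever owner is current. */
char *tracked_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    struct mem_header *h = malloc(sizeof(*h) + len);

    if (h == NULL)
        return NULL;
    h->owner = current_owner;
    h->size = len;
    h->owner->bytes_in_use += len;
    memcpy(h + 1, s, len);
    return (char *)(h + 1);
}

/* Free a tracked string; this dereferences the owner saved in the header,
 * which is exactly the step that breaks if that owner is already gone. */
void tracked_free(char *p)
{
    struct mem_header *h;

    if (p == NULL)
        return;
    h = (struct mem_header *)p - 1;
    h->owner->bytes_in_use -= h->size;
    free(h);
}

/* Workaround: charge the copy to the long-lived owner, then restore. */
char *strdup_as_global(const char *s)
{
    struct owner *saved = current_owner;
    char *copy;

    current_owner = &global_owner;
    copy = tracked_strdup(s);
    current_owner = saved;
    return copy;
}

/* The string outlives the temporary owner, yet freeing it stays safe
 * because its header points at global_owner. Returns the bytes still
 * charged to global_owner afterwards. */
size_t demo_outlive_owner(void)
{
    struct owner *temp = calloc(1, sizeof(*temp));
    char *msg;

    assert(temp != NULL);
    temp->name = "temporary";
    current_owner = temp;

    msg = strdup_as_global("accepted client");
    assert(msg != NULL);
    assert(current_owner == temp); /* the previous owner was restored */

    current_owner = &global_owner;
    free(temp); /* the short-lived owner goes away first */

    tracked_free(msg); /* touches global_owner, not the freed owner */
    return global_owner.bytes_in_use;
}
```

Had the copy been made with tracked_strdup while current_owner still pointed at the temporary owner, the later tracked_free would have written through a pointer to freed memory, which is the pattern described in this comment. The trade-off in the sketch is that the accounting is attributed to the global owner rather than to the code that actually made the copy.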
--- Additional comment from Anand Avati on 2015-04-21 11:50:53 EDT ---
REVIEW: http://review.gluster.org/10319 (core: avoid crashes in gf_msg
dup-detection code) posted (#1) for review on master by Jeff Darcy
(jdarcy at redhat.com)
--- Additional comment from Justin Clift on 2015-04-21 12:00:15 EDT ---
Awesome. :)
--- Additional comment from Anand Avati on 2015-04-22 02:15:43 EDT ---
COMMIT: http://review.gluster.org/10319 committed in master by Vijay Bellur
(vbellur at redhat.com)
------
commit 765849ee00f6661c9059122ff2346b03b224745f
Author: Jeff Darcy <jdarcy at redhat.com>
Date: Tue Apr 21 11:48:15 2015 -0400
core: avoid crashes in gf_msg dup-detection code
Use global_xlator for allocations so that we don't try to free objects
belonging to an already-deleted translator (which will crash).
Change-Id: Ie72a546e7770cf5cb8a8370e22448c8d09e3ab37
BUG: 1212660
Signed-off-by: Jeff Darcy <jdarcy at redhat.com>
Reviewed-on: http://review.gluster.org/10319
Reviewed-by: Krishnan Parthasarathi <kparthas at redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
Reviewed-by: Atin Mukherjee <amukherj at redhat.com>
Reviewed-by: Vijay Bellur <vbellur at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1199352
[Bug 1199352] GlusterFS 3.7.0 tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1212660
[Bug 1212660] Crashes in logging code