[Bugs] [Bug 1302201] New: Scrubber crash (list corruption)
bugzilla at redhat.com
Wed Jan 27 06:56:44 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1302201
Bug ID: 1302201
Summary: Scrubber crash (list corruption)
Product: GlusterFS
Version: mainline
Component: bitrot
Assignee: bugs at gluster.org
Reporter: vshankar at redhat.com
CC: bugs at gluster.org, khiremat at redhat.com,
manu at netbsd.org, rabhat at redhat.com, vbellur at redhat.com
Depends On: 1302199
Docs Contact: bugs at gluster.org
+++ This bug was initially created as a clone of Bug #1302199 +++
Description of problem:
Emmanuel reported a scrubber crash on NetBSD. The backtrace shows list
corruption when the bitrot scrubber tries to fetch an item to scrub from a set
of bricks.
Backtrace:
(gdb) bt
#0  0xbb213b74 in list_del_init (old=0x0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/list.h:87
#1  0xbb21682f in _br_scrubber_get_entry (child=0xbb106924, fsentry=0xb84fcfc0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1033
#2  0xbb2168b0 in _br_scrubber_find_scrubbable_entry (fsscrub=0xbb106cf0, fsentry=0xb84fcfc0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1055
#3  0xbb216959 in br_scrubber_pick_entry (fsscrub=0xbb106cf0, fsentry=0xb84fcfc0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1077
#4  0xbb216b0f in br_scrubber_proc (arg=<error reading variable: Cannot access memory at address 0xb84fcfd8>)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1153
Version-Release number of selected component (if applicable):
3.7
How reproducible:
Intermittently
Steps to Reproduce:
Run the following test case:
./tests/bitrot/br-state-check.t
Actual results:
The test case fails intermittently and the scrubber crashes
Expected results:
Test case should pass (and generate no cores)
Additional info:
--- Additional comment from Venky Shankar on 2016-01-27 01:56:09 EST ---
_br_scrubber_find_scrubbable_entry() calls pthread_cond_wait(...) to get
signalled when ->scrublist becomes non-empty:

    if (list_empty (&fsscrub->scrublist))
            pthread_cond_wait (&fsscrub->cond, &fsscrub->mutex);

pthread_cond_wait() is prone to spurious wakeups, as mentioned in
pthread_cond_wait(3), and callers are expected to re-check the condition after
it returns. In the above case, if pthread_cond_wait() returns while
->scrublist is still empty, then accessing the first element of ->scrublist
and calling list_entry() on it would give garbage.
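
The usual remedy for a spurious wakeup is to re-test the predicate in a loop
around pthread_cond_wait(). The following is only a minimal sketch of that
pattern, not the actual GlusterFS change: struct entry, struct scrub_queue,
queue_init(), queue_push() and queue_pop_blocking() are hypothetical names,
and a plain singly linked list stands in for the list_head based ->scrublist.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scrubber structures (not GlusterFS code). */
struct entry {
        struct entry *next;
        int           id;
};

struct scrub_queue {
        pthread_mutex_t  mutex;
        pthread_cond_t   cond;
        struct entry    *head;   /* NULL while the queue is empty */
};

void
queue_init (struct scrub_queue *q)
{
        pthread_mutex_init (&q->mutex, NULL);
        pthread_cond_init (&q->cond, NULL);
        q->head = NULL;
}

/* Producer side: add an entry and wake one waiter. */
void
queue_push (struct scrub_queue *q, struct entry *e)
{
        pthread_mutex_lock (&q->mutex);
        e->next = q->head;
        q->head = e;
        pthread_cond_signal (&q->cond);
        pthread_mutex_unlock (&q->mutex);
}

/* Consumer side: block until an entry is available, then detach it. */
struct entry *
queue_pop_blocking (struct scrub_queue *q)
{
        struct entry *e = NULL;

        pthread_mutex_lock (&q->mutex);

        /* "while", not "if": after every wakeup, spurious or not, the
         * predicate is tested again before the list is touched. */
        while (q->head == NULL)
                pthread_cond_wait (&q->cond, &q->mutex);

        e = q->head;
        assert (e != NULL);      /* guaranteed by the loop above */
        q->head = e->next;
        e->next = NULL;

        pthread_mutex_unlock (&q->mutex);
        return e;
}
```

With "if" in place of "while", a premature return from pthread_cond_wait()
would fall through to the dequeue step with an empty list, which is the
situation described above.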
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1302199
[Bug 1302199] Scrubber crash (list corruption)