[Bugs] [Bug 1302201] New: Scrubber crash (list corruption)

bugzilla at redhat.com
Wed Jan 27 06:56:44 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1302201

            Bug ID: 1302201
           Summary: Scrubber crash (list corruption)
           Product: GlusterFS
           Version: mainline
         Component: bitrot
          Assignee: bugs at gluster.org
          Reporter: vshankar at redhat.com
                CC: bugs at gluster.org, khiremat at redhat.com,
                    manu at netbsd.org, rabhat at redhat.com, vbellur at redhat.com
        Depends On: 1302199
      Docs Contact: bugs at gluster.org



+++ This bug was initially created as a clone of Bug #1302199 +++

Description of problem:

Emmanuel reported a scrubber crash on NetBSD. The backtrace shows list
corruption when the bitrot scrubber tries to fetch an item to scrub from a set
of bricks.

Backtrace:

(gdb) bt
#0  0xbb213b74 in list_del_init (old=0x0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/list.h:87
#1  0xbb21682f in _br_scrubber_get_entry (child=0xbb106924, fsentry=0xb84fcfc0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1033
#2  0xbb2168b0 in _br_scrubber_find_scrubbable_entry (fsscrub=0xbb106cf0, fsentry=0xb84fcfc0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1055
#3  0xbb216959 in br_scrubber_pick_entry (fsscrub=0xbb106cf0, fsentry=0xb84fcfc0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1077
#4  0xbb216b0f in br_scrubber_proc (arg=<error reading variable: Cannot access memory at address 0xb84fcfd8>)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1153

Version-Release number of selected component (if applicable):
3.7

How reproducible:
Intermittently

Steps to Reproduce:
Run the following test case:

    ./tests/bitrot/br-state-check.t

Actual results:
The test case fails intermittently and the scrubber crashes

Expected results:
Test case should pass (and generate no cores)

Additional info:

--- Additional comment from Venky Shankar on 2016-01-27 01:56:09 EST ---

_br_scrubber_find_scrubbable_entry() calls pthread_cond_wait() to block until
it is signalled that ->scrublist is non-empty, but tests the condition only
once:

    if (list_empty (&fsscrub->scrublist))
        pthread_cond_wait (&fsscrub->cond, &fsscrub->mutex);

pthread_cond_wait() is subject to spurious wakeups, as documented in
pthread_cond_wait(3), so callers are expected to re-check the condition after
it returns, typically by waiting in a loop. In the above case, if
pthread_cond_wait() returns while ->scrublist is still empty, taking the first
element of ->scrublist and calling list_entry() on it yields garbage.
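
For illustration, here is a minimal sketch of the usual pattern: wait in a
while loop so the predicate is re-tested after every wakeup. This is not the
GlusterFS code or the actual patch. struct entry, struct scrub_queue and the
queue_* helpers are hypothetical stand-ins for the scrubber state, and a plain
singly linked list replaces list.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scrubber state; not the GlusterFS types. */
struct entry {
    struct entry *next;
    int id;
};

struct scrub_queue {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    struct entry *head; /* NULL when the queue is empty */
};

static void queue_init(struct scrub_queue *q)
{
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = NULL;
}

/* Producer side: add an entry and wake one waiter. */
static void queue_push(struct scrub_queue *q, struct entry *e)
{
    pthread_mutex_lock(&q->mutex);
    e->next = q->head;
    q->head = e;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

/* Consumer side: block until an entry is available, then remove it. */
static struct entry *queue_pop_wait(struct scrub_queue *q)
{
    struct entry *e;

    pthread_mutex_lock(&q->mutex);
    /* "while", not "if": a spurious wakeup just loops back into the wait. */
    while (q->head == NULL)
        pthread_cond_wait(&q->cond, &q->mutex);

    e = q->head;
    assert(e != NULL); /* the predicate is guaranteed to hold here */
    q->head = e->next;
    e->next = NULL;
    pthread_mutex_unlock(&q->mutex);
    return e;
}
```

The only difference from the snippet quoted above is the loop: with "if", the
code after the wait runs as soon as pthread_cond_wait() returns for any reason;
with "while", it runs only once the list is observed non-empty under the mutex.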


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1302199
[Bug 1302199] Scrubber crash (list corruption)