[Bugs] [Bug 1231617] New: Scrubber crash upon pause

Mon Jun 15 05:50:29 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1231617

            Bug ID: 1231617
           Summary: Scrubber crash upon pause
           Product: GlusterFS
           Version: mainline
         Component: bitrot
          Assignee: bugs at gluster.org
          Reporter: vshankar at redhat.com
                CC: anekkunt at redhat.com, bugs at gluster.org,
                    ggarg at redhat.com, nsathyan at redhat.com,
                    rmekala at redhat.com
        Depends On: 1226666, 1226830
      Docs Contact: bugs at gluster.org

+++ This bug was initially created as a clone of Bug #1226830 +++

Description of problem:
Pausing the scrubber sometimes crashes the scrubber process.

Version-Release number of selected component (if applicable):
3.7.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Create and start a Gluster volume
2. Enable bitrot on the volume
3. Pause the scrubber for this volume:

# gluster volume bitrot <vol> scrub pause

Actual results:
The scrubber process sometimes crashes.

Expected results:
The scrubber process should keep running (without scrubbing the filesystem
for the volume).

Backtrace (reported by anekkunt:
http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html)

(gdb) bt
#0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, timer=0x0, expires=233889)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, pendingcheck=_gf_true)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703
#2  0x00007f89c82cc9d4 in reconfigure (this=0x7f89c4008980, options=0x7f89d3bc9558)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3  0x00007f89d62044cd in xlator_reconfigure_rec (old_xl=0x7f89c4008980, new_xl=0x7f89c409b460)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4  0x00007f89d6204414 in xlator_reconfigure_rec (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5  0x00007f89d62045df in xlator_tree_reconfigure (old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6  0x00007f89d61ec7bd in glusterfs_graph_reconfigure (oldgraph=0x7f89c4001d30, newgraph=0x7f89c4098130)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7  0x00007f89d61ec629 in glusterfs_volfile_reconfigure (oldvollen=932, newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,

--- Additional comment from Venky Shankar on 2015-06-01 05:41:22 EDT ---

So, the crash is due to a race between CHILD_UP (where ->timer is initialized
for the subvolume) and reconfigure(), which accesses ->timer to reschedule the
scrub. If reconfigure() runs before CHILD_UP, ->timer is still NULL (see
timer=0x0 in frame #0 of the backtrace).
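
To illustrate the ordering problem described above, here is a minimal,
single-threaded sketch in C. It is not the GlusterFS code and not the change
under review: every name in it (sketch_timer, sketch_child, sketch_child_up,
sketch_reschedule) is hypothetical. It only shows a reschedule path that
refuses to touch a timer that the CHILD_UP path has not set up yet. It does
not model threads, so in concurrent code the check and the initialization
would presumably also have to happen under the same lock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real GlusterFS types. */
struct sketch_timer {
    unsigned long expires;
};

struct sketch_child {
    struct sketch_timer *timer; /* NULL until the CHILD_UP path has run */
};

/* Models the CHILD_UP path: allocate the per-subvolume timer once. */
static int sketch_child_up(struct sketch_child *child, unsigned long expires)
{
    if (child->timer != NULL)
        return 0; /* already initialized */

    child->timer = calloc(1, sizeof(*child->timer));
    if (child->timer == NULL)
        return -1; /* allocation failed */

    child->timer->expires = expires;
    return 0;
}

/*
 * Models the reschedule done from the reconfigure path. Returns false and
 * changes nothing if the timer has not been initialized yet, instead of
 * dereferencing a NULL pointer.
 */
static bool sketch_reschedule(struct sketch_child *child, unsigned long expires)
{
    if (child->timer == NULL)
        return false;

    child->timer->expires = expires;
    return true;
}
```

With this guard, calling sketch_reschedule() on a child that has not seen
sketch_child_up() is a no-op that reports failure; after sketch_child_up(),
the same call updates the expiry.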

--- Additional comment from Anand Avati on 2015-06-11 10:53:00 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#3) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-14 23:35:07 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#5) for review on master by Venky Shankar (vshankar at redhat.com)

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1226666
[Bug 1226666] BitRot :- Handle brick re-connection sanely in bitd/scrub
process
https://bugzilla.redhat.com/show_bug.cgi?id=1226830
[Bug 1226830] Scrubber crash upon pause