[Bugs] [Bug 1232307] New: Scrubber crash upon pause

Tue Jun 16 13:08:17 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1232307

            Bug ID: 1232307
           Summary: Scrubber crash upon pause
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: glusterfs
     Sub Component: bitrot
          Assignee: rhs-bugs at redhat.com
          Reporter: vshankar at redhat.com
        QA Contact: rmekala at redhat.com
                CC: anekkunt at redhat.com, bugs at gluster.org,
                    ggarg at redhat.com, nsathyan at redhat.com,
                    rmekala at redhat.com
        Depends On: 1226666, 1226830, 1231617, 1231619
             Group: redhat

+++ This bug was initially created as a clone of Bug #1231617 +++

+++ This bug was initially created as a clone of Bug #1226830 +++

Description of problem:
Pausing the scrubber intermittently crashes the scrubber process.

Version-Release number of selected component (if applicable):
3.7.0

How reproducible:
Sometimes

Steps to Reproduce:
1. Create & start a Gluster volume
2. Enable bitrot on the volume
3. Pause scrubber for this volume as per below:

# gluster volume bitrot <vol> scrub pause

Actual results:
Scrubber process crashes at times

Expected results:
The scrubber process should keep running, but it should not scrub the
filesystem for the volume while paused.

Backtrace (reported by anekkunt:
http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html)

(gdb) bt
#0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, timer=0x0,
expires=233889) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980,
child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010,
pendingcheck=_gf_true)
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703
#2  0x00007f89c82cc9d4 in reconfigure (this=0x7f89c4008980,
options=0x7f89d3bc9558) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3  0x00007f89d62044cd in xlator_reconfigure_rec (old_xl=0x7f89c4008980,
new_xl=0x7f89c409b460) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4  0x00007f89d6204414 in xlator_reconfigure_rec (old_xl=0x7f89c400a6c0,
new_xl=0x7f89c409c500) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5  0x00007f89d62045df in xlator_tree_reconfigure (old_xl=0x7f89c400a6c0,
new_xl=0x7f89c409c500) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6  0x00007f89d61ec7bd in glusterfs_graph_reconfigure (oldgraph=0x7f89c4001d30,
newgraph=0x7f89c4098130) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7  0x00007f89d61ec629 in glusterfs_volfile_reconfigure (oldvollen=932,
newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,

--- Additional comment from Venky Shankar on 2015-06-01 05:41:22 EDT ---

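So, the crash is due to a race between CHILD_UP (where ->timer is initialized
for the subvolume) and reconfigure(), which tries to access ->timer to
reschedule the scrub. If reconfigure() runs before CHILD_UP, ->timer is still
NULL, which matches timer=0x0 in frame #0 of the backtrace.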
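
To make the race concrete, here is a minimal sketch in C. It is not the patch
under review and does not use the glusterfs data structures: struct
scrub_child, struct scrub_timer, mod_timer() and the scrub_child_*() functions
are hypothetical names invented for illustration. It assumes one possible way
to close the window: the CHILD_UP path publishes the timer under a lock, and
the reconfigure path skips rescheduling while the timer is still NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the glusterfs structures. */
struct scrub_timer {
    unsigned long expires;
};

struct scrub_child {
    pthread_mutex_t lock;
    struct scrub_timer *timer;  /* NULL until the CHILD_UP path has run */
    struct scrub_timer storage; /* backing storage, avoids malloc here */
};

/* Stand-in for the timer-wheel call: like frame #0, it needs a valid timer. */
static void mod_timer(struct scrub_timer *timer, unsigned long expires)
{
    assert(timer != NULL);
    timer->expires = expires;
}

void scrub_child_init(struct scrub_child *child)
{
    pthread_mutex_init(&child->lock, NULL);
    child->timer = NULL;
    child->storage.expires = 0;
}

/* Models the CHILD_UP path: initialize and publish the timer under the lock. */
void scrub_child_up(struct scrub_child *child, unsigned long expires)
{
    pthread_mutex_lock(&child->lock);
    child->storage.expires = expires;
    child->timer = &child->storage;
    pthread_mutex_unlock(&child->lock);
}

/* Models the reconfigure() path: reschedule only if the timer exists.
 * Returns true if the timer was rescheduled, false if it was skipped. */
bool scrub_child_reschedule(struct scrub_child *child, unsigned long expires)
{
    bool rescheduled = false;

    pthread_mutex_lock(&child->lock);
    if (child->timer != NULL) {
        mod_timer(child->timer, expires);
        rescheduled = true;
    }
    pthread_mutex_unlock(&child->lock);

    return rescheduled;
}
```

In this sketch, calling scrub_child_reschedule() before scrub_child_up() returns
false instead of dereferencing a NULL timer. Once the child is up, the same call
updates the expiry. Whether skipping the reschedule is acceptable, or whether it
should be deferred until CHILD_UP, is a design decision this sketch does not
settle.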

--- Additional comment from Anand Avati on 2015-06-11 10:53:00 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#3) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-14 23:35:07 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#5) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-15 01:53:31 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#6) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-15 23:53:56 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#7) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-16 02:35:46 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#8) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-16 04:38:27 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#9) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-16 05:42:46 EDT ---

REVIEW: http://review.gluster.org/11147 (features/bitrot: cleanup, v1) posted
(#10) for review on master by Venky Shankar (vshankar at redhat.com)

--- Additional comment from Anand Avati on 2015-06-16 05:42:53 EDT ---

REVIEW: http://review.gluster.org/11248 (tests/bitrot: remove induced delay)
posted (#1) for review on master by Venky Shankar (vshankar at redhat.com)

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1226666
[Bug 1226666] BitRot :- Handle brick re-connection sanely in bitd/scrub
process
https://bugzilla.redhat.com/show_bug.cgi?id=1226830
[Bug 1226830] Scrubber crash upon pause
https://bugzilla.redhat.com/show_bug.cgi?id=1231617
[Bug 1231617] Scrubber crash upon pause
https://bugzilla.redhat.com/show_bug.cgi?id=1231619
[Bug 1231619] BitRot :- Handle brick re-connection sanely in bitd/scrub
process