[Gluster-devel] scrubber crash

Venky Shankar vshankar at redhat.com
Mon Jun 1 09:58:21 UTC 2015



On 06/01/2015 02:23 PM, Venky Shankar wrote:
>
>
> On 06/01/2015 01:09 PM, Anand Nekkunti wrote:
>> Hi Venky
>>    In one of the regression tests for my patch, I found a core dump
>> from the scrubber. Please have a look.
>>
>> Link: http://build.gluster.org/job/rackspace-regression-2GB-triggered/9925/consoleFull
>>
>> bt for the core ...
>>
>> (gdb) bt
>> #0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, 
>> timer=0x0, expires=233889) at 
>> /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
>> #1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, 
>> child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, 
>> pendingcheck=_gf_true)
>
> The crash happens when the scrubber is paused, as reconfigure() blindly 
> accesses scrubber-specific data which is not available _after_ pause.
>
> Thanks for reporting. I'll send a fix for this.
OK. This is not a straightforward crash. It is caused by a race 
between CHILD_UP (which marks the subvolume as "up" and initializes 
essential structures only _later_) and reconfigure(), which tries to 
access structures that are yet to be initialized.
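
To illustrate the window, here is a minimal sketch of the ordering 
problem and of a readiness check that would avoid the NULL timer seen 
in frame #0 (timer=0x0). Every name in it (scrub_child, scrub_timer, 
child_mark_up, child_finish_init, reschedule_if_ready) is hypothetical 
and made up for this mail; it is not the bit-rot code and not 
necessarily the shape the real fix will take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real GlusterFS structures. */
struct scrub_timer {
    unsigned long expires;
};

struct scrub_child {
    bool up;                   /* set when CHILD_UP is handled */
    struct scrub_timer *timer; /* attached only later, after CHILD_UP */
};

/* Step 1 of CHILD_UP: the subvolume is marked "up" first. */
void child_mark_up(struct scrub_child *child)
{
    assert(child != NULL);
    child->up = true;
}

/* Step 2 of CHILD_UP: essential structures are attached afterwards. */
void child_finish_init(struct scrub_child *child, struct scrub_timer *timer)
{
    assert(child != NULL);
    child->timer = timer;
}

/*
 * What a reconfigure()-style caller could do instead of touching the
 * timer blindly. Returns 0 if the timer was updated, -1 if the child
 * is not fully initialized yet and the update was skipped.
 */
int reschedule_if_ready(struct scrub_child *child, unsigned long expires)
{
    assert(child != NULL);
    if (!child->up || child->timer == NULL)
        return -1;
    child->timer->expires = expires;
    return 0;
}
```

If reschedule_if_ready() runs between child_mark_up() and 
child_finish_init(), it returns -1 instead of dereferencing a NULL 
timer. Note that in a multi-threaded process a check like this only 
helps if both sides are serialized (for example under a common lock), 
so treat it as a picture of the race rather than a fix.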

For now we can add a delay before reconfigure() is invoked (the "pause" 
step in the test case) and work on a proper fix for the race separately.

Thoughts?

-Venky

