[Gluster-devel] scrubber crash

Gaurav Garg ggarg at redhat.com
Mon Jun 1 11:20:57 UTC 2015



----- Original Message -----
From: "Venky Shankar" <vshankar at redhat.com>
To: ggarg at redhat.com, anekkunt at redhat.com
Cc: gluster-devel at gluster.org
Sent: Monday, June 1, 2015 3:28:21 PM
Subject: Re: [Gluster-devel] scrubber crash



On 06/01/2015 02:23 PM, Venky Shankar wrote:
>
>
> On 06/01/2015 01:09 PM, Anand Nekkunti wrote:
>> Hi Venky
>>    In one of the regression tests for my patch, I found a core dump 
>> from the scrubber. Please have a look.
>>
>> Link 
>> :http://build.gluster.org/job/rackspace-regression-2GB-triggered/9925/consoleFull
>>
>> bt for core ...
>>
>> (gdb) bt
>> #0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, 
>> timer=0x0, expires=233889) at 
>> /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
>> #1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, 
>> child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, 
>> pendingcheck=_gf_true)
>
> The crash happens when scrubber is paused as reconfigure() blindly 
> accesses scrubber specific data which is not available _after_ pause.
>
> Thanks for reporting. I'll send a fix for this.
OK. This is not a straightforward crash. The crash is due to a race 
between CHILD_UP (marking the subvolume as "up" and initializing 
essential structures _later_) and reconfigure() which tries to access 
structures which are yet to be initialized.

>>>For now we can induce delay before invoking reconfigure() {"pause" in 
>>>the test case} and work on a proper fix for this.

We don't know how much delay the test case needs. So one idea is to wait for a few seconds in the reconfigure function
and poll whether the timer has been initialized. If it has, proceed further; otherwise skip.
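
The polling idea above could look roughly like the sketch below. This is only an illustration, not the actual bit-rot code: struct fsscan_sketch, wait_for_timer() and reconfigure_sketch() are hypothetical names, and the 3 second timeout and 100 ms interval are assumed values. It relies on the timer pointer staying NULL until the CHILD_UP path has set it up, which matches timer=0x0 in the backtrace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the scrubber's per-child scan state. */
struct fsscan_sketch {
    _Atomic(void *) timer;   /* assumed NULL until CHILD_UP initializes it */
};

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = {
        .tv_sec  = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };
    nanosleep(&ts, NULL);
}

/*
 * Poll until the timer pointer is non-NULL or timeout_ms has elapsed.
 * Returns true if the timer is initialized, false on timeout.
 */
static bool wait_for_timer(struct fsscan_sketch *scan,
                           long timeout_ms, long interval_ms)
{
    assert(scan != NULL && interval_ms > 0);

    for (long waited = 0; ; waited += interval_ms) {
        if (atomic_load(&scan->timer) != NULL)
            return true;
        if (waited >= timeout_ms)
            return false;
        sleep_ms(interval_ms);
    }
}

/* reconfigure()-like caller: skip rescheduling if the timer never shows up. */
static int reconfigure_sketch(struct fsscan_sketch *scan)
{
    if (!wait_for_timer(scan, 3000, 100))
        return -1;           /* timer not initialized: skip */

    /* Here it would be safe to touch the timer, e.g. reschedule the scan. */
    return 0;
}
```

The wait is bounded, so reconfigure cannot block forever if CHILD_UP never arrives. It only narrows the race window, though. A proper fix would still need real synchronization between the two paths.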

>>>Thoughts?

-Venky

