[Gluster-devel] scrubber crash

Venky Shankar vshankar at redhat.com
Mon Jun 1 12:52:19 UTC 2015



On 06/01/2015 04:50 PM, Gaurav Garg wrote:
>
> ----- Original Message -----
> From: "Venky Shankar" <vshankar at redhat.com>
> To: ggarg at redhat.com, anekkunt at redhat.com
> Cc: gluster-devel at gluster.org
> Sent: Monday, June 1, 2015 3:28:21 PM
> Subject: Re: [Gluster-devel] scrubber crash
>
>
>
> On 06/01/2015 02:23 PM, Venky Shankar wrote:
>>
>> On 06/01/2015 01:09 PM, Anand Nekkunti wrote:
>>> Hi Venky
>>>     In one of the regression tests for my patch, I found a core dump from
>>> the scrubber. Please have a look.
>>>
>>> Link:
>>> http://build.gluster.org/job/rackspace-regression-2GB-triggered/9925/consoleFull
>>>
>>> bt for the core:
>>>
>>> (gdb) bt
>>> #0  0x00007f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0,
>>> timer=0x0, expires=233889) at
>>> /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
>>> #1  0x00007f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980,
>>> child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010,
>>> pendingcheck=_gf_true)
>> The crash happens when the scrubber is paused, as reconfigure() blindly
>> accesses scrubber-specific data which is not available _after_ pause.
>>
>> Thanks for reporting. I'll send a fix for this.
> OK. This is not a straightforward crash. The crash is due to a race
> between CHILD_UP (marking the subvolume as "up" and initializing
> essential structures _later_) and reconfigure(), which tries to access
> structures that are yet to be initialized.
>
>>>> For now we can induce a delay before invoking reconfigure() ("pause" in
>>>> the test case) and work on a proper fix for this.
> We don't know how much delay the test case needs, so one idea is to wait for a few seconds
> in reconfigure() and poll whether the timer has been initialized. If it has, proceed; otherwise, skip.
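A minimal sketch of the bounded-poll idea, in C. Everything here is hypothetical: struct scrub_state, wait_for_timer(), reconfigure_scrub() and the SCRUB_* constants are illustrative stand-ins, not the real bit-rot xlator code. The assumption that the timer pointer stays NULL until CHILD_UP finishes is modelled on the backtrace (timer=0x0).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical: total time reconfigure() is willing to wait, and poll step. */
#define SCRUB_INIT_WAIT_MS 3000
#define SCRUB_POLL_STEP_MS 100

/* Hypothetical stand-in for the scrubber state; the real xlator differs. */
struct scrub_state {
    pthread_mutex_t lock;
    void *timer;               /* NULL until CHILD_UP finishes initializing */
};

/* Poll until the timer is set or roughly max_wait_ms has elapsed (granularity
 * is step_ms). Returns true if the timer became available. */
static bool wait_for_timer(struct scrub_state *st, int max_wait_ms, int step_ms)
{
    assert(st != NULL && step_ms > 0);
    struct timespec step = {
        .tv_sec = step_ms / 1000,
        .tv_nsec = (long)(step_ms % 1000) * 1000000L,
    };
    for (int waited = 0;; waited += step_ms) {
        pthread_mutex_lock(&st->lock);
        bool ready = (st->timer != NULL);
        pthread_mutex_unlock(&st->lock);
        if (ready)
            return true;
        if (waited >= max_wait_ms)
            return false;
        nanosleep(&step, NULL);
    }
}

/* Hypothetical reconfigure() path: skip the reschedule instead of crashing. */
static int reconfigure_scrub(struct scrub_state *st)
{
    if (!wait_for_timer(st, SCRUB_INIT_WAIT_MS, SCRUB_POLL_STEP_MS)) {
        fprintf(stderr, "scrubber not initialized yet, skipping reschedule\n");
        return 0;
    }
    /* Timer exists: this is where gf_tw_mod_timer_pending() would be called. */
    return 1;
}
```

The check and the later use are not atomic, so a pause racing with reconfigure() could still free the timer in between. That is one reason a delay-based approach is only a stopgap.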

Since this is an interim fix, it's better to induce the delay in
the test case for now. That should be OK for most cases until we
fix the issue properly.
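A rough sketch of one way a proper fix could look. It assumes the race is closed by having reconfigure() and the CHILD_UP path share a lock and an "initialized" flag. reconfigure() always records the new setting but only touches the timer once CHILD_UP has built it, and CHILD_UP applies the latest setting when it does. All names are hypothetical. The int *timer merely stands in for the timer-wheel timer that gf_tw_mod_timer_pending() operates on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names: not the actual bit-rot xlator structures. */
struct scrubber {
    pthread_mutex_t lock;
    bool initialized;      /* set by the CHILD_UP path once the timer exists */
    int *timer;            /* stands in for the timer-wheel timer */
    int frequency;         /* desired scrub interval, seconds */
};

/* CHILD_UP path: build the timer using the current config, then publish
 * "initialized" while still holding the lock. */
static int scrubber_child_up(struct scrubber *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    s->timer = malloc(sizeof *s->timer);
    if (s->timer == NULL) {
        pthread_mutex_unlock(&s->lock);
        return -1;
    }
    *s->timer = s->frequency;      /* schedule using the latest frequency */
    s->initialized = true;
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* reconfigure() path: always record the new value; only touch the timer
 * if CHILD_UP has finished. Otherwise CHILD_UP picks the value up later. */
static void scrubber_reconfigure(struct scrubber *s, int frequency)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    s->frequency = frequency;
    if (s->initialized)
        *s->timer = frequency;     /* stands in for gf_tw_mod_timer_pending() */
    pthread_mutex_unlock(&s->lock);
}
```

Unlike polling, this needs no guessed timeout. A reconfigure() that arrives before CHILD_UP has finished is simply applied when the timer is created.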
>
>>>> Thoughts?
> -Venky
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


