[Gluster-devel] Spurious failures because of nfs and snapshots

Thu May 22 07:14:53 UTC 2014

I have posted a patch that fixes this issue:
http://review.gluster.org/#/c/7842/

Thanks,
Vijay


On Thursday 22 May 2014 11:35 AM, Vijay Bellur wrote:
> On 05/21/2014 08:50 PM, Vijaikumar M wrote:
>> KP, Atin and I did some debugging and found that there was a
>> deadlock in glusterd.
>>
>> When creating a volume snapshot, the back-end operations for each
>> brick ('taking an lvm_snapshot and starting the brick') are executed
>> in parallel using the synctask framework.
>>
>> brick_start was releasing the big_lock with brick_connect and then
>> taking the lock again.
>> Under a race condition this caused a deadlock: the main thread was
>> waiting for one of the synctask threads to finish, while that
>> synctask thread was waiting for the big_lock.
>>
>>
>> We are working on fixing this issue.
>>
>
> If this fix is going to take more time, can we please log a bug to 
> track this problem and remove the test cases that need to be addressed 
> from the test unit? This way other valid patches will not be blocked 
> by the failure of the snapshot test unit.
>
> We can introduce these tests again as part of the fix for the problem.
>
> -Vijay
>
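
For readers of the archive, the lock ordering described in the quoted
message can be illustrated with a toy model. This is only a sketch, not
glusterd code: the function name wait_for_brick_start, its parameter and
the timeouts are hypothetical, plain Python threads stand in for the
synctask framework, and the "release while waiting" branch is just one
generic way to break such a cycle, not necessarily what the posted patch
does. Timeouts are used so the example reports the deadlock instead of
hanging.

```python
import threading


def wait_for_brick_start(release_big_lock_while_waiting, timeout=0.3):
    """Toy model: return True if the worker finished while main waited."""
    big_lock = threading.Lock()      # stand-in for glusterd's big_lock
    finished = threading.Event()     # set once the worker is done

    def brick_start():
        # The worker has to take big_lock (again) before it can finish.
        # The timeout turns a would-be permanent hang into a clean exit.
        if big_lock.acquire(timeout=timeout):
            try:
                finished.set()
            finally:
                big_lock.release()

    big_lock.acquire()               # main thread owns big_lock
    worker = threading.Thread(target=brick_start)
    worker.start()

    if release_big_lock_while_waiting:
        # Give up the lock for the duration of the wait, then retake it.
        big_lock.release()
        done = finished.wait(timeout)
        big_lock.acquire()
    else:
        # Deadlock shape: main waits for the worker while holding the
        # lock the worker needs, so neither side can make progress.
        done = finished.wait(timeout)

    big_lock.release()
    worker.join()
    return done


if __name__ == "__main__":
    print("holding lock while waiting :", wait_for_brick_start(False))
    print("releasing lock while waiting:", wait_for_brick_start(True))
```

In the first call the wait times out because the worker never gets the
lock; in the second the worker can take the lock and complete.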