[Gluster-infra] [Gluster-devel] Issue in locks xlators

Shyam srangana at redhat.com
Fri Mar 10 14:42:21 UTC 2017


On 03/10/2017 02:45 AM, Xavier Hernandez wrote:
> Hi,
>
> I've posted a patch [1] to fix a memory leak in the locks xlator. The fix
> seems quite straightforward; however, I've seen a deadlock in the CentOS
> regression twice [2] [3] on the lock-revocation.t test, causing the
> test to time out and be aborted.
>
> At first sight I haven't seen other failures of this kind for other
> patches, so it seems that this spurious failure was introduced by my
> patch.

This test has caused a few aborted runs in the past as well, and it 
looks like it continues to do so; see [4].

I do not think this is due to your patch, as there are a few past 
instances of the same failure.

fstat.gluster.org unfortunately does not report this test as the cause 
of an aborted run (I mean to file a bug on this); otherwise I am sure it 
would have bubbled up higher in that report.

>
> Can anyone with deeper knowledge of the locks xlator help me identify
> the cause? I'm unable to see how the change can interfere with lock
> revocation.
>
> I've tried to reproduce it locally, but the test passed successfully
> every time.
>
> @Nigel, is it possible to get the logs generated by an aborted job from
> somewhere? I have looked in the place where failed jobs store their
> logs, but they aren't there. It seems that the slave node is restarted
> after an abort, but the logs are not saved.
>
> Thanks,
>
> Xavi
>
> [1] https://review.gluster.org/16838/
> [2] https://build.gluster.org/job/centos6-regression/3563/console
> [3] https://build.gluster.org/job/centos6-regression/3579/console
[4] Older failures of lock-revocation.t: 
http://lists.gluster.org/pipermail/gluster-devel/2017-February/052158.html
