[Gluster-Maintainers] NetBSD aborted runs

Thu Sep 1 03:52:44 UTC 2016

I am seeing a pause when the .t runs that seems to last about as long as
whatever timeout we put in EXPECT_WITHIN:

[2016-09-01 03:24:21.852744] I
[common.c:1134:pl_does_monkey_want_stuck_lock] 0-patchy-locks: stuck lock
[2016-09-01 03:24:21.852775] W [inodelk.c:659:pl_inode_setlk]
0-patchy-locks: MONKEY LOCKING (forcing stuck lock)! at 2016-09-01 03:24:21
[2016-09-01 03:24:21.852792] I [server-rpc-fops.c:317:server_finodelk_cbk]
0-patchy-server: replied
[2016-09-01 03:24:21.861937] I [server-rpc-fops.c:5682:server3_3_inodelk]
0-patchy-server: inbound
[2016-09-01 03:24:21.862318] I [server-rpc-fops.c:278:server_inodelk_cbk]
0-patchy-server: replied
[2016-09-01 03:24:21.862627] I [server-rpc-fops.c:5682:server3_3_inodelk]
0-patchy-server: inbound <<---- No I/O after this.
[2016-09-01 03:27:19.6N]:++++++++++ G_LOG:tests/features/lock_revocation.t:
TEST: 52 append_to_file /mnt/glusterfs/1/testfile ++++++++++
[2016-09-01 03:27:19.871044] I [server-rpc-fops.c:5772:server3_3_finodelk]
0-patchy-server: inbound
[2016-09-01 03:27:19.871280] I [clear.c:219:clrlk_clear_inodelk]
0-patchy-locks: 2
[2016-09-01 03:27:19.871307] I [clear.c:273:clrlk_clear_inodelk]
0-patchy-locks: released_granted
[2016-09-01 03:27:19.871330] I [server-rpc-fops.c:278:server_inodelk_cbk]
0-patchy-server: replied
[2016-09-01 03:27:19.871389] W [inodelk.c:228:__inodelk_prune_stale]
0-patchy-locks: Lock revocation [reason: age; gfid:
3ccca736-ba89-4f8c-ba17-f6cdbcd0e3c3; domain: patchy-replicate-0; age: 178
sec] - Inode lock revoked:  0 granted & 1 blocked locks cleared
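
As a cross-check of the stall, the gap between the last inbound inodelk and
the next logged request can be computed from the two timestamps above. This
is only a minimal sketch in Python; the helper name and variables are mine
and are not part of the test or of gluster:

```python
from datetime import datetime

# Timestamp layout used by the brick log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def gap_seconds(before: str, after: str) -> float:
    """Return the number of seconds elapsed between two log timestamps."""
    return (datetime.strptime(after, FMT)
            - datetime.strptime(before, FMT)).total_seconds()

# Last inbound inodelk before the pause, and the next inbound finodelk.
last_inbound = "2016-09-01 03:24:21.862627"
next_request = "2016-09-01 03:27:19.871044"

stall = gap_seconds(last_inbound, next_request)
print(f"stall: {stall:.3f} sec")
```

The result comes to roughly 178 seconds, which lines up with the "age: 178
sec" reported in the lock revocation message.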

We can prevent the hang by adding $CLI volume stop $V0, but then the test
fails. When that happens, perfused prints the following error on the
console:

perfused: perfuse_node_inactive: perfuse_node_fsync failed error = 57:
Resource temporarily unavailable <<--- I wonder if this comes because
INODELK fop fails with EAGAIN.

I am also seeing weird behaviour where it says it is releasing granted
locks but prints that it released 1 blocked lock.

+Manu
I think there are 2 things going on here.
1) There is a hang. I am still assuming it is a gluster issue until proven
otherwise.
2) I have to figure out why the counters in the revocation message do not
match what the rest of the log shows. I kept going through the code and it
seems fine. It should have printed that it released 1 granted lock & 0
blocked locks, but it prints the reverse.

If you do git diff on nbslave72.cloud.gluster.org, you can see the changes
I made. Could you please help?

On Sun, Aug 28, 2016 at 7:36 AM, Atin Mukherjee <amukherj at redhat.com> wrote:

> This is still bothering us a lot, and it looks like there is a genuine
> issue in the code which is making the process hang/deadlock?
>
> Raghavendra T - any more findings?
>
>
> On Friday 19 August 2016, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1368421
>>
>> NetBSD regressions are getting aborted very frequently. Apart from the
>> infra issue related to connectivity (Nigel has started looking into it),
>> lock_revocation.t hangs in such instances, which causes the run to be
>> aborted after 300 minutes. This has already started holding up patches
>> from getting in, which eventually impacts the upcoming release cycles.
>>
>> I'd request the feature owner/maintainer to have a look at it asap.
>>
>> --Atin
>>
>
>
> --
> --Atin

-- 
Pranith