[Gluster-Maintainers] NetBSD aborted runs
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Sep 1 03:52:44 UTC 2016
I am seeing a pause when the .t runs that seems to last about as long as
whatever timeout we put in EXPECT_WITHIN:
[2016-09-01 03:24:21.852744] I
[common.c:1134:pl_does_monkey_want_stuck_lock] 0-patchy-locks: stuck lock
[2016-09-01 03:24:21.852775] W [inodelk.c:659:pl_inode_setlk]
0-patchy-locks: MONKEY LOCKING (forcing stuck lock)! at 2016-09-01 03:24:21
[2016-09-01 03:24:21.852792] I [server-rpc-fops.c:317:server_finodelk_cbk]
[2016-09-01 03:24:21.861937] I [server-rpc-fops.c:5682:server3_3_inodelk]
[2016-09-01 03:24:21.862318] I [server-rpc-fops.c:278:server_inodelk_cbk]
[2016-09-01 03:24:21.862627] I [server-rpc-fops.c:5682:server3_3_inodelk]
0-patchy-server: inbound <<---- No I/O after this.
[2016-09-01 03:27:19.6N]:++++++++++ G_LOG:tests/features/lock_revocation.t:
TEST: 52 append_to_file /mnt/glusterfs/1/testfile ++++++++++
[2016-09-01 03:27:19.871044] I [server-rpc-fops.c:5772:server3_3_finodelk]
[2016-09-01 03:27:19.871280] I [clear.c:219:clrlk_clear_inodelk]
[2016-09-01 03:27:19.871307] I [clear.c:273:clrlk_clear_inodelk]
[2016-09-01 03:27:19.871330] I [server-rpc-fops.c:278:server_inodelk_cbk]
[2016-09-01 03:27:19.871389] W [inodelk.c:228:__inodelk_prune_stale]
0-patchy-locks: Lock revocation [reason: age; gfid:
3ccca736-ba89-4f8c-ba17-f6cdbcd0e3c3; domain: patchy-replicate-0; age: 178
sec] - Inode lock revoked: 0 granted & 1 blocked locks cleared
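As a rough check of how long the brick sat idle, the gap between the last
inbound inodelk and the next finodelk can be computed from the two timestamps
in the excerpt above. This is only a sketch; the helper name gap_seconds is
made up for illustration and is not part of any gluster tooling.

```python
from datetime import datetime

# Timestamp layout used in the brick log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def gap_seconds(before: str, after: str) -> float:
    """Return the seconds elapsed between two log timestamps."""
    delta = datetime.strptime(after, FMT) - datetime.strptime(before, FMT)
    return delta.total_seconds()

# Last inbound inodelk before the pause, and the first finodelk after it.
last_inodelk = "2016-09-01 03:24:21.862627"
next_finodelk = "2016-09-01 03:27:19.871044"

pause = gap_seconds(last_inodelk, next_finodelk)
print(f"{pause:.6f}")
```

The result is about 178 seconds, which lines up with the "age: 178 sec"
reported in the lock revocation message.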
We can prevent the hang by adding $CLI volume stop $V0, but then the test
fails. When that happens, perfused prints the following error on the
console:
perfused: perfuse_node_inactive: perfuse_node_fsync failed error = 57:
Resource temporarily unavailable <<--- I wonder if this comes because
INODELK fop fails with EAGAIN.
I am also seeing a weird behaviour where it says it is releasing granted
locks but prints that it released 1 blocked lock.
I think there are 2 things going on here. 1) There is a hang; I am still
assuming it is a gluster issue until proven otherwise.
2) I have to figure out why the counters printed in the logs are wrong.
I kept going through the code and it seems fine. It should have printed
that it released 1 granted lock & 0 blocked locks, but it prints the
counts in reverse.
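To see whether the same counts show up in other runs, the revocation message
can be pulled apart with a small script. This is a sketch that assumes the
wrapped log line has been rejoined into one string; the regex and the helper
name revoked_counts are illustrative, not taken from the gluster sources.

```python
import re

# Matches the tail of the "Lock revocation" message quoted in the log above.
REVOKE_RE = re.compile(
    r"Inode lock revoked: (\d+) granted & (\d+) blocked locks cleared"
)

def revoked_counts(message: str):
    """Return (granted, blocked) as reported in the message, or None."""
    m = REVOKE_RE.search(message)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

# The message from the excerpt, rejoined onto one logical line.
msg = (
    "0-patchy-locks: Lock revocation [reason: age; gfid: "
    "3ccca736-ba89-4f8c-ba17-f6cdbcd0e3c3; domain: patchy-replicate-0; "
    "age: 178 sec] - Inode lock revoked: 0 granted & 1 blocked locks cleared"
)

granted, blocked = revoked_counts(msg)
print(granted, blocked)
```

For the line above this reports 0 granted and 1 blocked, which is the
reversed pair described in point 2.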
If you do git diff on nbslave72.cloud.gluster.org, you can see the changes
I made. Could you please help?
On Sun, Aug 28, 2016 at 7:36 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
> This is still bothering us a lot, and it looks like there is a genuine
> issue in the code that is causing the process to hang or deadlock.
> Raghavendra T - any more findings?
> On Friday 19 August 2016, Atin Mukherjee <amukherj at redhat.com> wrote:
>> NetBSD regressions are getting aborted very frequently. Apart from the
>> infra issue related to connectivity (Nigel has started looking into it),
>> lock_revocation.t hangs in such instances, which causes the run to be
>> aborted after 300 minutes. This has already started blocking patches
>> from getting in, which eventually impacts the upcoming release cycles.
>> I'd request the feature owner/maintainer to have a look at it asap.