[Bugs] [Bug 1344836] [Disperse volume]: IO hang seen on mount with file ops

bugzilla at redhat.com bugzilla at redhat.com
Sat Jun 11 13:22:41 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1344836



--- Comment #1 from Pranith Kumar K <pkarampu at redhat.com> ---
This is an issue we observed in internal testing.
The locks were being acquired just as bricks were going down because of
ping timeouts. 4 of the 6 bricks went down at that time. The remaining 2
bricks hold locks that, for some reason, were never unlocked and were left
stale.

Steps to recreate the issue:
1) Create a plain disperse volume.
2) Put a breakpoint at ec_wind_inodelk.
3) From the fuse mount, issue ls -laR <mount>.
4) As soon as the breakpoint is hit in gdb, kill 4 of the 6 bricks from
another terminal.
5) Quit gdb.
6) Wait for a second or two, then confirm that there are stale locks on the
remaining bricks.
7) In my case there were, so I issued ls -laR on the mount again and it hung.

Relevant logs that led to this conclusion (these failures were on disperse-2
of a 6 = 4+2 setup):
[2016-06-10 17:21:44.690734] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] )))))
0-ec-nfsganesha-client-15: forced unwinding frame type(GlusterFS 3.3)
op(INODELK(29)) called at 2016-06-10 17:21:44.537422 (xid=0x274d7)

[2016-06-10 17:21:44.771235] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] )))))
0-ec-nfsganesha-client-17: forced unwinding frame type(GlusterFS 3.3)
op(INODELK(29)) called at 2016-06-10 17:21:44.537520 (xid=0x2740b)

[2016-06-10 17:21:44.773164] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] )))))
0-ec-nfsganesha-client-16: forced unwinding frame type(GlusterFS 3.3)
op(INODELK(29)) called at 2016-06-10 17:21:44.537487 (xid=0x2740b)

[2016-06-10 17:21:44.808576] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] )))))
0-ec-nfsganesha-client-14: forced unwinding frame type(GlusterFS 3.3)
op(INODELK(29)) called at 2016-06-10 17:21:44.537377 (xid=0x2740d)
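
The forced-unwind records above can be summarised per client with a small script. This is only a sketch: it assumes the log text is wrapped the way it is quoted in this mail, and parse_unwinds and its field names are hypothetical helpers, not part of GlusterFS.

```python
import re

# One "forced unwinding" record: client translator, program, op name/number,
# the time the call was issued, and the RPC xid.
UNWIND_RE = re.compile(
    r"(\S+): forced unwinding frame type\(([^)]*)\) "
    r"op\((\w+)\((\d+)\)\) called at "
    r"(\d{4}-\d{2}-\d{2} [\d:.]+) \(xid=(0x[0-9a-fA-F]+)\)"
)


def parse_unwinds(log_text):
    """Return one dict per forced-unwind record found in log_text."""
    # Each record is wrapped over several lines here, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    return [
        {"client": c, "program": p, "op": op, "opnum": int(n),
         "called_at": t, "xid": x}
        for c, p, op, n, t, x in UNWIND_RE.findall(flat)
    ]


# Tail of two of the records quoted above.
SAMPLE = """\
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] )))))
0-ec-nfsganesha-client-15: forced unwinding frame type(GlusterFS 3.3)
op(INODELK(29)) called at 2016-06-10 17:21:44.537422 (xid=0x274d7)

/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] )))))
0-ec-nfsganesha-client-17: forced unwinding frame type(GlusterFS 3.3)
op(INODELK(29)) called at 2016-06-10 17:21:44.537520 (xid=0x2740b)
"""

for rec in parse_unwinds(SAMPLE):
    print(rec["client"], rec["op"], rec["called_at"], rec["xid"])
```

Run over the full client log, this lists which clients had INODELK calls in flight when their connections were cleaned up.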

And in the statedumps of the first two bricks we see ACTIVE locks like the
following:

[root@dhcp35-191 gluster]# vi rhs-brick3-ec-nfsganesha.14996.dump.1465582581
[xlator.features.locks.ec-nfsganesha-locks.inode]
path=/126
mandatory=0
inodelk-count=3
lock-dump.domain.domain=ec-nfsganesha-disperse-2:self-heal
lock-dump.domain.domain=ec-nfsganesha-disperse-2
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 6267,
owner=682574c0ee7f0000, client=0x7f9d0c22b620,
connection-id=dhcp35-98.lab.eng.blr.redhat.com-6267-2016/06/10-17:02:59:402489-ec-nfsganesha-client-12-0-0,
granted at 2016-06-10 17:21:44
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 26625,
owner=68757bd6777f0000, client=0x7f9d77bac630,
connection-id=dhcp35-114.lab.eng.blr.redhat.com-26625-2016/06/10-12:00:18:416748-ec-nfsganesha-client-12-0-0,
blocked at 2016-06-10 17:21:45
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 6267,
owner=f0ba74c0ee7f0000, client=0x7f9d0c22b620,
connection-id=dhcp35-98.lab.eng.blr.redhat.com-6267-2016/06/10-17:02:59:402489-ec-nfsganesha-client-12-0-0,
blocked at 2016-06-10 17:32:25
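
To check a brick statedump for this pattern (one ACTIVE inodelk with others queued BLOCKED behind it), the entries can be pulled out with a short script. This is a minimal sketch that assumes the entry layout quoted above; parse_inodelks and the dictionary keys are hypothetical names, not GlusterFS tooling.

```python
import re

# Start of one lock entry, e.g. "inodelk.inodelk[0](ACTIVE)=type=WRITE, ..."
ENTRY_RE = re.compile(r"inodelk\.inodelk\[(\d+)\]\((ACTIVE|BLOCKED)\)=")
TIME_RE = re.compile(
    r"(?:granted|blocked) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def parse_inodelks(dump_text):
    """Return one dict per inodelk entry found in dump_text."""
    # Entries may be wrapped across lines, so flatten whitespace first.
    flat = " ".join(dump_text.split())
    starts = list(ENTRY_RE.finditer(flat))
    locks = []
    for i, m in enumerate(starts):
        # An entry runs until the next entry starts (or the text ends).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(flat)
        body = flat[m.end():end]
        pid = re.search(r"pid\s*=\s*(\d+)", body)
        owner = re.search(r"owner=([0-9a-f]+)", body)
        when = TIME_RE.search(body)
        locks.append({
            "index": int(m.group(1)),
            "state": m.group(2),
            "pid": int(pid.group(1)) if pid else None,
            "owner": owner.group(1) if owner else None,
            "time": when.group(1) if when else None,
        })
    return locks


# The three entries quoted above.
SAMPLE = """\
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 6267,
owner=682574c0ee7f0000, client=0x7f9d0c22b620,
connection-id=dhcp35-98.lab.eng.blr.redhat.com-6267-2016/06/10-17:02:59:402489-ec-nfsganesha-client-12-0-0,
granted at 2016-06-10 17:21:44
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 26625,
owner=68757bd6777f0000, client=0x7f9d77bac630,
connection-id=dhcp35-114.lab.eng.blr.redhat.com-26625-2016/06/10-12:00:18:416748-ec-nfsganesha-client-12-0-0,
blocked at 2016-06-10 17:21:45
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 6267,
owner=f0ba74c0ee7f0000, client=0x7f9d0c22b620,
connection-id=dhcp35-98.lab.eng.blr.redhat.com-6267-2016/06/10-17:02:59:402489-ec-nfsganesha-client-12-0-0,
blocked at 2016-06-10 17:32:25
"""

locks = parse_inodelks(SAMPLE)
active = [l for l in locks if l["state"] == "ACTIVE"]
blocked = [l for l in locks if l["state"] == "BLOCKED"]
for l in active:
    print("ACTIVE  owner", l["owner"], "granted at", l["time"])
for l in blocked:
    print("BLOCKED owner", l["owner"], "blocked at", l["time"])
```

An ACTIVE entry whose grant time matches the brick disconnects, with BLOCKED entries still queued behind it much later, is a candidate stale lock.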
