[Gluster-users] KVM lockups on Gluster 4.1.1
Danny Lee
dannyl at vt.edu
Mon Oct 1 19:09:59 UTC 2018
Ran into this issue too on 4.1.5 with an arbiter setup. We also could not
run a statedump because the command died with a segmentation fault.
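In case it helps others hitting the same crash, here is a rough workaround sketch (not verified on this exact setup): glusterfs daemons will also write a statedump when sent SIGUSR1, bypassing the crashing CLI; dumps land in /var/run/gluster on a default install.

```shell
# Workaround sketch (assumes a default install layout): glusterfs
# daemons dump state on SIGUSR1, without going through the CLI.
# Adjust the pgrep pattern if your brick processes are named differently.
for pid in $(pgrep glusterfsd); do
    kill -USR1 "$pid"          # ask each brick process for a statedump
done
ls /var/run/gluster/*dump*     # statedumps are written here by default
```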
We also tried 3.12.13 and had issues with locked files there as well. On
that version we were able to take a statedump and found that some of our
files were "BLOCKED" (xlator.features.locks.vol-locks.inode). Part of the
statedump is attached below.
We also tried clearing the locks with clear-locks, which did remove the
lock, but as soon as we tried to cat the file it got locked again and the
cat process hung.
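For reference, the clear-locks attempt was along these lines (volume name, path, and range here are placeholders, following the documented clear-locks syntax):

```shell
# Sketch of the lock-clearing attempt (volume/path/range are placeholders).
# Syntax: gluster volume clear-locks <VOLNAME> <path> kind \
#         {blocked|granted|all} {inode|entry|posix} [range]
gluster volume clear-locks testvol /path/to/locked-file kind granted inode 0,0-0
# The lock was removed, but reappeared as soon as the file was read again.
```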
On Wed, Aug 29, 2018, 3:13 AM Dmitry Melekhov <dm at belkam.com> wrote:
> On 28.08.2018 at 10:43, Amar Tumballi wrote:
>
>
>
> On Tue, Aug 28, 2018 at 11:24 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>> Hello!
>>
>>
>> Yesterday we hit something like this on 4.1.2
>>
>> Centos 7.5.
>>
>>
>> Volume is replicated - two bricks and one arbiter.
>>
>>
>> We rebooted the arbiter, waited for the heal to finish, and then tried to
>> live-migrate a VM to another node (we run VMs on the gluster nodes):
>>
>>
>> [2018-08-27 09:56:22.085411] I [MSGID: 115029]
>> [server-handshake.c:763:server_setvolume] 0-pool-server: accepted client
>> from
>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0 (version: 4.1.2)
>> [2018-08-27 09:56:22.107609] I [MSGID: 115036]
>> [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection
>> from
>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
>> [2018-08-27 09:56:22.107747] I [MSGID: 101055]
>> [client_t.c:444:gf_client_unref] 0-pool-server: Shutting down connection
>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
>> [2018-08-27 09:58:37.905829] I [MSGID: 115036]
>> [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection
>> from
>> CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:pool-client-6-RECON_NO:-0
>> [2018-08-27 09:58:37.905926] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28c831d8bc550000}
>> [2018-08-27 09:58:37.905959] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2870a7d6bc550000}
>> [2018-08-27 09:58:37.905979] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880a7d6bc550000}
>> [2018-08-27 09:58:37.905997] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f031d8bc550000}
>> [2018-08-27 09:58:37.906016] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b07dd5bc550000}
>> [2018-08-27 09:58:37.906034] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28e0a7d6bc550000}
>> [2018-08-27 09:58:37.906056] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b845d8bc550000}
>> [2018-08-27 09:58:37.906079] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2858a7d8bc550000}
>> [2018-08-27 09:58:37.906098] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2868a8d7bc550000}
>> [2018-08-27 09:58:37.906121] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f80bd7bc550000}
>> ...
>>
>> [2018-08-27 09:58:37.907375] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28a8cdd6bc550000}
>> [2018-08-27 09:58:37.907393] W [inodelk.c:610:pl_inodelk_log_cleanup]
>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880cdd6bc550000}
>> [2018-08-27 09:58:37.907476] I [socket.c:3837:socket_submit_reply]
>> 0-tcp.pool-server: not connected (priv->connected = -1)
>> [2018-08-27 09:58:37.907520] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>> 0-rpc-service: failed to submit message (XID: 0xcb88cb, Program: GlusterFS
>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>> [2018-08-27 09:58:37.910727] E [server.c:137:server_submit_reply]
>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>> [0x7ffb64379084]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>> [2018-08-27 09:58:37.910814] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>> 0-rpc-service: failed to submit message (XID: 0xcb88ce, Program: GlusterFS
>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>> [2018-08-27 09:58:37.910861] E [server.c:137:server_submit_reply]
>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>> [0x7ffb64379084]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>> [2018-08-27 09:58:37.910904] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>> 0-rpc-service: failed to submit message (XID: 0xcb88cf, Program: GlusterFS
>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>> [2018-08-27 09:58:37.910940] E [server.c:137:server_submit_reply]
>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>> [0x7ffb64379084]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>> [2018-08-27 09:58:37.910979] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>> 0-rpc-service: failed to submit message (XID: 0xcb88d1, Program: GlusterFS
>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>> [2018-08-27 09:58:37.911012] E [server.c:137:server_submit_reply]
>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>> [0x7ffb64379084]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>> [2018-08-27 09:58:37.911050] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>> 0-rpc-service: failed to submit message (XID: 0xcb88d8, Program: GlusterFS
>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>> [2018-08-27 09:58:37.911083] E [server.c:137:server_submit_reply]
>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>> [0x7ffb64379084]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>> [2018-08-27 09:58:37.916217] E [server.c:137:server_submit_reply]
>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>> [0x7ffb64379084]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba]
>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>> [2018-08-27 09:58:37.916520] I [MSGID: 115013]
>> [server-helpers.c:286:do_fd_cleanup] 0-pool-server: fd cleanup on
>> /balamak.img
>>
>>
>> After this, I/O on /balamak.img was blocked.
>>
>>
>> The only solution we found was to reboot all 3 nodes.
>>
>>
>> Is there any bug report in Bugzilla we can add logs to?
>>
>>
> Not aware of such bugs!
>
>
>> Is it possible to turn off these locks?
>>
>>
> Not sure, will get back on this one!
>
>
>
> btw, found this link
> https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-filelocks/
>
> tried on another (test) cluster:
>
> [root at marduk ~]# gluster volume statedump pool
> Segmentation fault (core dumped)
>
>
> 4.1.2 too...
>
> something is wrong here.
>
>
>
>
>> Thank you!
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Amar Tumballi (amarts)
>
>
-------------- next part --------------
[xlator.features.locks.testvol-locks.inode]
path=/_admin/shared/testvol.keystore.lock
mandatory=0
inodelk-count=2
lock-dump.domain.domain=testvol-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=700a00f0d07f0000, client=0x7fb1000b1190, connection-id=eng-test-hot-cold-test-standby.saas.testvolcloud.com-3852-2018/09/18-18:43:32:168282-testvol-client-0-0-0, granted at 2018-09-18 18:44:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 2114, owner=d0c000004b7f0000, client=0x7fb1000b5320, connection-id=eng-test-hot-cold-test.saas.testvolcloud.com-4423-2018/09/18-18:43:33:251627-testvol-client-0-0-0, blocked at 2018-09-18 18:53:04
lock-dump.domain.domain=testvol-replicate-0:self-heal
lock-dump.domain.domain=testvol-replicate-0:metadata
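To make statedumps like the excerpt above easier to scan, a small helper along these lines (ours, not part of Gluster) pulls out just the BLOCKED inodelk entries with their pid and lock owner:

```shell
# Extract the pid and owner of every BLOCKED inodelk entry from a
# statedump file (the filename argument is whatever dump you saved).
blocked_locks() {
    grep 'inodelk\.inodelk\[[0-9]*\](BLOCKED)' "$1" |
        sed 's/.*pid = \([0-9]*\), owner=\([0-9a-f]*\).*/pid=\1 owner=\2/'
}
```

Run against the statedump above, this would report the waiter with pid 2114 while skipping the ACTIVE holder.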