[Gluster-devel] [Gluster-users] KVM lockups on Gluster 4.1.1
Dmitry Melekhov
dm at belkam.com
Wed Oct 3 06:01:55 UTC 2018
It doesn't work for some reason:
gluster volume set pool tcp-user-timeout 42
volume set: failed: option : tcp-user-timeout does not exist
Did you mean tcp-user-timeout?
This is on 4.1.5.
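Perhaps the option just needs its namespace prefix. The names below are only a guess based on the patch referenced further down (https://review.gluster.org/21170) and are not verified on 4.1.5:

    # Assumed option names -- check what is actually available with:
    #   gluster volume set help | grep -i tcp-user-timeout
    gluster volume set pool server.tcp-user-timeout 42
    gluster volume set pool client.tcp-user-timeout 42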
03.10.2018 08:30, Dmitry Melekhov wrote:
> 02.10.2018 12:59, Amar Tumballi wrote:
>> Recently, in one situation, we found that locks were not freed
>> up because the TCP timeout was never triggered.
>>
>> Can you try the option below and let us know?
>>
>> `gluster volume set $volname tcp-user-timeout 42`
>>
>> (ref: https://review.gluster.org/21170/ )
>>
>> Regards,
>> Amar
>>
>
> Thank you, we'll try this.
>
>>
>> On Tue, Oct 2, 2018 at 10:40 AM Dmitry Melekhov <dm at belkam.com> wrote:
>>
>> 01.10.2018 23:09, Danny Lee пишет:
>>> Ran into this issue too on 4.1.5 with an arbiter setup. Also
>>> could not run a statedump due to a "Segmentation fault".
>>>
>>> Tried with 3.12.13 and had issues with locked files as well. We
>>> were able to do a statedump and found that some of our files
>>> were "BLOCKED" (xlator.features.locks.vol-locks.inode). Attached
>>> part of statedump.
>>>
>>> Also tried clearing the locks using clear-locks, which did
>>> remove the lock, but as soon as I tried to cat the file, it got
>>> locked again and the cat process hung.
>>
>> I created an issue in Bugzilla, though I can't find it now :-(
>> It looks like there has been no activity since I sent all the logs...
>>
>>
>>>
>>> On Wed, Aug 29, 2018, 3:13 AM Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>> 28.08.2018 10:43, Amar Tumballi wrote:
>>>>
>>>>
>>>> On Tue, Aug 28, 2018 at 11:24 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>
>>>> Hello!
>>>>
>>>>
>>>> Yesterday we hit something like this on 4.1.2
>>>>
>>>> CentOS 7.5.
>>>>
>>>>
>>>> The volume is replicated: two bricks and one arbiter.
>>>>
>>>>
>>>> We rebooted the arbiter, waited for the heal to finish, and tried to
>>>> live-migrate a VM to another node (we run VMs on the gluster
>>>> nodes):
>>>>
>>>>
>>>> [2018-08-27 09:56:22.085411] I [MSGID: 115029]
>>>> [server-handshake.c:763:server_setvolume]
>>>> 0-pool-server: accepted client from
>>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>>> client-6-RECON_NO:-0 (version: 4.1.2)
>>>> [2018-08-27 09:56:22.107609] I [MSGID: 115036]
>>>> [server.c:483:server_rpc_notify] 0-pool-server:
>>>> disconnecting connection from
>>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>>> client-6-RECON_NO:-0
>>>> [2018-08-27 09:56:22.107747] I [MSGID: 101055]
>>>> [client_t.c:444:gf_client_unref] 0-pool-server:
>>>> Shutting down connection
>>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-clien
>>>> t-6-RECON_NO:-0
>>>> [2018-08-27 09:58:37.905829] I [MSGID: 115036]
>>>> [server.c:483:server_rpc_notify] 0-pool-server:
>>>> disconnecting connection from
>>>> CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:p
>>>> ool-client-6-RECON_NO:-0
>>>> [2018-08-27 09:58:37.905926] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=28c831d8bc550000}
>>>> [2018-08-27 09:58:37.905959] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=2870a7d6bc550000}
>>>> [2018-08-27 09:58:37.905979] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=2880a7d6bc550000}
>>>> [2018-08-27 09:58:37.905997] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=28f031d8bc550000}
>>>> [2018-08-27 09:58:37.906016] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=28b07dd5bc550000}
>>>> [2018-08-27 09:58:37.906034] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=28e0a7d6bc550000}
>>>> [2018-08-27 09:58:37.906056] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=28b845d8bc550000}
>>>> [2018-08-27 09:58:37.906079] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=2858a7d8bc550000}
>>>> [2018-08-27 09:58:37.906098] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=2868a8d7bc550000}
>>>> [2018-08-27 09:58:37.906121] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=28f80bd7bc550000}
>>>> ...
>>>>
>>>> [2018-08-27 09:58:37.907375] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=28a8cdd6bc550000}
>>>> [2018-08-27 09:58:37.907393] W
>>>> [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>> releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318
>>>> held by {client=0x7ffb58035bc0, pid=30292
>>>> lk-owner=2880cdd6bc550000}
>>>> [2018-08-27 09:58:37.907476] I
>>>> [socket.c:3837:socket_submit_reply] 0-tcp.pool-server:
>>>> not connected (priv->connected = -1)
>>>> [2018-08-27 09:58:37.907520] E
>>>> [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service:
>>>> failed to submit message (XID: 0xcb88cb, Program:
>>>> GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to
>>>> rpc-transport (tcp.pool-server)
>>>> [2018-08-27 09:58:37.910727] E
>>>> [server.c:137:server_submit_reply]
>>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>> [0x7ffb64379084]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>> ba) [0x7ffb5fddf5ba]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>> [2018-08-27 09:58:37.910814] E
>>>> [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service:
>>>> failed to submit message (XID: 0xcb88ce, Program:
>>>> GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to
>>>> rpc-transport (tcp.pool-server)
>>>> [2018-08-27 09:58:37.910861] E
>>>> [server.c:137:server_submit_reply]
>>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>> [0x7ffb64379084]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>> ba) [0x7ffb5fddf5ba]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>> [2018-08-27 09:58:37.910904] E
>>>> [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service:
>>>> failed to submit message (XID: 0xcb88cf, Program:
>>>> GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to
>>>> rpc-transport (tcp.pool-server)
>>>> [2018-08-27 09:58:37.910940] E
>>>> [server.c:137:server_submit_reply]
>>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>> [0x7ffb64379084]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>> ba) [0x7ffb5fddf5ba]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>> [2018-08-27 09:58:37.910979] E
>>>> [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service:
>>>> failed to submit message (XID: 0xcb88d1, Program:
>>>> GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to
>>>> rpc-transport (tcp.pool-server)
>>>> [2018-08-27 09:58:37.911012] E
>>>> [server.c:137:server_submit_reply]
>>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>> [0x7ffb64379084]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>> ba) [0x7ffb5fddf5ba]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>> [2018-08-27 09:58:37.911050] E
>>>> [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service:
>>>> failed to submit message (XID: 0xcb88d8, Program:
>>>> GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to
>>>> rpc-transport (tcp.pool-server)
>>>> [2018-08-27 09:58:37.911083] E
>>>> [server.c:137:server_submit_reply]
>>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>> [0x7ffb64379084]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>> ba) [0x7ffb5fddf5ba]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>> [2018-08-27 09:58:37.916217] E
>>>> [server.c:137:server_submit_reply]
>>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>> [0x7ffb64379084]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>> ba) [0x7ffb5fddf5ba]
>>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>> [2018-08-27 09:58:37.916520] I [MSGID: 115013]
>>>> [server-helpers.c:286:do_fd_cleanup] 0-pool-server: fd
>>>> cleanup on /balamak.img
>>>>
>>>>
>>>> After this, I/O on /balamak.img was blocked.
>>>>
>>>>
>>>> The only solution we found was to reboot all 3 nodes.
>>>>
>>>>
>>>> Is there a bug report in Bugzilla to which we can add our logs?
>>>>
>>>>
>>>> I'm not aware of such a bug!
>>>>
>>>> Is it possible to turn off these locks?
>>>>
>>>>
>>>> Not sure, will get back on this one!
>>>
>>>
>>> By the way, I found this link:
>>> https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-filelocks/
>>>
>>> I tried it on another (test) cluster:
>>>
>>> [root at marduk ~]# gluster volume statedump pool
>>> Segmentation fault (core dumped)
>>>
>>>
>>> 4.1.2 too...
>>>
>>> Something is wrong here.
>>>
>>>
>>>> Thank you!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Amar Tumballi (amarts)
>>>
>>>
>>
>>
>>
>> --
>> Amar Tumballi (amarts)
>
>