[Gluster-users] KVM lockups on Gluster 4.1.1

Amar Tumballi atumball at redhat.com
Tue Oct 2 08:59:46 UTC 2018


Recently, in one of the situation, we found that locks were not freed up
due to not getting TCP timeout..

Can you try the option like below and let us know?

`gluster volume set $volname tcp-user-timeout 42`

(ref: https://review.gluster.org/21170/ )

Regards,
Amar


On Tue, Oct 2, 2018 at 10:40 AM Dmitry Melekhov <dm at belkam.com> wrote:

> 01.10.2018 23:09, Danny Lee пишет:
>
> Ran into this issue too with 4.1.5 with an arbiter setup.  Also could not
> run a statedump due to "Segmentation fault".
>
> Tried with 3.12.13 and had issues with locked files as well.  We were able
> to do a statedump and found that some of our files were "BLOCKED"
> (xlator.features.locks.vol-locks.inode).  Attached part of statedump.
>
> Also tried clearing the locks using clear-locks, which did remove the
> lock, but as soon as I tried to cat the file, it got locked again and the
> cat process hung.
>
>
> I created issue in bugzilla, can't find it though :-(
> Looks like there is no activity after I sent all logs...
>
>
>
> On Wed, Aug 29, 2018, 3:13 AM Dmitry Melekhov <dm at belkam.com> wrote:
>
>> 28.08.2018 10:43, Amar Tumballi пишет:
>>
>>
>>
>> On Tue, Aug 28, 2018 at 11:24 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>> Hello!
>>>
>>>
>>> Yesterday we hit something like this on 4.1.2
>>>
>>> Centos 7.5.
>>>
>>>
>>> Volume is replicated - two bricks and one arbiter.
>>>
>>>
>>> We rebooted arbiter, waited for heal end,  and tried to live migrate VM
>>> to another node ( we run VMs on gluster nodes ):
>>>
>>>
>>> [2018-08-27 09:56:22.085411] I [MSGID: 115029]
>>> [server-handshake.c:763:server_setvolume] 0-pool-server: accepted client
>>> from
>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>> client-6-RECON_NO:-0 (version: 4.1.2)
>>> [2018-08-27 09:56:22.107609] I [MSGID: 115036]
>>> [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection
>>> from
>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>> client-6-RECON_NO:-0
>>> [2018-08-27 09:56:22.107747] I [MSGID: 101055]
>>> [client_t.c:444:gf_client_unref] 0-pool-server: Shutting down connection
>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-clien
>>> t-6-RECON_NO:-0
>>> [2018-08-27 09:58:37.905829] I [MSGID: 115036]
>>> [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection
>>> from
>>> CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:p
>>> ool-client-6-RECON_NO:-0
>>> [2018-08-27 09:58:37.905926] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28c831d8bc550000}
>>> [2018-08-27 09:58:37.905959] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2870a7d6bc550000}
>>> [2018-08-27 09:58:37.905979] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880a7d6bc550000}
>>> [2018-08-27 09:58:37.905997] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f031d8bc550000}
>>> [2018-08-27 09:58:37.906016] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b07dd5bc550000}
>>> [2018-08-27 09:58:37.906034] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28e0a7d6bc550000}
>>> [2018-08-27 09:58:37.906056] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b845d8bc550000}
>>> [2018-08-27 09:58:37.906079] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2858a7d8bc550000}
>>> [2018-08-27 09:58:37.906098] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2868a8d7bc550000}
>>> [2018-08-27 09:58:37.906121] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f80bd7bc550000}
>>> ...
>>>
>>> [2018-08-27 09:58:37.907375] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28a8cdd6bc550000}
>>> [2018-08-27 09:58:37.907393] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880cdd6bc550000}
>>> [2018-08-27 09:58:37.907476] I [socket.c:3837:socket_submit_reply]
>>> 0-tcp.pool-server: not connected (priv->connected = -1)
>>> [2018-08-27 09:58:37.907520] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>>> 0-rpc-service: failed to submit message (XID: 0xcb88cb, Program: GlusterFS
>>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>> [2018-08-27 09:58:37.910727] E [server.c:137:server_submit_reply]
>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>> [0x7ffb64379084]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>> ba) [0x7ffb5fddf5ba]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>> [2018-08-27 09:58:37.910814] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>>> 0-rpc-service: failed to submit message (XID: 0xcb88ce, Program: GlusterFS
>>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>> [2018-08-27 09:58:37.910861] E [server.c:137:server_submit_reply]
>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>> [0x7ffb64379084]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>> ba) [0x7ffb5fddf5ba]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>> [2018-08-27 09:58:37.910904] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>>> 0-rpc-service: failed to submit message (XID: 0xcb88cf, Program: GlusterFS
>>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>> [2018-08-27 09:58:37.910940] E [server.c:137:server_submit_reply]
>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>> [0x7ffb64379084]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>> ba) [0x7ffb5fddf5ba]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>> [2018-08-27 09:58:37.910979] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>>> 0-rpc-service: failed to submit message (XID: 0xcb88d1, Program: GlusterFS
>>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>> [2018-08-27 09:58:37.911012] E [server.c:137:server_submit_reply]
>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>> [0x7ffb64379084]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>> ba) [0x7ffb5fddf5ba]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>> [2018-08-27 09:58:37.911050] E [rpcsvc.c:1378:rpcsvc_submit_generic]
>>> 0-rpc-service: failed to submit message (XID: 0xcb88d8, Program: GlusterFS
>>> 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>> [2018-08-27 09:58:37.911083] E [server.c:137:server_submit_reply]
>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>> [0x7ffb64379084]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>> ba) [0x7ffb5fddf5ba]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>> [2018-08-27 09:58:37.916217] E [server.c:137:server_submit_reply]
>>> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>> [0x7ffb64379084]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>> ba) [0x7ffb5fddf5ba]
>>> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>> [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>> [2018-08-27 09:58:37.916520] I [MSGID: 115013]
>>> [server-helpers.c:286:do_fd_cleanup] 0-pool-server: fd cleanup on
>>> /balamak.img
>>>
>>>
>>> after this I/O on  /balamak.img was blocked.
>>>
>>>
>>> Only solution we found was to reboot all 3 nodes.
>>>
>>>
>>> Is there any bug report in bugzilla we can add logs?
>>>
>>>
>> Not aware of such bugs!
>>
>>
>>> Is it possible to turn of these locks?
>>>
>>>
>> Not sure, will get back on this one!
>>
>>
>>
>> btw, found this link
>> https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-filelocks/
>>
>> tried on another (test) cluster:
>>
>>  [root at marduk ~]# gluster volume statedump pool
>> Segmentation fault (core dumped)
>>
>>
>> 4.1.2 too...
>>
>> something is wrong here.
>>
>>
>>
>>
>>> Thank you!
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>> Amar Tumballi (amarts)
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Amar Tumballi (amarts)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181002/32ee8686/attachment.html>


More information about the Gluster-users mailing list