[Gluster-users] KVM lockups on Gluster 4.1.1

Ravishankar N ravishankar at redhat.com
Wed Oct 10 09:58:41 UTC 2018


Hi,
Sorry for the delay; I should have gotten to this earlier. We uncovered 
the issue in our internal QE testing, and it is a regression. The 
details and patch are available in BZ 1637802.  I'll back-port the fix 
to the release branches once it is merged.  Shout-out to Pranith for 
helping with the RCA!
Regards,
Ravi

On 10/02/2018 10:39 AM, Dmitry Melekhov wrote:
> On 01.10.2018 23:09, Danny Lee wrote:
>> Ran into this issue too with 4.1.5 with an arbiter setup.  We also 
>> could not run a statedump due to a "Segmentation fault".
>>
>> We tried 3.12.13 as well and had issues with locked files there too.  
>> On that version we were able to take a statedump and found that some 
>> of our files were "BLOCKED" (xlator.features.locks.vol-locks.inode).  
>> Part of the statedump is attached.
>>
>> We also tried clearing the locks with clear-locks, which did remove 
>> the lock, but as soon as I tried to cat the file it got locked again 
>> and the cat process hung.
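>>
>> For reference, the rough sequence was something like the following 
>> (the volume and file names here are just examples; the clear-locks 
>> syntax is the one from the Gluster admin guide):
>>
>>     # trigger a statedump; by default it is written under /var/run/gluster/
>>     gluster volume statedump myvol
>>
>>     # look for blocked inode locks in the dump
>>     grep -B2 -A2 BLOCKED /var/run/gluster/*.dump.*
>>
>>     # clear blocked inode locks on the affected file
>>     # (0,0-0 is the range used in the docs' examples)
>>     gluster volume clear-locks myvol /path/to/locked-file kind blocked inode 0,0-0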
>
> I created an issue in Bugzilla, but I can't find it now :-(
> It looks like there has been no activity since I sent all the logs...
>
>
>>
>>     On Wed, Aug 29, 2018, 3:13 AM Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>     28.08.2018 10:43, Amar Tumballi пишет:
>>>
>>>
>>>     On Tue, Aug 28, 2018 at 11:24 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>>         Hello!
>>>
>>>
>>>         Yesterday we hit something like this on 4.1.2 (CentOS 7.5).
>>>
>>>
>>>         The volume is replicated: two data bricks and one arbiter.
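>>>
>>>         For reference, this is the usual replica-3-with-arbiter
>>>         layout, created with something like the command below
>>>         (hostnames and brick paths are placeholders):
>>>
>>>           gluster volume create pool replica 3 arbiter 1 \
>>>               node1:/data/brick node2:/data/brick arbiter1:/data/brick
>>>
>>>         "Waited for the heal to finish" below means we waited until
>>>         "gluster volume heal pool info" reported no pending entries.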
>>>
>>>
>>>         We rebooted the arbiter, waited for the heal to finish, and
>>>         tried to live-migrate a VM to another node (we run the VMs on
>>>         the Gluster nodes):
>>>
>>>
>>>         [2018-08-27 09:56:22.085411] I [MSGID: 115029]
>>>         [server-handshake.c:763:server_setvolume] 0-pool-server:
>>>         accepted client from
>>>         CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>>         client-6-RECON_NO:-0 (version: 4.1.2)
>>>         [2018-08-27 09:56:22.107609] I [MSGID: 115036]
>>>         [server.c:483:server_rpc_notify] 0-pool-server:
>>>         disconnecting connection from
>>>         CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>>         client-6-RECON_NO:-0
>>>         [2018-08-27 09:56:22.107747] I [MSGID: 101055]
>>>         [client_t.c:444:gf_client_unref] 0-pool-server: Shutting
>>>         down connection
>>>         CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-clien
>>>         t-6-RECON_NO:-0
>>>         [2018-08-27 09:58:37.905829] I [MSGID: 115036]
>>>         [server.c:483:server_rpc_notify] 0-pool-server:
>>>         disconnecting connection from
>>>         CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:p
>>>         ool-client-6-RECON_NO:-0
>>>         [2018-08-27 09:58:37.905926] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=28c831d8bc550000}
>>>         [2018-08-27 09:58:37.905959] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=2870a7d6bc550000}
>>>         [2018-08-27 09:58:37.905979] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880a7d6bc550000}
>>>         [2018-08-27 09:58:37.905997] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f031d8bc550000}
>>>         [2018-08-27 09:58:37.906016] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b07dd5bc550000}
>>>         [2018-08-27 09:58:37.906034] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=28e0a7d6bc550000}
>>>         [2018-08-27 09:58:37.906056] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b845d8bc550000}
>>>         [2018-08-27 09:58:37.906079] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=2858a7d8bc550000}
>>>         [2018-08-27 09:58:37.906098] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=2868a8d7bc550000}
>>>         [2018-08-27 09:58:37.906121] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f80bd7bc550000}
>>>         ...
>>>
>>>         [2018-08-27 09:58:37.907375] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=28a8cdd6bc550000}
>>>         [2018-08-27 09:58:37.907393] W
>>>         [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server:
>>>         releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>>         by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880cdd6bc550000}
>>>         [2018-08-27 09:58:37.907476] I
>>>         [socket.c:3837:socket_submit_reply] 0-tcp.pool-server: not
>>>         connected (priv->connected = -1)
>>>         [2018-08-27 09:58:37.907520] E
>>>         [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed
>>>         to submit message (XID: 0xcb88cb, Program: GlusterFS 4.x v1,
>>>         ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>>         [2018-08-27 09:58:37.910727] E
>>>         [server.c:137:server_submit_reply]
>>>         (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>         [0x7ffb64379084]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>         ba) [0x7ffb5fddf5ba]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>         [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>         [2018-08-27 09:58:37.910814] E
>>>         [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed
>>>         to submit message (XID: 0xcb88ce, Program: GlusterFS 4.x v1,
>>>         ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>>         [2018-08-27 09:58:37.910861] E
>>>         [server.c:137:server_submit_reply]
>>>         (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>         [0x7ffb64379084]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>         ba) [0x7ffb5fddf5ba]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>         [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>         [2018-08-27 09:58:37.910904] E
>>>         [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed
>>>         to submit message (XID: 0xcb88cf, Program: GlusterFS 4.x v1,
>>>         ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>>         [2018-08-27 09:58:37.910940] E
>>>         [server.c:137:server_submit_reply]
>>>         (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>         [0x7ffb64379084]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>         ba) [0x7ffb5fddf5ba]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>         [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>         [2018-08-27 09:58:37.910979] E
>>>         [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed
>>>         to submit message (XID: 0xcb88d1, Program: GlusterFS 4.x v1,
>>>         ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>>         [2018-08-27 09:58:37.911012] E
>>>         [server.c:137:server_submit_reply]
>>>         (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>         [0x7ffb64379084]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>         ba) [0x7ffb5fddf5ba]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>         [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>         [2018-08-27 09:58:37.911050] E
>>>         [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed
>>>         to submit message (XID: 0xcb88d8, Program: GlusterFS 4.x v1,
>>>         ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
>>>         [2018-08-27 09:58:37.911083] E
>>>         [server.c:137:server_submit_reply]
>>>         (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>         [0x7ffb64379084]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>         ba) [0x7ffb5fddf5ba]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>         [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>         [2018-08-27 09:58:37.916217] E
>>>         [server.c:137:server_submit_reply]
>>>         (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084)
>>>         [0x7ffb64379084]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605
>>>         ba) [0x7ffb5fddf5ba]
>>>         -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce)
>>>         [0x7ffb5fd89fce] ) 0-: Reply submission failed
>>>         [2018-08-27 09:58:37.916520] I [MSGID: 115013]
>>>         [server-helpers.c:286:do_fd_cleanup] 0-pool-server: fd
>>>         cleanup on /balamak.img
>>>
>>>
>>>         After this, I/O on /balamak.img was blocked.
>>>
>>>
>>>         The only solution we found was to reboot all 3 nodes.
>>>
>>>
>>>         Is there a bug report in Bugzilla to which we can add our logs?
>>>
>>>
>>>     I'm not aware of any such bug!
>>>
>>>         Is it possible to turn off these locks?
>>>
>>>
>>>     Not sure; I'll get back to you on this one!
>>
>>
>>     By the way, I found this link:
>>     https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-filelocks/
>>
>>     I tried it on another (test) cluster:
>>
>>      [root@marduk ~]# gluster volume statedump pool
>>     Segmentation fault (core dumped)
>>
>>
>>     That cluster is on 4.1.2 too...
>>
>>     Something is wrong here.
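>>
>>     In case it helps to at least get a dump despite the CLI crash:
>>     sending SIGUSR1 to the brick process should also make it write a
>>     statedump, by default under /var/run/gluster/ (the process name
>>     and path here are just the usual defaults, so adjust as needed):
>>
>>      pgrep -af glusterfsd      # find the brick process PID(s)
>>      kill -USR1 <brick-pid>    # statedump is written under /var/run/gluster/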
>>
>>
>>>         Thank you!
>>>
>>>
>>>     -- 
>>>     Amar Tumballi (amarts)
>>
>>
>>
>
>
>
