<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:DengXian;
        panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
        {font-family:"\@\7B49\7EBF";
        panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        text-align:justify;
        text-justify:inter-ideograph;
        font-size:10.5pt;
        font-family:DengXian;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:DengXian;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:DengXian;}
/* Page Definitions */
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="ZH-CN" link="#0563C1" vlink="#954F72" style="text-justify-trim:punctuation">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi, <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We are using glusterfs version 3.12 for 3 brick I find that occasionally after reboot all 3 sn nodes simultaneously, the glusterd process on one sn nodes may get stuck, when you try to execute glusterd command it does
not response.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Following is the glusterd stuck log and the gdb info of the stuck glusterd process,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:24.999329] I [rpc-clnt.c:1048:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:24.999450] I [rpc-clnt.c:1048:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:24.999597] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: a4315121-e127-42ca-9869-0fe451216a80<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:24.999692] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 7f694d7e-7613-4298-8da1-50cbb73ed47e, host: mn-0.local, port:
0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010624] W [socket.c:593:__socket_rwv] 0-management: readv on 192.168.1.14:24007 failed (No data available)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010774] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <mn-0.local> (<7f694d7e-7613-4298-8da1-50cbb73ed47e>), in state <Peer in Cluster>,
has disconnected from glusterd.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010860] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d)
[0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol ccs not held<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010903] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for ccs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010931] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d)
[0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol encryptfile not held<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010945] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for encryptfile<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010971] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d)
[0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol export not held<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.010983] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for export<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011006] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d)
[0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol log not held<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011046] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for log<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011070] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d)
[0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol mstate not held<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011082] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for mstate<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011104] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d)
[0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol services not held<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011115] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for services<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011268] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e9)[0x7f182df24a99] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1f9)[0x7f182dce6f27]
(--> /lib64/libgfrpc.so.0<span style="color:red">(saved_frames_destroy</span>+0x1f)[0x7f182dce701a] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7f182dce756e] (--> /lib64/libgfrpc.so.0(+0x12f9e)[0x7f182dce7f9e] ))))) 0-management: forced
unwinding frame type(Peer mgmt) op(--(4)) called at 2019-01-28 14:38:24.999851 (xid=0x7)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011286] E [MSGID: 106158] [glusterd-rpc-ops.c:684:__glusterd_friend_update_cbk] 0-management: RPC Error<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011301] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received RJT from uuid: 00000000-0000-0000-0000-000000000000<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011424] I [MSGID: 106006] [glusterd-svc-mgmt.c:328:glusterd_svc_common_rpc_notify] 0-management: glustershd has connected with glusterd.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011599] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /mnt/bricks/ccs/brick on port 53952<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.011862] I [MSGID: 106492] [glusterd-handler.c:2718:<span style="color:red">__glusterd_handle_friend_update</span>] 0-glusterd: Received friend update from uuid: 7f694d7e-7613-4298-8da1-50cbb73ed47e<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.021058] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.021090] I [socket.c:3704:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.021099] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xa, Program: GlusterD svc peer, ProgVers: 2, Proc: 4) to rpc-transport (socket.management)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.021108] E [MSGID: 106430] [glusterd-utils.c:568:glusterd_submit_reply] 0-glusterd: Reply submission failed<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.021126] E [rpcsvc.c:559:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.021135] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xa, Program: GlusterD svc peer, ProgVers: 2, Proc: 4) to rpc-transport (socket.management)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[2019-01-28 14:38:25.021147] W [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: failed to queue error reply<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">Gdb info the stuck thread of glusterd is:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">(gdb) thread 8<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[Switching to thread 8 (Thread 0x7f1826ec2700 (LWP 2418))]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">(gdb) bt<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#1 0x00007f182ccea677 in __lll_lock_elision () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#2 0x00007f182df5cae6 in iobref_unref () from /lib64/libglusterfs.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#3 0x00007f182dce2f29 in rpc_transport_pollin_destroy () from /lib64/libgfrpc.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#4 0x00007f1827ccf319 in socket_event_poll_in () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#5 0x00007f1827ccf932 in socket_event_handler () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#6 0x00007f182df925d4 in event_dispatch_epoll_handler () from /lib64/libglusterfs.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#7 0x00007f182df928ab in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#8 0x00007f182ccde5da in start_thread () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#9 0x00007f182c5b4e8f in clone () from /lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">(gdb) thread 9<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">[Switching to thread 9 (Thread 0x7f18266c1700 (LWP 2419))]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">(gdb) bt<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#1 0x00007f182cce2b42 in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#2 0x00007f182cce44c8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#3 0x00007f1827ccadab in socket_event_poll_err () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#4 0x00007f1827ccf99c in socket_event_handler () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#5 0x00007f182df925d4 in event_dispatch_epoll_handler () from /lib64/libglusterfs.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#6 0x00007f182df928ab in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#7 0x00007f182ccde5da in start_thread () from /lib64/libpthread.so.0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">#8 0x00007f182c5b4e8f in clone () from /lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:7.5pt">Have you ever encounter this issue? From the gdb info it seem the epoll thread get dead lock.<o:p></o:p></span></p>
</div>
</body>
</html>