<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">I apologise for this being posted twice. I'm not sure whether it was user error or a bug in the mailing list, but my first post still hadn't appeared on the list after quite some time, so I sent a second email, which showed up almost immediately. That's mailing lists, I guess...<div class=""><br class=""></div><div class="">Anyway, any input, advice or abuse is welcome!<br class=""><div class="">
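<div class=""><br class=""></div><div class="">One detail that may help anyone reading the client log in the quoted post below: each call_bail line is an INODELK (inode lock) request to the brick on the second or third node (staging_static-client-1 / staging_static-client-2, going by the log) that hung for the full 1800 seconds before the client gave up on it. 1800 s is Gluster's default network.frame-timeout, which isn't changed in the volume options. The brick on the first node (client-0) doesn't appear in those lines at all. Below is a minimal Python sketch, not a Gluster tool, that pulls the sent and bail-out times out of those lines; it assumes the glusterfs 4.1 log format shown in the post, and parse_call_bails and resend_gaps are hypothetical helper names.</div>
<pre class="" style="font-size: 12px;">
```python
import re
from datetime import datetime

# Minimal sketch (assumes the glusterfs 4.1 client log format quoted in the
# post): pick out the rpc-clnt "call_bail" lines and work out how long each
# bailed-out frame was outstanding. Hypothetical helper, not a Gluster tool.
CALL_BAIL = re.compile(
    r"\[([\d-]+ [\d:.]+)\] E \[rpc-clnt\.c:\d+:call_bail\] "
    r"\d+-(\S+): bailing out frame .*? op\((\w+)\(\d+\)\) "
    r"xid = (0x[0-9a-f]+) sent = ([\d-]+ [\d:.]+)\. timeout = (\d+)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_call_bails(lines):
    """Return one dict per call_bail line; all other lines are skipped."""
    events = []
    for line in lines:
        m = CALL_BAIL.search(line)
        if m is None:
            continue
        bailed, client, op, xid, sent, timeout = m.groups()
        sent_at = datetime.strptime(sent, TS_FORMAT)
        bailed_at = datetime.strptime(bailed, TS_FORMAT)
        events.append({
            "client": client,               # e.g. staging_static-client-2
            "op": op,                       # e.g. INODELK
            "xid": xid,
            "sent": sent_at,
            "bailed": bailed_at,
            "waited": bailed_at - sent_at,  # how long the frame hung
            "timeout_s": int(timeout),      # frame timeout reported in the log
        })
    return events


def resend_gaps(events):
    """Time from one frame bailing out to the next logged frame being sent."""
    return [nxt["sent"] - cur["bailed"] for cur, nxt in zip(events, events[1:])]


if __name__ == "__main__":
    # First two call_bail lines from the client log (trailing host:port trimmed).
    sample = [
        "[2018-09-02 15:33:22.750874] E [rpc-clnt.c:184:call_bail] "
        "0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) "
        "op(INODELK(29)) xid = 0x1cce sent = 2018-09-02 15:03:22.417773. "
        "timeout = 1800",
        "[2018-09-02 16:03:23.097905] E [rpc-clnt.c:184:call_bail] "
        "0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) "
        "op(INODELK(29)) xid = 0x2e21 sent = 2018-09-02 15:33:22.765751. "
        "timeout = 1800",
    ]
    events = parse_call_bails(sample)
    for e in events:
        print(e["client"], e["op"], e["xid"], "waited", e["waited"])
    print("bail-out to next send:", resend_gaps(events))
```
</pre>
<div class="">On the first two call_bail lines in the post it reports a wait of about 0:30:00.33 for each frame, and shows the next INODELK going to the other brick roughly 15 ms after the previous one bailed out.</div><div class=""><br class=""></div>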
<div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;"><br class="">--<br class="">Sam McLeod<br class=""><a href="https://smcleod.net" class="">https://smcleod.net</a><br class="">https://twitter.com/s_mcleod</div>
</div>
<div><br class=""><blockquote type="cite" class=""><div class="">On 3 Sep 2018, at 1:20 pm, Sam McLeod <<a href="mailto:mailinglists@smcleod.net" class="">mailinglists@smcleod.net</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=us-ascii" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="" style="font-family: Helvetica-Light;"><div class=""><span class="" style="font-size: 12px;">We've got an odd problem where clients are blocked from writing to Gluster volumes until the first node of the Gluster cluster is rebooted.</span></div><div class=""><span class="" style="font-size: 12px;"><br class=""></span></div><div class=""><span class="" style="font-size: 12px;">I suspect I've either configured something incorrectly with the arbiter / replica configuration of the volumes, or there is some sort of bug in the gluster client-server connection that we're triggering.</span></div><div class=""><span class="" style="font-size: 12px;"><br class=""></span></div></div><div class="" style="font-family: Helvetica-Light;"><div class=""><span class="" style="font-size: 12px;">I was wondering if anyone has seen this or could point me in the right direction?</span></div></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><b class="" style="font-size: 12px;">Environment:</b></div><ul class="" style="font-family: Helvetica-Light;"><li class=""><span class="" style="font-size: 12px;">Typology: 3 node cluster, replica 2, arbiter 1 (third node is metadata only).</span></li><li class=""><span class="" style="font-size: 12px;">Version: Client and Servers both running 4.1.3, both on CentOS 7, kernel 
4.18.x, (Xen) VMs with relatively fast networked SSD storage backing them, XFS.</span></li><li class=""><span class="" style="font-size: 12px;">Client: native Gluster FUSE client, mounted via the Kubernetes provider.</span></li></ul><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><b class="" style="font-size: 12px;">Problem:</b></div><div class="" style="font-family: Helvetica-Light;"><ul class=""><li class=""><span class="" style="font-size: 12px;">Seemingly at random, some clients become blocked and are unable to write to what should be a highly available Gluster volume.</span></li><li class=""><span class="" style="font-size: 12px;">The client's Gluster logs show it failing new file operations across various volumes and against all three nodes of the cluster.</span></li><li class=""><span class="" style="font-size: 12px;">Neither the Gluster nor the OS logs on the servers show any warnings or errors.</span></li><li class=""><span class="" style="font-size: 12px;">The client recovers and can write to volumes again after the first node of the Gluster cluster is rebooted.</span></li><li class=""><span class="" style="font-size: 12px;">Until the first node is rebooted, the client fails to write to the volume, even though it is (or should be) available on the second node (a replica) and the third node (an arbiter-only node).</span></li></ul></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><b class="" style="font-size: 12px;">What 'fixes' the issue:</b></div><div class="" style="font-family: Helvetica-Light;"><ul class=""><li class=""><span class="" style="font-size: 12px;">Although the clients (Kubernetes hosts) connect to all three nodes of the Gluster cluster, restarting the first Gluster node always unblocks the IO and allows 
the client to continue writing.</span></li><li class=""><span class="" style="font-size: 12px;">Stopping and starting the glusterd service on the Gluster server is not enough to fix the issue, nor is restarting its networking.</span></li><li class=""><span class="" style="font-size: 12px;">This suggests to me that the volume becomes unavailable for writing for some reason, and that restarting the first node clears some sort of stale TCP session, either between client and server or between the servers for replication.</span></li></ul></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><b class="" style="font-size: 12px;">Expected behaviour:</b></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><ul class="MailOutline"><li class=""><span class="" style="font-size: 12px;">If the first Gluster node had failed or were blocked from performing operations for some reason (which doesn't seem to be the case), I'd expect the clients to access data from the second node, and to write metadata to the third node as well, since it's an arbiter (metadata-only) node.</span></li><li class=""><span class="" style="font-size: 12px;">If for some reason a Gluster node were unable to serve client connections, I'd expect to see errors in the volume, glusterd or brick log files (there are none on the first node).</span></li><li class=""><span class="" style="font-size: 12px;">If the first Gluster node were for some reason blocking IO on a volume, I'd expect it to show as unhealthy or unavailable in gluster peer status or gluster volume status.</span></li></ul></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" 
style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><b class="" style="font-size: 12px;">Client gluster errors:</b></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><ul class="MailOutline"><li class=""><span class="" style="font-size: 12px;">staging_static in this example is a volume name.</span></li><li class=""><span class="" style="font-size: 12px;">You can see the client trying to connect to the second and third nodes of the gluster cluster and failing (unsure as to why?)</span></li><li class=""><span class="" style="font-size: 12px;">The server side logs on the first gluster node do not show any errors or problems, but the second / third node show errors in the glusterd.log when trying to 'unlock' the 0-management volume on the first node.</span></li></ul></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><b class="">On a gluster client</b> (a kubernetes host using the kubernetes connector which uses the native fuse client) when its blocked from writing but the gluster appears healthy (other than the errors mentioned later):</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 15:33:22.750874] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1cce sent = 2018-09-02 
15:03:22.417773. timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 15:33:22.750989] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 16:03:23.097905] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2e21 sent = 2018-09-02 15:33:22.765751. timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 16:03:23.097988] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 16:33:23.439172] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1d4b sent = 2018-09-02 16:03:23.098133. timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 16:33:23.439282] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 17:03:23.786858] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2ee7 sent = 2018-09-02 16:33:23.455171. 
timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 17:03:23.786971] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 17:33:24.160607] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1dc8 sent = 2018-09-02 17:03:23.787120. timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 17:33:24.160720] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 18:03:24.505092] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2faf sent = 2018-09-02 17:33:24.173153. timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 18:03:24.505185] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 18:33:24.841248] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1e45 sent = 2018-09-02 18:03:24.505328. 
timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 18:33:24.841311] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 19:03:25.204711] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3074 sent = 2018-09-02 18:33:24.855372. timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 19:03:25.204784] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 19:33:25.533545] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1ec2 sent = 2018-09-02 19:03:25.204977. timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 19:33:25.533611] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 20:03:25.877020] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3138 sent = 2018-09-02 19:33:25.545921. 
timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 20:03:25.877098] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 20:33:26.217858] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1f3e sent = 2018-09-02 20:03:25.877264. timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 20:33:26.217973] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 21:03:26.588237] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x31ff sent = 2018-09-02 20:33:26.233010. timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 21:03:26.588316] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 21:33:26.912334] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1fbb sent = 2018-09-02 21:03:26.588456. 
timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 21:33:26.912449] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 22:03:37.258915] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x32c5 sent = 2018-09-02 21:33:32.091009. timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 22:03:37.259000] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 22:33:37.615497] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2039 sent = 2018-09-02 22:03:37.259147. timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 22:33:37.615574] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 23:03:37.940969] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3386 sent = 2018-09-02 22:33:37.629655. 
timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 23:03:37.941049] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 23:33:38.270998] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x20b5 sent = 2018-09-02 23:03:37.941199. timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 23:33:38.271078] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 00:03:38.607186] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3447 sent = 2018-09-02 23:33:38.285899. timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 00:03:38.607263] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 00:33:38.934385] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2131 sent = 2018-09-03 00:03:38.607410. 
timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 00:33:38.934479] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:03:39.256842] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x350c sent = 2018-09-03 00:33:38.948570. timeout = 1800 for <ip of second gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:03:39.256972] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:33:39.614402] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x21ae sent = 2018-09-03 01:03:39.258166. 
timeout = 1800 for <ip of third gluster node>:49154</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:33:39.614483] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]</font></div></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><b class="" style="font-size: 12px;">On the second Gluster server:</b></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;">We see the following error in glusterd.log when the client is blocked from writing to the volume. I think this is probably the most important clue: it suggests a problem with the first node, but doesn't explain the client behaviour:</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 08:31:03.902272] E [MSGID: 106115] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on <FQDN of the first gluster node>. 
Please check log file for details.</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-02 08:31:03.902477] E [MSGID: 106151] [glusterd-syncop.c:1640:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)</font></div></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;">Questions about the above error:</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;">1. Which log file is it telling me to check? There doesn't seem to be a management brick log.</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;">2. If there's a problem with the first node, why isn't it rejected from the cluster or taken offline, or the peer or volume health shown as degraded?</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;">3. 
Why does the client fail to write to the volume rather than (as I'd assume) writing via the second (or third) node instead?</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;">We are also seeing the following errors repeated frequently in the brick log (/var/log/glusterfs/bricks/mnt-gluster-storage-staging_static.log), both when the volumes are working and when there's an issue:</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.128923] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.128957] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3d60, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.128983] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 
0-: Reply submission failed</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.129016] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3e2a, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.129042] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.129077] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3ef6, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.129149] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.129191] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3fc6, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">[2018-09-03 01:58:35.129219] E [server.c:137:server_submit_reply] 
(-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed</font></div></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><b class="" style="font-size: 12px;">Gluster volume information:</b></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;"># gluster volume info staging_static</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;"><br class=""></font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Volume Name: staging_static</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Type: Replicate</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Volume ID: 7f3b8e91-afea-4fc6-be83-3399a089b6f3</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Status: Started</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Snapshot Count: 0</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Number of Bricks: 1 x (2 + 1) = 3</font></div><div class=""><font face="IBMPlexMono" 
class="" style="font-size: 12px;">Transport-type: tcp</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Bricks:</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Brick1: <first gluster node.fqdn>:/mnt/gluster-storage/staging_static</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Brick2: <second gluster node.fqdn>:/mnt/gluster-storage/staging_static</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Brick3: <third gluster node.fqdn>:/mnt/gluster-storage/staging_static (arbiter)</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">Options Reconfigured:</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">storage.fips-mode-rchecksum: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.self-heal-window-size: 16</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.shd-wait-qlength: 4096</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.shd-max-threads: 8</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.cache-min-file-size: 2KB</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.rda-cache-limit: 1GB</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">network.inode-lru-limit: 50000</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">server.outstanding-rpc-limit: 256</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">transport.listen-backlog: 2048</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.write-behind-window-size: 512MB</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 
12px;">performance.stat-prefetch: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.io-thread-count: 16</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.client-io-threads: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.cache-size: 1GB</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.cache-refresh-timeout: 60</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.cache-invalidation: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.use-compound-fops: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.readdir-optimize: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.lookup-optimize: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.favorite-child-policy: size</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">cluster.eager-lock: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">client.event-threads: 4</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">nfs.disable: on</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">transport.address-family: inet</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">diagnostics.brick-log-level: ERROR</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">diagnostics.client-log-level: ERROR</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">features.cache-invalidation-timeout: 300</font></div><div class=""><font 
face="IBMPlexMono" class="" style="font-size: 12px;">features.cache-invalidation: true</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">network.ping-timeout: 15</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.cache-max-file-size: 3MB</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">performance.md-cache-timeout: 300</font></div><div class=""><font face="IBMPlexMono" class="" style="font-size: 12px;">server.event-threads: 4</font></div></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;">Thanks in advance,</span></div><div class="" style="font-family: Helvetica-Light;"><span class="" style="font-size: 12px;"><br class=""></span><div class=""><div dir="auto" class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div style="font-family: Helvetica;" class=""><br class=""></div></div></div></div></div></div><div class="">
<div dir="auto" style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">--<br class="">Sam McLeod (protoporpoise on IRC)</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><a href="https://smcleod.net/" class="">https://smcleod.net</a><br class=""><a href="https://twitter.com/s_mcleod" class="">https://twitter.com/s_mcleod</a><br class=""><br class="">Words are my own opinions and do not necessarily represent those of my employer or partners.</div></div></div></div>
</div>
<br class=""></div>_______________________________________________<br class="">Gluster-users mailing list<br class=""><a href="mailto:Gluster-users@gluster.org" class="">Gluster-users@gluster.org</a><br class="">https://lists.gluster.org/mailman/listinfo/gluster-users</div></blockquote></div><br class=""></div></div></body></html>