<div>To be honest, I have no clue.</div><div>I would try restarting the gluster brick process (even a stop + start of the volume, if that's an option) and a reboot of the client.</div><div><br></div><div>If that doesn't help, you will have to plan on updating your TSP (trusted storage pool) to something newer (v9 or even v10).</div><div><br></div><div>Best Regards,</div><div>Strahil Nikolov <br> <br> <blockquote style="margin: 0 0 20px 0;"> <div style="font-family:Roboto, sans-serif; color:#6D00F6;"> <div>On Wed, Jun 22, 2022 at 0:49, Pat Haley</div><div><phaley@mit.edu> wrote:</div> </div> <div style="padding: 10px 0 0 20px; margin: 10px 0 0 0; border-left: 1px solid #6D00F6;"> <div id="yiv7740879906"><div>
<p><br clear="none">
</p>
<p>Hi Strahil</p>
<p>I have run a couple of tests in which I tried to gunzip the file
with top running on both the client (mseas) and the brick server
(mseas-data3), and with iotop running on the client (mseas). I have
not been able to install iotop on the brick server yet (the external
line is down); I'll repeat the test once I fix that problem.</p>
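<p>(Roughly, the test procedure is the sketch below; gzip -dc decompresses to stdout and leaves the .gz intact, so the test can be repeated. The path is the file from this thread; running iotop the same way would need root.)</p>

```shell
# Sample load in the background while repeating the failing decompression,
# so a failure can be lined up against CPU/iowait samples afterwards.
FILE=/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
top -b -d 5 > top_client.log &        # batch-mode top, one sample every 5 s
TOP_PID=$!
gzip -dc "$FILE" > /dev/null          # -dc writes to stdout; the .gz is kept
echo "gzip exit status: $?"
kill "$TOP_PID" 2>/dev/null || true
```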
<p>I can now get one of two error messages when gunzip fails:</p>
<ul><li>gzip:
/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz:
File descriptor in bad state</li><ul><li>a new error message<br clear="none">
</li></ul><li>gzip:
/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz:
Transport endpoint is not connected</li><ul><li>the original error message<br clear="none">
</li></ul></ul>
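<p>(For reference, both messages appear to be Linux errno strings surfaced through the gluster FUSE mount: "File descriptor in bad state" is EBADFD and "Transport endpoint is not connected" is ENOTCONN, i.e. in both cases the read lost its connection to the brick mid-operation. A quick sketch to confirm the mapping on any Linux box:)</p>

```shell
# Both gzip messages are strerror() text for Linux errno values
python3 -c 'import errno, os; [print(e, os.strerror(e)) for e in (errno.EBADFD, errno.ENOTCONN)]'
# 77 File descriptor in bad state
# 107 Transport endpoint is not connected
```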
<p>What I observed while waiting for gunzip to fail:</p>
<ul><li>top</li><ul><li>no significant load (usually less than 0.1) on both
machines.</li><li>zero IO-wait on both machines</li></ul><li>iotop (only running on the client)</li><ul><li>nothing related to gluster showing up in the display at all</li></ul></ul>
<p>Below I again include what I found in the log files corresponding
to these tests, along with what I see in dmesg on the brick server
related to gluster (nothing showed up in dmesg on the client).</p>
<p>Please let me know what I should try next.</p>
<p>Thanks</p>
<p>Pat<br clear="none">
</p>
<p><font face="monospace"><br clear="none">
------------------------------------------<br clear="none">
mseas-data3: dmesg | grep glust<br clear="none">
------------------------------------------<br clear="none">
many repeats of the following pair of lines:<br clear="none">
<br clear="none">
glusterfsd: page allocation failure. order:1, mode:0x20<br clear="none">
Pid: 14245, comm: glusterfsd Not tainted
2.6.32-754.2.1.el6.x86_64 #1<br clear="none">
<br clear="none">
------------------------------------------<br clear="none">
mseas:messages<br clear="none">
------------------------------------------<br clear="none">
Jun 21 17:04:35 mseas gdata[155485]: [2022-06-21
21:04:35.638810] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
0-data-volume-client-2: server 172.16.1.113:49153 has not
responded in the last 42 seconds, disconnecting.<br clear="none">
<br clear="none">
Jun 21 17:21:04 mseas gdata[155485]: [2022-06-21
21:21:04.786083] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
0-data-volume-client-2: server 172.16.1.113:49153 has not
responded in the last 42 seconds, disconnecting.<br clear="none">
<br clear="none">
------------------------------------------<br clear="none">
mseas:gdata.log<br clear="none">
------------------------------------------<br clear="none">
[2022-06-21 21:04:35.638810] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
0-data-volume-client-2: server 172.16.1.113:49153 has not
responded in the last 42 seconds, disconnecting.<br clear="none">
[2022-06-21 21:04:35.639261] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538]
))))) 0-data-volume-client-2: forced unwinding frame
type(GlusterFS 3.3) op(READ(12)) called at 2022-06-21
21:03:29.735807 (xid=0xc05d54)<br clear="none">
[2022-06-21 21:04:35.639494] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538]
))))) 0-data-volume-client-2: forced unwinding frame
type(GF-DUMP) op(NULL(2)) called at 2022-06-21 21:03:53.633472
(xid=0xc05d55)<br clear="none">
<br clear="none">
<br clear="none">
[2022-06-21 21:21:04.786083] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
0-data-volume-client-2: server 172.16.1.113:49153 has not
responded in the last 42 seconds, disconnecting.<br clear="none">
[2022-06-21 21:21:04.786732] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538]
))))) 0-data-volume-client-2: forced unwinding frame
type(GlusterFS 3.3) op(READ(12)) called at 2022-06-21
21:19:52.634383 (xid=0xc05e31)<br clear="none">
[2022-06-21 21:21:04.787172] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538]
))))) 0-data-volume-client-2: forced unwinding frame
type(GF-DUMP) op(NULL(2)) called at 2022-06-21 21:20:22.780023
(xid=0xc05e32)<br clear="none">
<br clear="none">
------------------------------------------<br clear="none">
mseas-data3: bricks/export-sda-brick3.log<br clear="none">
------------------------------------------<br clear="none">
[2022-06-21 21:03:54.489638] I [MSGID: 115036]
[server.c:552:server_rpc_notify] 0-data-volume-server:
disconnecting connection from
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31<br clear="none">
[2022-06-21 21:03:54.489752] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd
cleanup on
/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz<br clear="none">
[2022-06-21 21:03:54.489817] I [MSGID: 101055]
[client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting
down connection
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31<br clear="none">
[2022-06-21 21:04:04.506544] I [MSGID: 115029]
[server-handshake.c:690:server_setvolume] 0-data-volume-server:
accepted client from
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32
(version: 3.7.11)<br clear="none">
<br clear="none">
<br clear="none">
[2022-06-21 21:20:23.625096] I [MSGID: 115036]
[server.c:552:server_rpc_notify] 0-data-volume-server:
disconnecting connection from
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32<br clear="none">
[2022-06-21 21:20:23.625189] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd
cleanup on
/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz<br clear="none">
[2022-06-21 21:20:23.625255] I [MSGID: 101055]
[client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting
down connection
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32<br clear="none">
[2022-06-21 21:20:23.641462] I [MSGID: 115029]
[server-handshake.c:690:server_setvolume] 0-data-volume-server:
accepted client from
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-33
(version: 3.7.11)<br clear="none">
</font><br clear="none">
</p>
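<p>(Two hedged guesses suggested by the logs above, in case they are worth testing: the 42-second disconnects match gluster's default network.ping-timeout, and the "order:1, mode:0x20" lines in dmesg are atomic allocation failures that are sometimes mitigated by raising the kernel's free-memory floor. Neither is a confirmed fix for this issue.)</p>

```shell
# The "has not responded in the last 42 seconds" disconnects match the
# default network.ping-timeout of 42 s; it can be inspected and raised.
# (Volume name taken from the logs; whether raising it helps is a guess.)
gluster volume get data-volume network.ping-timeout
gluster volume set data-volume network.ping-timeout 120

# The dmesg "page allocation failure. order:1, mode:0x20" lines are
# GFP_ATOMIC allocations failing under memory fragmentation; a common
# mitigation is raising the kernel's reserved free-memory floor.
sysctl vm.min_free_kbytes
sysctl -w vm.min_free_kbytes=131072   # e.g. 128 MiB; size to the machine's RAM
```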
<div class="yiv7740879906moz-cite-prefix">On 6/17/22 2:18 AM, Strahil Nikolov
wrote:<br clear="none">
</div>
<blockquote type="cite">
</blockquote></div><div>
Check the load with top & iotop.
<div>In particular, check the I/O wait in top.</div>
<div><br clear="none">
</div>
<div>Did you check dmesg for any clues?</div>
<div><br clear="none">
</div>
<div>Best Regards,</div>
<div>Strahil Nikolov<br clear="none">
<br clear="none">
<blockquote style="margin:0 0 20px 0;">
<div style="font-family:Roboto, sans-serif;color:#6D00F6;">
<div>On Thu, Jun 16, 2022 at 22:59, Pat Haley</div>
<div><a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:phaley@mit.edu" target="_blank" href="mailto:phaley@mit.edu" class="yiv7740879906moz-txt-link-rfc2396E"><phaley@mit.edu></a> wrote:</div>
</div>
<div style="padding:10px 0 0 20px;margin:10px 0 0 0;border-left:1px solid #6D00F6;">
<div id="yiv7740879906">
<div>
<p><br clear="none">
</p>
<p>Hi Strahil,</p>
<p>I poked around our logs and found this on the
front end (from the day & time of the last time we
had the issue):</p>
<p><br clear="none">
</p>
<p><font face="monospace">Jun 15 10:51:17 mseas
gdata[155485]: [2022-06-15 14:51:17.263858] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
0-data-volume-client-2: server 172.16.1.113:49153
has not responded in the last 42 seconds,
disconnecting.<br clear="none">
</font></p>
<p><br clear="none">
</p>
<p>This would indicate that our problem is related to
the one in that thread. In our case, however, I believe
we can reproduce the issue at will (i.e. simply by
trying to gunzip the same file). Unfortunately I have
to go to a meeting now, but if you have some specific
tests you'd like me to try, I can try them when I get back.</p>
<p>Thanks</p>
<p>Pat</p>
<p><br clear="none">
</p>
<p><br clear="none">
</p>
<div class="yiv7740879906moz-cite-prefix">On 6/16/22
3:07 PM, Strahil Nikolov wrote:<br clear="none">
</div>
<blockquote type="cite"> </blockquote>
</div>
<div> Pat,
<div><br clear="none">
</div>
<div>
<div>Can you check the CPU and disk performance when
the volume reports the issue?</div>
<div><br clear="none">
</div>
</div>
<div><br clear="none">
</div>
<div>It seems that a similar issue was reported
in <a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="https://lists.gluster.org/pipermail/gluster-users/2019-March/035944.html" class="yiv7740879906moz-txt-link-freetext yiv7740879906moz-txt-link-freetext">https://lists.gluster.org/pipermail/gluster-users/2019-March/035944.html</a>,
but I don't see a clear solution.</div>
<div>Take a look in the thread and check if it matches
your symptoms.</div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
<div>Best Regards,</div>
<div>Strahil Nikolov<br clear="none">
<br clear="none">
<blockquote style="margin:0 0 20px 0;">
<div style="font-family:Roboto, sans-serif;color:#6D00F6;">
<div>On Thu, Jun 16, 2022 at 18:14, Pat Haley</div>
<div><a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:phaley@mit.edu" target="_blank" href="mailto:phaley@mit.edu" class="yiv7740879906moz-txt-link-rfc2396E"><phaley@mit.edu></a>
wrote:</div>
</div>
<div style="padding:10px 0 0 20px;margin:10px 0 0 0;border-left:1px solid #6D00F6;">
<div id="yiv7740879906">
<div>
<p><br clear="none">
</p>
<p>Hi Strahil,</p>
<p>I poked around again and, for brick 3 (where
the file we were testing resides), I only
found the same log entries as were at the bottom
of my first email:</p>
<p><br clear="none">
<font face="monospace">---------------------------------------------------<br clear="none">
mseas-data3: bricks/export-sda-brick3.log<br clear="none">
---------------------------------------------------<br clear="none">
[2022-06-15 14:50:42.588143] I [MSGID:
115036] [server.c:552:server_rpc_notify]
0-data-volume-server: disconnecting
connection from
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28<br clear="none">
[2022-06-15 14:50:42.588220] I [MSGID:
115013]
[server-helpers.c:294:do_fd_cleanup]
0-data-volume-server: fd cleanup on
/projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC<br clear="none">
[2022-06-15 14:50:42.588259] I [MSGID:
115013]
[server-helpers.c:294:do_fd_cleanup]
0-data-volume-server: fd cleanup on
/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz<br clear="none">
[2022-06-15 14:50:42.588288] I [MSGID:
101055] [client_t.c:420:gf_client_unref]
0-data-volume-server: Shutting down
connection
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28<br clear="none">
[2022-06-15 14:50:53.605215] I [MSGID:
115029]
[server-handshake.c:690:server_setvolume]
0-data-volume-server: accepted client from
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-29
(version: 3.7.11)<br clear="none">
[2022-06-15 14:50:42.588247] I [MSGID:
115013]
[server-helpers.c:294:do_fd_cleanup]
0-data-volume-server: fd cleanup on
/projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC</font><br clear="none">
</p>
<p>Thanks</p>
<p>Pat</p>
<p><br clear="none">
</p>
<div id="yiv7740879906yqt80756" class="yiv7740879906yqt6062330617">
<div class="yiv7740879906moz-cite-prefix">On
6/15/22 6:47 PM, Strahil Nikolov wrote:<br clear="none">
</div>
<blockquote type="cite"> </blockquote>
</div>
</div>
<div id="yiv7740879906yqt04765" class="yiv7740879906yqt6062330617">
<div>
<div id="yiv7740879906">
<div>I agree. It will be very hard to
debug.
<div><br clear="none">
</div>
<div>Anything in the brick logs?</div>
<div><br clear="none">
</div>
<div>It probably goes without saying,
but EL6 is dead and Gluster v3 is so
old that it's worth considering a
migration to a newer setup.</div>
<div><br clear="none">
</div>
<div>Best Regards,</div>
<div>Strahil Nikolov<br clear="none">
<br clear="none">
<blockquote style="margin:0 0 20px 0;">
<div style="font-family:Roboto, sans-serif;color:#6D00F6;">
<div id="yiv7740879906yqtfd58221" class="yiv7740879906yqt6679717770">
<div>On Wed, Jun 15, 2022 at
22:51, Yaniv Kaul</div>
<div><a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:ykaul@redhat.com" target="_blank" href="mailto:ykaul@redhat.com" class="yiv7740879906moz-txt-link-rfc2396E"><ykaul@redhat.com></a>
wrote:</div>
</div>
</div>
<div id="yiv7740879906yqtfd56990" class="yiv7740879906yqt6679717770">
<div style="padding:10px 0 0 20px;margin:10px 0 0 0;border-left:1px solid #6D00F6;"> ________<br clear="none">
<br clear="none">
<br clear="none">
<br clear="none">
Community Meeting Calendar:<br clear="none">
<br clear="none">
Schedule -<br clear="none">
Every 2nd and 4th Tuesday at
14:30 IST / 09:00 UTC<br clear="none">
Bridge: <a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="https://meet.google.com/cpu-eiue-hvk" class="yiv7740879906moz-txt-link-freetext yiv7740879906moz-txt-link-freetext">https://meet.google.com/cpu-eiue-hvk</a><br clear="none">
Gluster-users mailing list<br clear="none">
<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:Gluster-users@gluster.org" target="_blank" href="mailto:Gluster-users@gluster.org" class="yiv7740879906moz-txt-link-freetext yiv7740879906moz-txt-link-freetext">Gluster-users@gluster.org</a><br clear="none">
<a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="https://lists.gluster.org/mailman/listinfo/gluster-users" class="yiv7740879906moz-txt-link-freetext yiv7740879906moz-txt-link-freetext">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br clear="none">
</div>
</div>
</blockquote>
</div>
</div>
</div>
<pre class="yiv7740879906moz-signature">--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: <a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:phaley@mit.edu" target="_blank" href="mailto:phaley@mit.edu" class="yiv7740879906moz-txt-link-abbreviated yiv7740879906moz-txt-link-freetext yiv7740879906moz-txt-link-freetext">phaley@mit.edu</a>
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 <a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="http://web.mit.edu/phaley/www/" class="yiv7740879906moz-txt-link-freetext yiv7740879906moz-txt-link-freetext">http://web.mit.edu/phaley/www/</a>
77 Massachusetts Avenue
Cambridge, MA 02139-4301
</pre>
</div><div id="yiv7740879906yqtfd85716" class="yiv7740879906yqt8174421451">
<div id="yiv7740879906yqtfd71224" class="yiv7740879906yqt0138793630"> </div>
</div></div><div id="yiv7740879906yqtfd45172" class="yiv7740879906yqt8174421451">
<div id="yiv7740879906yqtfd75825" class="yiv7740879906yqt0138793630"> </div>
</div></div><div id="yiv7740879906yqtfd35860" class="yiv7740879906yqt8174421451">
<div id="yiv7740879906yqtfd56693" class="yiv7740879906yqt0138793630"> </div>
</div></div><div id="yiv7740879906yqtfd23084" class="yiv7740879906yqt8174421451">
<div id="yiv7740879906yqtfd94279" class="yiv7740879906yqt0138793630"> </div>
</div></blockquote><div id="yiv7740879906yqtfd08403" class="yiv7740879906yqt8174421451">
<div id="yiv7740879906yqtfd14360" class="yiv7740879906yqt0138793630"> </div>
</div></div><div id="yiv7740879906yqtfd19797" class="yiv7740879906yqt8174421451">
<div id="yiv7740879906yqtfd34488" class="yiv7740879906yqt0138793630">
</div>
</div></div><div id="yiv7740879906yqtfd29790" class="yiv7740879906yqt8174421451">
</div></div><div id="yiv7740879906yqtfd13534" class="yiv7740879906yqt8174421451">
</div></div><div id="yiv7740879906yqtfd44327" class="yiv7740879906yqt8174421451">
</div></blockquote><div id="yiv7740879906yqtfd93698" class="yiv7740879906yqt8174421451">
</div></div><div id="yiv7740879906yqtfd46083" class="yiv7740879906yqt8174421451">
</div></div></div> </div> </blockquote></div>