[Gluster-users] Odd "Transport endpoint is not connected" when trying to gunzip a file

Pat Haley phaley at mit.edu
Tue Jun 21 21:49:01 UTC 2022


Hi Strahil

I ran a couple of tests, trying to gunzip the file with top running on
both the client (mseas) and the brick server (mseas-data3), and with
iotop running on the client (mseas).  I have not yet been able to
install iotop on the brick server (the external line is down).  I'll
repeat the tests once I fix that problem.

I now get one of two error messages when gunzip fails:

  * gzip:
    /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz:
    File descriptor in bad state
      o a new error message
  * gzip:
    /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz:
    Transport endpoint is not connected
      o the original error message

What I observed while waiting for gunzip to fail

  * top
      o no significant load (usually less than 0.1) on either machine.
      o zero IO-wait on both machines
  * iotop (only running on the client)
      o nothing related to gluster showing up in the display at all
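
One way to keep a record of I/O wait while waiting for gunzip to fail,
rather than watching top by eye, is to sample /proc/stat.  The sketch
below is only an illustration: the function names are made up here, it
assumes a Linux /proc/stat whose aggregate "cpu" line lists user, nice,
system, idle, iowait, irq, softirq and steal in that order, and the two
sample strings are invented values, not measurements from mseas or
mseas-data3.

```python
def cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    first, second = cpu_times(before), cpu_times(after)
    # Use only user..steal (first 8 fields); guest time is already part of user.
    deltas = [b - a for a, b in zip(first[:8], second[:8])]
    total = sum(deltas)
    # Field order is user, nice, system, idle, iowait, ...; iowait is index 4.
    return 100.0 * deltas[4] / total if total else 0.0


# Invented sample values, for illustration only.
SAMPLE_BEFORE = "cpu  100 0 50 800 50 0 0 0 0\n"
SAMPLE_AFTER = "cpu  110 0 55 880 55 0 0 0 0\n"

print(iowait_percent(SAMPLE_BEFORE, SAMPLE_AFTER))
```

On a real machine the two samples would be the contents of /proc/stat
read a few seconds apart while gunzip is running.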

Below I again include what I found in the log files corresponding to
these tests, along with what I see in dmesg on the brick server related
to gluster (nothing showed up on the client).

Please let me know what I should try next.

Thanks

Pat


------------------------------------------
mseas-data3: dmesg | grep glust
------------------------------------------
many repeats of the following pairs of lines:

glusterfsd: page allocation failure. order:1, mode:0x20
Pid: 14245, comm: glusterfsd Not tainted 2.6.32-754.2.1.el6.x86_64 #1
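
Since these lines repeat many times, a small script can count them
instead of reading through the dmesg output by hand.  This is a rough
sketch under stated assumptions: summarize_failures is a hypothetical
helper, the regular expression is modelled only on the line quoted
above, and the 4096-byte page size is an assumption (the usual value on
x86_64).  An allocation of order N asks for 2**N contiguous pages.

```python
import re
from collections import Counter

# Modelled on: "glusterfsd: page allocation failure. order:1, mode:0x20"
FAILURE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def summarize_failures(dmesg_text, page_size=4096):
    """Count failures per (process name, order, requested bytes)."""
    counts = Counter()
    for match in FAILURE.finditer(dmesg_text):
        order = int(match.group("order"))
        # An order-N request is for 2**N contiguous pages.
        requested_bytes = page_size << order
        counts[(match.group("comm"), order, requested_bytes)] += 1
    return counts


SAMPLE = (
    "glusterfsd: page allocation failure. order:1, mode:0x20\n"
    "Pid: 14245, comm: glusterfsd Not tainted 2.6.32-754.2.1.el6.x86_64 #1\n"
)

for key, count in summarize_failures(SAMPLE * 3).items():
    print(key, count)
```

Feeding it the full dmesg output from mseas-data3 would show whether
every failure has the same order and comes from glusterfsd.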

------------------------------------------
mseas:messages
------------------------------------------
Jun 21 17:04:35 mseas gdata[155485]: [2022-06-21 21:04:35.638810] C 
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 
0-data-volume-client-2: server 172.16.1.113:49153 has not responded in 
the last 42 seconds, disconnecting.

Jun 21 17:21:04 mseas gdata[155485]: [2022-06-21 21:21:04.786083] C 
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 
0-data-volume-client-2: server 172.16.1.113:49153 has not responded in 
the last 42 seconds, disconnecting.

------------------------------------------
mseas:gdata.log
------------------------------------------
[2022-06-21 21:04:35.638810] C 
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 
0-data-volume-client-2: server 172.16.1.113:49153 has not responded in 
the last 42 seconds, disconnecting.
[2022-06-21 21:04:35.639261] E [rpc-clnt.c:362:saved_frames_unwind] (--> 
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] 
(--> 
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] 
(--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] 
))))) 0-data-volume-client-2: forced unwinding frame type(GlusterFS 3.3) 
op(READ(12)) called at 2022-06-21 21:03:29.735807 (xid=0xc05d54)
[2022-06-21 21:04:35.639494] E [rpc-clnt.c:362:saved_frames_unwind] (--> 
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] 
(--> 
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] 
(--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] 
))))) 0-data-volume-client-2: forced unwinding frame type(GF-DUMP) 
op(NULL(2)) called at 2022-06-21 21:03:53.633472 (xid=0xc05d55)


[2022-06-21 21:21:04.786083] C 
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 
0-data-volume-client-2: server 172.16.1.113:49153 has not responded in 
the last 42 seconds, disconnecting.
[2022-06-21 21:21:04.786732] E [rpc-clnt.c:362:saved_frames_unwind] (--> 
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] 
(--> 
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] 
(--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] 
))))) 0-data-volume-client-2: forced unwinding frame type(GlusterFS 3.3) 
op(READ(12)) called at 2022-06-21 21:19:52.634383 (xid=0xc05e31)
[2022-06-21 21:21:04.787172] E [rpc-clnt.c:362:saved_frames_unwind] (--> 
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] 
(--> 
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] 
(--> 
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] 
(--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] 
))))) 0-data-volume-client-2: forced unwinding frame type(GF-DUMP) 
op(NULL(2)) called at 2022-06-21 21:20:22.780023 (xid=0xc05e32)
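
Each "forced unwinding frame" record carries two timestamps: when the
record was logged and when the call was issued ("called at").  The
sketch below computes the gap between them.  It is an illustration
only: frame_waits is a hypothetical helper and the pattern is modelled
on the records quoted above, not on any documented log format.

```python
import re
from datetime import datetime

STAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}"
UNWIND = re.compile(
    r"\[(?P<logged>" + STAMP + r")\] E "
    r"\[rpc-clnt\.c:\d+:saved_frames_unwind\]"
    r".*?op\((?P<op>\w+)\(\d+\)\) called at (?P<called>" + STAMP + r")"
)


def parse_stamp(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def frame_waits(log_text):
    """Return (op, seconds from 'called at' to the forced unwind) per record."""
    # Mail wrapping splits each record over several lines; join them first.
    flat = " ".join(log_text.split())
    waits = []
    for match in UNWIND.finditer(flat):
        logged = parse_stamp(match.group("logged"))
        called = parse_stamp(match.group("called"))
        waits.append((match.group("op"), (logged - called).total_seconds()))
    return waits
```

Applied to the first failure above, this gives roughly 65.9 seconds for
the READ frame and roughly 42.0 seconds for the GF-DUMP NULL frame.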

------------------------------------------
mseas-data3: bricks/export-sda-brick3.log
------------------------------------------
[2022-06-21 21:03:54.489638] I [MSGID: 115036] 
[server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting 
connection from 
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31
[2022-06-21 21:03:54.489752] I [MSGID: 115013] 
[server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on 
/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-21 21:03:54.489817] I [MSGID: 101055] 
[client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down 
connection 
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31
[2022-06-21 21:04:04.506544] I [MSGID: 115029] 
[server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted 
client from 
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32 
(version: 3.7.11)


[2022-06-21 21:20:23.625096] I [MSGID: 115036] 
[server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting 
connection from 
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32
[2022-06-21 21:20:23.625189] I [MSGID: 115013] 
[server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on 
/projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-21 21:20:23.625255] I [MSGID: 101055] 
[client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down 
connection 
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32
[2022-06-21 21:20:23.641462] I [MSGID: 115029] 
[server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted 
client from 
mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-33 
(version: 3.7.11)
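
Putting the client and brick timestamps for the first failure on one
timeline makes the order of events easier to see.  The sketch below
does only that arithmetic.  It assumes the clocks on mseas and
mseas-data3 are synchronized, which has not been verified here; the
event labels are descriptive names chosen for this example and the
timeline helper is hypothetical.

```python
from datetime import datetime


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


# Timestamps copied from the first failure in the logs above.
# ASSUMPTION: client and brick clocks agree.
EVENTS = [
    ("client", "2022-06-21 21:03:29.735807", "READ frame issued"),
    ("client", "2022-06-21 21:03:53.633472", "GF-DUMP NULL(2) frame issued"),
    ("brick", "2022-06-21 21:03:54.489638", "brick logs client disconnect"),
    ("brick", "2022-06-21 21:04:04.506544", "brick accepts client again"),
    ("client", "2022-06-21 21:04:35.638810", "client ping timer expires"),
]


def timeline(events):
    """Sort events by time and return (offset_seconds, source, label) tuples."""
    ordered = sorted(events, key=lambda event: ts(event[1]))
    start = ts(ordered[0][1])
    return [
        (round((ts(when) - start).total_seconds(), 3), source, label)
        for source, when, label in ordered
    ]


for offset, source, label in timeline(EVENTS):
    print(f"{offset:8.3f}s  {source:6s}  {label}")
```

If the two clocks really do agree, the brick records the disconnect
about 41 seconds before the client's ping timer expires.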

On 6/17/22 2:18 AM, Strahil Nikolov wrote:
> Check the load with top & iotop.
> Especially check the wait for I/O in top.
>
> Did you check dmesg for any clues?
>
> Best Regards,
> Strahil Nikolov
>
>     On Thu, Jun 16, 2022 at 22:59, Pat Haley
>     <phaley at mit.edu> wrote:
>
>
>     Hi Strahil,
>
>     I poked around our logs, and found this on the front-end (from the
>     day & time of the last time we had the issue)
>
>
>     Jun 15 10:51:17 mseas gdata[155485]: [2022-06-15 14:51:17.263858]
>     C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
>     0-data-volume-client-2: server 172.16.1.113:49153 has not
>     responded in the last 42 seconds, disconnecting.
>
>
>     This would indicate that the problem is related.  For us, however,
>     I believe we can reproduce this issue at will (i.e. simply try to
>     gunzip the same file). Unfortunately I have to go to a meeting
>     now, but if you have some specific tests you'd like me to try, I
>     can try them when I get back.
>
>     Thanks
>
>     Pat
>
>
>
>     On 6/16/22 3:07 PM, Strahil Nikolov wrote:
>     Pat,
>
>     Can you check the cpu and disk  performance when the volume
>     reports the issue?
>
>
>     It seems that a similar issue was reported in
>     https://lists.gluster.org/pipermail/gluster-users/2019-March/035944.html
>     but I don't see a clear solution.
>     Take a look in the thread and check if it matches your symptoms.
>
>
>     Best Regards,
>     Strahil Nikolov
>
>         On Thu, Jun 16, 2022 at 18:14, Pat Haley
>         <phaley at mit.edu> wrote:
>
>
>         Hi Strahil,
>
>         I poked around again and for brick 3 (where the file we were
>         testing resides)  I only found the same log file as was at the
>         bottom of my first Email:
>
>
>         ---------------------------------------------------
>         mseas-data3:  bricks/export-sda-brick3.log
>         -----------------------------------------
>         [2022-06-15 14:50:42.588143] I [MSGID: 115036]
>         [server.c:552:server_rpc_notify] 0-data-volume-server:
>         disconnecting connection from
>         mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28
>         [2022-06-15 14:50:42.588220] I [MSGID: 115013]
>         [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd
>         cleanup on
>         /projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC
>         [2022-06-15 14:50:42.588259] I [MSGID: 115013]
>         [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd
>         cleanup on
>         /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
>         [2022-06-15 14:50:42.588288] I [MSGID: 101055]
>         [client_t.c:420:gf_client_unref] 0-data-volume-server:
>         Shutting down connection
>         mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28
>         [2022-06-15 14:50:53.605215] I [MSGID: 115029]
>         [server-handshake.c:690:server_setvolume]
>         0-data-volume-server: accepted client from
>         mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-29
>         (version: 3.7.11)
>         [2022-06-15 14:50:42.588247] I [MSGID: 115013]
>         [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd
>         cleanup on
>         /projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC
>
>         Thanks
>
>         Pat
>
>
>         On 6/15/22 6:47 PM, Strahil Nikolov wrote:
>         I agree. It will be very hard to debug.
>
>         Anything in the brick logs ?
>
>         I think it's pointless to mention that EL6 is dead and Gluster
>         v3 is so old that it's worth considering a migration to a
>         newer setup.
>
>         Best Regards,
>         Strahil Nikolov
>
>             On Wed, Jun 15, 2022 at 22:51, Yaniv Kaul
>             <ykaul at redhat.com> wrote:
>
-- 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301