<div dir="ltr">Normally the client logs give a clue as to why the disconnections are happening (ping-timeout, wrong port, etc.). Can you look into the client logs to figure out what&#39;s happening? If you can&#39;t find anything, can you send the client logs across?<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 29, 2018 at 6:11 PM, Richard Neuboeck <span dir="ltr">&lt;<a href="mailto:hawk@tbi.univie.ac.at" target="_blank">hawk@tbi.univie.ac.at</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Gluster Community,<br>

<br>

I have a problem with glusterfs &#39;Transport endpoint is not connected&#39;<br>

connection aborts during file transfers. I can reproduce it every time<br>

now, but I can&#39;t pinpoint why it is happening.<br>

<br>

The volume is set up in replica 3 mode and accessed with the gluster<br>

FUSE client. Both the client and the servers run CentOS with the<br>

supplied gluster version 3.12.11.<br>

<br>

The connection abort happens at different times during rsync but<br>

occurs every time I try to sync all our files (1.1TB) to the empty<br>

volume.<br>

<br>

I don&#39;t find errors in the gluster log files on either the client or<br>

the server side. rsync logs the obvious transfer problem. The only log<br>

that shows anything related is the server&#39;s brick log, which states<br>

that the connection is being shut down:<br>

<br>

[2018-08-18 22:40:35.502510] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-home-server: disconnecting connection from brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0<br>

[2018-08-18 22:40:35.502620] W [inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server: releasing lock on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}<br>

[2018-08-18 22:40:35.502692] W [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}<br>

[2018-08-18 22:40:35.502719] W [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}<br>

[2018-08-18 22:40:35.505950] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-home-server: Shutting down connection brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0<br>
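With the brick log on one side and the client logs on the other, it helps to split each entry into its fields so timestamps and translators can be lined up. A minimal Python sketch, assuming every entry occupies one line in the log file and follows the layout of the excerpt above; `parse_line` and `LogEntry` are hypothetical names, not part of gluster:<br>

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout assumed from the brick log excerpt:
#   [timestamp] LEVEL [MSGID: n] [file.c:line:function] translator: message
# The "[MSGID: n]" part is optional; the W lines in the excerpt have none.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<xlator>[^:]+): "
    r"(?P<msg>.*)$"
)


@dataclass
class LogEntry:
    timestamp: str
    level: str              # single letter, e.g. "I" or "W"
    msgid: Optional[int]    # None when the line carries no MSGID
    source: str             # file.c:line:function
    translator: str         # e.g. "0-home-server"
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into fields; return None if it does not fit the layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    msgid = int(m["msgid"]) if m["msgid"] else None
    return LogEntry(m["ts"], m["level"], msgid, m["src"], m["xlator"], m["msg"])


if __name__ == "__main__":
    sample = ("[2018-08-18 22:40:35.502510] I [MSGID: 115036] "
              "[server.c:527:server_rpc_notify] 0-home-server: disconnecting "
              "connection from brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0")
    entry = parse_line(sample)
    # prints: 2018-08-18 22:40:35.502510 I 115036 0-home-server
    print(entry.timestamp, entry.level, entry.msgid, entry.translator)
```

Sorting parsed entries from the brick and client logs by timestamp then gives one timeline around each disconnect, provided the clocks on all hosts are in sync.<br>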

<br>

Since I&#39;ve been running another replica 3 setup for oVirt for a long<br>

time now, and it is completely stable, I first thought I had made a<br>

mistake in setting different options. However, even after resetting<br>

those options I&#39;m able to reproduce the connection problem.<br>

<br>

The unoptimized volume setup looks like this:<br>

<br>

Volume Name: home<br>

Type: Replicate<br>

Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8<br>

Status: Started<br>

Snapshot Count: 0<br>

Number of Bricks: 1 x 3 = 3<br>

Transport-type: tcp<br>

Bricks:<br>

Brick1: sphere-four:/srv/gluster_home/brick<br>

Brick2: sphere-five:/srv/gluster_home/brick<br>

Brick3: sphere-six:/srv/gluster_home/brick<br>

Options Reconfigured:<br>

nfs.disable: on<br>

transport.address-family: inet<br>

cluster.quorum-type: auto<br>

cluster.server-quorum-type: server<br>

cluster.server-quorum-ratio: 50%<br>

<br>

<br>

The following additional options were used before:<br>

<br>

performance.cache-size: 5GB<br>

client.event-threads: 4<br>

server.event-threads: 4<br>

cluster.lookup-optimize: on<br>

features.cache-invalidation: on<br>

performance.stat-prefetch: on<br>

performance.cache-invalidation: on<br>

network.inode-lru-limit: 50000<br>

features.cache-invalidation-timeout: 600<br>

performance.md-cache-timeout: 600<br>

performance.parallel-readdir: on<br>
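If the earlier tuning is to be put back in a repeatable way (for example one option at a time between rsync runs), the list above can be turned into `gluster volume set` commands. A minimal Python sketch, assuming the options are pasted as the `key: value` lines shown; `parse_options` and `set_commands` are hypothetical helpers, and only the `gluster volume set VOLNAME KEY VALUE` syntax is gluster&#39;s own. The script prints the commands and does not run them:<br>

```python
from typing import Dict, List


def parse_options(text: str) -> Dict[str, str]:
    """Read 'key: value' lines, as listed under 'Options Reconfigured:'."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            options[key.strip()] = value.strip()
    return options


def set_commands(volume: str, options: Dict[str, str]) -> List[str]:
    """One 'gluster volume set VOLNAME KEY VALUE' command per option, in order."""
    return [f"gluster volume set {volume} {key} {value}"
            for key, value in options.items()]


if __name__ == "__main__":
    tuned = parse_options("""\
performance.cache-size: 5GB
client.event-threads: 4
server.event-threads: 4
""")
    # Prints the commands only; nothing is executed.
    for cmd in set_commands("home", tuned):
        print(cmd)
```

Undoing a single option again is `gluster volume reset home &lt;option&gt;`.<br>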

<br>

<br>

In this setup the gluster servers and the client all use a bonded<br>

network device running in adaptive load balancing mode (balance-alb).<br>

<br>

I&#39;ve tried using the debug option for the client mount, but apart<br>

from a ~0.5TB log file I didn&#39;t get any information that seemed helpful.<br>
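A debug log of that size is easier to work through if it is streamed and filtered instead of opened whole. A minimal Python sketch, assuming the single-letter level field that follows the timestamp in the brick log excerpt (W for warnings and E for errors; treating C as critical is an assumption), plus a hypothetical keyword list to extend with whatever wording the client log uses for ping timeouts or port errors:<br>

```python
import re
import sys
from typing import Iterable, Iterator

# Single-letter level after the bracketed timestamp,
# e.g. "[2018-08-18 22:40:35.502620] W ..."
LEVEL = re.compile(r"^\[[^\]]*\] ([A-Z]) ")


def interesting(lines: Iterable[str],
                levels: frozenset = frozenset("WEC"),
                keywords: tuple = ("disconnect",)) -> Iterator[str]:
    """Yield lines whose level is in `levels` or that contain any keyword.

    Works line by line, so the whole log is never held in memory.
    """
    for line in lines:
        m = LEVEL.match(line)
        if (m and m.group(1) in levels) or any(k in line for k in keywords):
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_log.py /path/to/client.log > filtered.log
    # (script name and path are placeholders)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        sys.stdout.writelines(interesting(log))
```

Keeping the keyword list short matters: a term that also appears in routine debug messages would let most of the log through again.<br>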

<br>

Transferring just a couple of GB works without problems.<br>

<br>

It may very well be that I&#39;ve gone blind to the obvious, but after<br>

many long-running tests I can&#39;t find the crux in the setup.<br>

<br>

Does anyone have an idea how to approach this problem in a way that<br>

yields some useful information?<br>

<br>

Any help is highly appreciated!<br>

Cheers<br>

<span class="HOEnZb"><font color="#888888">Richard<br>

<br>

-- <br>

/dev/null<br>


</font></span><br>_______________________________________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br></blockquote></div><br></div>