<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Mon, Sep 3, 2018 at 11:17 AM Karthik Subrahmanya <<a href="mailto:ksubrahm@redhat.com" target="_blank">ksubrahm@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hey,<div><br></div><div>We need some more information to debug this.</div><div>I think you missed to send the output of 'gluster volume info <volname>'.</div><div>Can you also provide the bricks, shd and glfsheal logs as well?</div><div>In the setup how many peers are present? You also mentioned that "one of the file servers have two processes for each of the volumes instead of one per volume", which process are you talking about here?</div></div></blockquote><div>Also provide the "ps aux | grep gluster" output.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>Regards,</div><div>Karthik</div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Sep 1, 2018 at 12:10 AM Johnson, Tim <<a href="mailto:tjj@uic.edu" target="_blank">tjj@uic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple">
<div class="m_-6740311702397631846m_-3056644397394569048m_5904757788766413815WordSection1">
<p class="MsoNormal">Thanks for the reply.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"> I have attached the gluster.log file from the host that it is happening to at this time.<u></u><u></u></p>
<p class="MsoNormal">It does change which host it does this on.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Thanks. <u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Atin Mukherjee <<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>><br>
<b>Date: </b>Friday, August 31, 2018 at 1:03 PM<br>
<b>To: </b>"Johnson, Tim" <<a href="mailto:tjj@uic.edu" target="_blank">tjj@uic.edu</a>><br>
<b>Cc: </b>Karthik Subrahmanya <<a href="mailto:ksubrahm@redhat.com" target="_blank">ksubrahm@redhat.com</a>>, Ravishankar N <<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>>, "<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>" <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
<b>Subject: </b>Re: [Gluster-users] Transport endpoint is not connected : issue<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
>> Can you please pass along all the gluster log files from the server where the
>> "transport endpoint is not connected" error is reported? As restarting
>> glusterd didn't solve this issue, I believe this isn't a stale port problem
>> but something else. Also please provide the output of 'gluster v info
>> <volname>'.
>>
>> (@cc Ravi, Karthik)
>>
>> On Fri, 31 Aug 2018 at 23:24, Johnson, Tim <tjj@uic.edu> wrote:
>>> Hello all,
>>>
>>> We have gluster replicate (with arbiter) volumes that return "Transport
>>> endpoint is not connected" on a rotating basis from each of the two file
>>> servers and from a third host that holds the arbiter bricks.
>>> This happens when trying to run a heal on all of the volumes on the gluster
>>> hosts. When I get the status of all the volumes, everything looks good.
>>>
>>> This behavior seems to foreshadow the gluster volumes becoming unresponsive
>>> to our VM cluster. In addition, one of the file servers has two processes
>>> for each of the volumes instead of one per volume. Eventually the affected
>>> file server drops off the list of peers. Restarting glusterd/glusterfsd on
>>> the affected file server does not take care of the issue; we have to bring
>>> down both file servers because the VM cluster no longer sees the volumes
>>> after the errors start occurring. I had seen bug reports about "Transport
>>> endpoint is not connected" on earlier versions of Gluster, but thought it
>>> had been addressed.
>>>
>>> The dmesg output did have some entries for "a possible SYN flood on port *",
>>> so we set the sysctl "net.ipv4.tcp_max_syn_backlog = 2048", which seemed to
>>> quiet the SYN flood messages but not the underlying volume issues.
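
As an aside, in case it helps anyone who hits the same dmesg messages: a
minimal sketch of applying and persisting that backlog change on EL7 (the 2048
value is just the figure quoted above, not a recommendation, and the file name
is arbitrary):

  # apply the value immediately
  sysctl -w net.ipv4.tcp_max_syn_backlog=2048

  # persist it across reboots
  echo 'net.ipv4.tcp_max_syn_backlog = 2048' > /etc/sysctl.d/90-syn-backlog.conf
  sysctl --system
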
<p class="MsoNormal"> I have put the versions of all the Gluster packages installed below as well as the “Heal” and “Status” commands showing the volumes are
<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> This has just started happening but cannot definitively say if this started occurring after an update or not.
<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Thanks for any assistance.<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Running Heal :<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">gluster volume heal ovirt_engine info<u></u><u></u></p>
<p class="MsoNormal">Brick ****1.rrc.local:/bricks/brick0/ovirt_engine<u></u><u></u></p>
<p class="MsoNormal">Status: Connected<u></u><u></u></p>
<p class="MsoNormal">Number of entries: 0<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Brick ****3.rrc.local:/bricks/brick0/ovirt_engine<u></u><u></u></p>
<p class="MsoNormal">Status: Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Number of entries: -<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Brick *****3.rrc.local:/bricks/arb-brick/ovirt_engine<u></u><u></u></p>
<p class="MsoNormal">Status: Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Number of entries: -
<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Running status :<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">gluster volume status ovirt_engine<u></u><u></u></p>
<p class="MsoNormal">Status of volume: ovirt_engine<u></u><u></u></p>
<p class="MsoNormal">Gluster process TCP Port RDMA Port Online Pid<u></u><u></u></p>
<p class="MsoNormal">------------------------------------------------------------------------------<u></u><u></u></p>
<p class="MsoNormal">Brick*****.rrc.local:/bricks/brick0/ov<u></u><u></u></p>
<p class="MsoNormal">irt_engine 49152 0 Y 5521<u></u><u></u></p>
<p class="MsoNormal">Brick fs2-tier3.rrc.local:/bricks/brick0/ov<u></u><u></u></p>
<p class="MsoNormal">irt_engine 49152 0 Y 6245<u></u><u></u></p>
<p class="MsoNormal">Brick ****.rrc.local:/bricks/arb-b<u></u><u></u></p>
<p class="MsoNormal">rick/ovirt_engine 49152 0 Y 3526<u></u><u></u></p>
<p class="MsoNormal">Self-heal Daemon on localhost N/A N/A Y 5509<u></u><u></u></p>
<p class="MsoNormal">Self-heal Daemon on ***.rrc.local N/A N/A Y 6218<u></u><u></u></p>
<p class="MsoNormal">Self-heal Daemon on ***.rrc.local N/A N/A Y 3501<u></u><u></u></p>
<p class="MsoNormal">Self-heal Daemon on ****.rrc.local N/A N/A Y 3657<u></u><u></u></p>
<p class="MsoNormal">Self-heal Daemon on *****.rrc.local N/A N/A Y 3753<u></u><u></u></p>
<p class="MsoNormal">Self-heal Daemon on ****.rrc.local N/A N/A Y 17284<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Task Status of Volume ovirt_engine<u></u><u></u></p>
<p class="MsoNormal">------------------------------------------------------------------------------<u></u><u></u></p>
<p class="MsoNormal">There are no active volume tasks<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">/etc/glusterd.vol. :<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">volume management<u></u><u></u></p>
<p class="MsoNormal"> type mgmt/glusterd<u></u><u></u></p>
<p class="MsoNormal"> option working-directory /var/lib/glusterd<u></u><u></u></p>
<p class="MsoNormal"> option transport-type socket,rdma<u></u><u></u></p>
<p class="MsoNormal"> option transport.socket.keepalive-time 10<u></u><u></u></p>
<p class="MsoNormal"> option transport.socket.keepalive-interval 2<u></u><u></u></p>
<p class="MsoNormal"> option transport.socket.read-fail-log off<u></u><u></u></p>
<p class="MsoNormal"> option ping-timeout 0<u></u><u></u></p>
<p class="MsoNormal"> option event-threads 1<u></u><u></u></p>
<p class="MsoNormal"> option rpc-auth-allow-insecure on<u></u><u></u></p>
<p class="MsoNormal"># option transport.address-family inet6<u></u><u></u></p>
<p class="MsoNormal"># option base-port 49152<u></u><u></u></p>
<p class="MsoNormal">end-volume<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">rpm -qa |grep gluster<u></u><u></u></p>
<p class="MsoNormal">glusterfs-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">glusterfs-gnfs-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">glusterfs-api-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">glusterfs-cli-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">glusterfs-client-xlators-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">glusterfs-fuse-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">centos-release-gluster312-1.0-2.el7.centos.noarch<u></u><u></u></p>
<p class="MsoNormal">glusterfs-rdma-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">glusterfs-libs-3.12.13-1.el7.x86_64<u></u><u></u></p>
<p class="MsoNormal">glusterfs-server-3.12.13-1.el7.x86_64<u></u><u></u></p>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
<p class="MsoNormal">-- <u></u><u></u></p>
<div>
<p class="MsoNormal">- Atin (atinm)<u></u><u></u></p>
</div>
</div>
</div>
</blockquote></div>
</blockquote></div></div>