<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 14, 2017 at 2:47 PM, Emmanuel Dreyfus <span dir="ltr">&lt;<a href="mailto:manu@netbsd.org" target="_blank">manu@netbsd.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Nov 14, 2017 at 12:17:05PM +0530, Atin Mukherjee wrote:<br>

&gt; &gt; gluster volume status also exhibits trouble: each server will only<br>

&gt; &gt; list its own bricks, but not the other&#39;s. I suspect it could just<br>

&gt; &gt; be some timeout caused by a slow answer from the peer.<br>

<br>

&gt; Have you checked the output of gluster peer status? Also, does the glusterd log<br>

&gt; file give any hint of timeouts, RPC failures, disconnections, etc.?<br>

<br>

</span>gluster peer status says &quot;State: Sent and Received peer request (Connected)&quot; <br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

on both sides.<br></blockquote><div><br></div><div>So this is why the peers don&#39;t recognize that they are connected: the friend handshake got stuck midway and never recovered. Restarting the glusterd services should ideally fix the state. If it doesn&#39;t, you&#39;d have to manually edit the /var/lib/glusterd/peers/UUID files to set state=3 and then restart the glusterd service.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

I have this in glusterd.log:<br>

[2017-11-14 08:49:47.289423] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd3e on port 49155<br>

[2017-11-14 08:49:52.289926] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd0e on port 49152<br>

[2017-11-14 08:49:52.295394] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd1e on port 49153<br>

[2017-11-14 08:49:52.302973] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd2e on port 49154<br>

[2017-11-14 08:54:31.535066] W [socket.c:593:__socket_rwv] 0-management: readv on <a href="http://192.0.2.109:24007" rel="noreferrer" target="_blank">192.0.2.109:24007</a> failed (Connection reset by peer)<br>

[2017-11-14 08:54:32.567745] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer &lt;bidon&gt; (&lt;2d7719d9-0466-434c-a881-4081156fac47&gt;), in state &lt;Probe Sent to Peer&gt;, has disconnected from glusterd.<br>

<br>

An odd thing: the registration messages suggest the local bricks should<br>

show as online in gluster volume status output. They are displayed as<br>

offline until I kill the glusterfsd processes and issue a<br>

 gluster volume start gfs force.<br>

<br>

Along with symmetrical entries, the peer has this:<br>

[2017-11-14 08:56:05.799686] E [socket.c:2369:socket_connect_finish] 0-management: connection to <a href="http://192.0.2.110:24007" rel="noreferrer" target="_blank">192.0.2.110:24007</a> failed (Connection timed out); disconnecting socket<br>

<br>

In the meantime I tracked the performance problem to extended attribute<br>

system calls. The root of the problem is outside of glusterfs, but fixing<br>

the consequences would be nice.<br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Emmanuel Dreyfus<br>

<a href="mailto:manu@netbsd.org">manu@netbsd.org</a><br>

</font></span></blockquote></div><br></div></div>
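<div>For the manual fix mentioned in the reply above (setting state=3 in the /var/lib/glusterd/peers/UUID files), here is a minimal sketch of the edit. It assumes each peer file is plain key=value text with a single state= line, and that state 3 is the value glusterd uses for a peer that is in the cluster; neither is confirmed in this thread, so check a real file first. The names set_peer_state and rewrite_peer_file are hypothetical helpers, not part of any gluster tooling. The backup path is passed in explicitly so the copy can be kept outside the peers directory, on the assumption that glusterd may read every file it finds there.</div>
<pre>
```python
import re

# Assumption: the peer file holds one "state=N" line among other key=value lines.
STATE_LINE = re.compile(r"^state=\d+$", re.MULTILINE)


def set_peer_state(text, state=3):
    """Return the peer-file text with its state= line set to `state`."""
    new_line = "state=%d" % state
    if STATE_LINE.search(text):
        # Replace only the first state= line and leave everything else alone.
        return STATE_LINE.sub(new_line, text, count=1)
    # No state= line found: append one instead of guessing where it belongs.
    sep = "" if text.endswith("\n") or not text else "\n"
    return text + sep + new_line + "\n"


def rewrite_peer_file(path, backup_path, state=3):
    """Copy `path` to `backup_path`, then rewrite the state= line in place."""
    with open(path) as f:
        original = f.read()
    with open(backup_path, "w") as f:
        f.write(original)
    with open(path, "w") as f:
        f.write(set_peer_state(original, state))


if __name__ == "__main__":
    # Hypothetical file content, using the peer UUID and name from the log above.
    sample = (
        "uuid=2d7719d9-0466-434c-a881-4081156fac47\n"
        "state=1\n"
        "hostname1=bidon\n"
    )
    print(set_peer_state(sample), end="")
```
</pre>
<div>As described in the reply, the glusterd service still has to be restarted after the files are edited for the new state to take effect.</div>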