<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 14, 2017 at 2:47 PM, Emmanuel Dreyfus <span dir="ltr"><<a href="mailto:manu@netbsd.org" target="_blank">manu@netbsd.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Nov 14, 2017 at 12:17:05PM +0530, Atin Mukherjee wrote:<br>
> > gluster volume status also exhibits trouble: each server will only<br>
> > list its bricks, but not the other server's. I suspect it could just<br>
> > be a timeout caused by a slow answer from the peer.<br>
<br>
> Have you checked the output of gluster peer status? Also, does the glusterd log<br>
> file give any hint of timeouts, RPC failures, disconnections, et al.?<br>
<br>
</span>gluster peer status says "State: Sent and Received peer request (Connected)" <br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
on both sides.<br></blockquote><div><br></div><div>So this is why the peers don't recognize that they are connected: the friend handshake got stuck midway and never recovered. Restarting the glusterd services should ideally fix the state. If it does not, you'd have to manually edit the /var/lib/glusterd/peers/UUID files to set state=3 and then restart the glusterd service.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I have this in glusterd.log:<br>
[2017-11-14 08:49:47.289423] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_<wbr>registry_bind] 0-pmap: adding brick /export/wd3e on port 49155<br>
[2017-11-14 08:49:52.289926] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_<wbr>registry_bind] 0-pmap: adding brick /export/wd0e on port 49152<br>
[2017-11-14 08:49:52.295394] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_<wbr>registry_bind] 0-pmap: adding brick /export/wd1e on port 49153<br>
[2017-11-14 08:49:52.302973] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_<wbr>registry_bind] 0-pmap: adding brick /export/wd2e on port 49154<br>
[2017-11-14 08:54:31.535066] W [socket.c:593:__socket_rwv] 0-management: readv on <a href="http://192.0.2.109:24007" rel="noreferrer" target="_blank">192.0.2.109:24007</a> failed (Connection reset by peer)<br>
[2017-11-14 08:54:32.567745] I [MSGID: 106004] [glusterd-handler.c:6284:__<wbr>glusterd_peer_rpc_notify] 0-management: Peer <bidon> (<2d7719d9-0466-434c-a881-<wbr>4081156fac47>), in state <Probe Sent to Peer>, has disconnected from glusterd.<br>
<br>
An odd thing: the registration messages suggest the local bricks should<br>
show as online in the gluster volume status output. They are displayed as<br>
offline until I kill the glusterfsd processes and issue<br>
gluster volume start gfs force.<br>
<br>
Along with the symmetrical messages, the peer has this:<br>
[2017-11-14 08:56:05.799686] E [socket.c:2369:socket_connect_<wbr>finish] 0-management: connection to <a href="http://192.0.2.110:24007" rel="noreferrer" target="_blank">192.0.2.110:24007</a> failed (Connection timed out); disconnecting socket<br>
<br>
In the meantime I tracked the performance problem to extended attribute<br>
system calls. The root of the problem is outside of glusterfs, but fixing<br>
the consequences would be nice.<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Emmanuel Dreyfus<br>
<a href="mailto:manu@netbsd.org">manu@netbsd.org</a><br>
</font></span></blockquote></div><br></div></div>
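<div>For the manual edit of the peer files mentioned above, here is a minimal sketch of the state change, not an official tool. It assumes each file under /var/lib/glusterd/peers/ is plain key=value text with a single state= line. The helper name set_peer_state and the sample contents (including the starting value 1) are made up for illustration. Stop glusterd and back up the peers directory before touching the real files.</div>
<pre>
```python
import re


def set_peer_state(peer_file_text, state=3):
    """Return the peer-file text with its state= line set to the given value.

    Assumption: the file holds key=value lines, one of which is state=N.
    """
    new_line = "state={}".format(state)
    # Replace only a whole line of the form state=N, leaving other keys alone.
    updated, count = re.subn(r"(?m)^state=\d+[ \t]*$", new_line, peer_file_text)
    if count == 0:
        # Refuse to guess if the file does not look like we expect.
        raise ValueError("no state= line found in peer file text")
    return updated


# Hypothetical peer file contents, for illustration only.
sample = "uuid=2d7719d9-0466-434c-a881-4081156fac47\nstate=1\nhostname1=bidon\n"
print(set_peer_state(sample))
```
</pre>
<div>The function works on text rather than on a path so it can be tried on a copy of the file first. Writing the result back to the file and restarting glusterd is left as a deliberate manual step.</div>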