<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 30, 2017 at 1:31 AM, Jan <span dir="ltr"><<a href="mailto:jan.h.zak@gmail.com" target="_blank">jan.h.zak@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi all,</div><div><br></div><div>Gluster and Ganesha are amazing. Thank you for this great work!</div><div><br></div><div>I’m struggling with one issue and I think that you might be able to help me.</div><div><br></div><div>I spent some time by playing with Gluster and Ganesha and after I gain some experience I decided that I should go into production but I’m still struggling with one issue.</div><div> </div><div>I have 3x node CentOS 7.3 with the most current Gluster and Ganesha from centos-gluster310 repository (3.10.2-1.el7) with replicated bricks.</div><div><br></div><div>Servers have a lot of resources and they run in a subnet on a stable network.</div><div><br></div><div>I didn’t have any issues when I tested a single brick. But now I’d like to setup 17 replicated bricks and I realized that when I restart one of nodes then the result looks like this:</div><div><br></div><div>sudo gluster volume status | grep ' N '</div><div><br></div><div>Brick glunode0:/st/brick3/dir N/A N/A N N/A </div><div>Brick glunode1:/st/brick2/dir N/A N/A N N/A </div><div><br></div><div>Some bricks just don’t go online. Sometime it’s one brick, sometime tree and it’s not same brick – it’s random issue.</div><div><br></div><div>I checked log on affected servers and this is an example:</div><div><br></div><div>sudo tail /var/log/glusterfs/bricks/st-<wbr>brick3-0.log </div><div><br></div><div>[2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on <a href="http://10.2.44.23:24007" target="_blank">10.2.44.23:24007</a> failed (No data available)</div><div>[2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_<wbr>rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No data available)</div><div>[2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_<wbr>rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers</div><div>[2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_<wbr>and_exit] (-->/lib64/libpthread.so.0(+<wbr>0x7dc5) [0x7f3158032dc5] -->/usr/sbin/glusterfsd(<wbr>glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5] -->/usr/sbin/glusterfsd(<wbr>cleanup_and_exit+0x6b) [0x7f31596cbdfb] ) 0-:received signum (15), shutting down</div><div>[2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect] 0-glusterfs: connection attempt on <a href="http://10.2.44.23:24007" target="_blank">10.2.44.23:24007</a> failed, (Network is unreachable)</div></div></blockquote><div><br></div><div>This happens when connect () syscall fails with ENETUNREACH errno as per the followint code<br><br> if (ign_enoent) { <br> ret = connect_loop (priv->sock, <br> SA (&this->peerinfo.sockaddr), <br> this->peerinfo.sockaddr_len); <br> } else { <br> ret = connect (priv->sock, <br> SA (&this->peerinfo.sockaddr), <br> this->peerinfo.sockaddr_len); <br> } <br> <br> if (ret == -1 && errno == ENOENT && ign_enoent) { <br> gf_log (this->name, GF_LOG_WARNING, <br> "Ignore failed connection attempt on %s, (%s) ", <br> this->peerinfo.identifier, strerror (errno)); <br> <br> /* connect failed with some other error than EINPROGRESS<br> so, getsockopt (... SO_ERROR ...), will not catch any <br> errors and return them to us, we need to remember this <br> state, and take actions in socket_event_handler <br> appropriately */ <br> /* TBD: What about ENOENT, we will do getsockopt there <br> as well, so how is that exempt from such a problem? */ <br> priv->connect_failed = 1; <br> this->connect_failed = _gf_true; <br> <br> goto handler; <br> } <br> <br> if (ret == -1 && ((errno != EINPROGRESS) && (errno != ENOENT))) {<br> /* For unix path based sockets, the socket path is <br> * cryptic (md5sum of path) and may not be useful for <br> * the user in debugging so log it in DEBUG <br> */ <br> gf_log (this->name, ((sa_family == AF_UNIX) ? <===== this is the log which gets generated <br> GF_LOG_DEBUG : GF_LOG_ERROR), <br> "connection attempt on %s failed, (%s)", <br> this->peerinfo.identifier, strerror (errno)); <br><br></div><div>IMO, this can only happen if there is an intermittent n/w failure? <br><br></div><div>@Raghavendra G/ Mohit - do you have any other opinion?<br></div><div><br>[2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_<wbr>request] 0-glusterfs: not connected (priv->connected = 0)</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>[2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_<wbr>submit] 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)</div><div><br></div><div>I think that important message is “Network is unreachable”.</div><div><br></div><div>Question</div><div>1. Could you please tell me, is that normal when you have many bricks? Networks is definitely stable and other servers use it without problem and all servers run on a same pair of switches. My assumption is that in the same time many bricks try to connect and that doesn’t work.</div><div><br></div><div>2. Is there an option to configure a brick to enable some kind of autoreconnect or add some timeout?</div><div>gluster volume set brick123 option456 abc ??</div><div><br></div><div>3. What it the recommend way to fix offline brick on the affected server? I don’t want to use “gluster volume stop/start” since affected bricks are online on other server and there is no reason to completely turn it off.</div><div><br></div><div>Thank you,</div><div>Jan</div></div>
<br>______________________________<wbr>_________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br></blockquote></div><br></div></div>