<div dir="ltr"><div>Hi all,</div><div><br></div><div>Gluster and Ganesha are amazing. Thank you for this great work!</div><div><br></div><div>I’m struggling with one issue and I think you might be able to help me.</div><div><br></div><div>I spent some time playing with Gluster and Ganesha, and after gaining some experience I decided to go into production, but I’m still struggling with one issue.</div><div><br></div><div>I have a 3-node CentOS 7.3 cluster running the latest Gluster and Ganesha from the centos-gluster310 repository (3.10.2-1.el7), with replicated bricks.</div><div><br></div><div>The servers have plenty of resources and run in one subnet on a stable network.</div><div><br></div><div>I didn’t have any issues when I tested a single brick. But now I’d like to set up 17 replicated bricks, and I noticed that when I restart one of the nodes, the result looks like this:</div><div><br></div><div>sudo gluster volume status | grep &#39; N &#39;</div><div><br></div><div>Brick glunode0:/st/brick3/dir          N/A       N/A        N       N/A  </div><div>Brick glunode1:/st/brick2/dir          N/A       N/A        N       N/A  </div><div><br></div><div>Some bricks just don’t come online. 
Sometimes it’s one brick, sometimes three, and it’s not always the same brick; the issue looks random.</div><div><br></div><div>I checked the logs on the affected servers; here is an example:</div><div><br></div><div>sudo tail /var/log/glusterfs/bricks/st-brick3-0.log </div><div><br></div><div>[2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on <a href="http://10.2.44.23:24007">10.2.44.23:24007</a> failed (No data available)</div><div>[2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No data available)</div><div>[2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers</div><div>[2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_and_exit] (--&gt;/lib64/libpthread.so.0(+0x7dc5) [0x7f3158032dc5] --&gt;/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5] --&gt;/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f31596cbdfb] ) 0-:received signum (15), shutting down</div><div>[2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect] 0-glusterfs: connection attempt on <a href="http://10.2.44.23:24007">10.2.44.23:24007</a> failed, (Network is unreachable)</div><div>[2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_request] 0-glusterfs: not connected (priv-&gt;connected = 0)</div><div>[2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_submit] 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)</div><div><br></div><div>I think the important message is “Network is unreachable”.</div><div><br></div><div>Questions</div><div>1. Is this normal when you have many bricks? The network is definitely stable, other servers use it without problems, and all servers are connected to the same pair of switches. 
My assumption is that many bricks try to connect at the same time and that this fails.</div><div><br></div><div>2. Is there an option to configure a brick to reconnect automatically, or to add some kind of timeout? Something like:</div><div>gluster volume set brick123 option456 abc</div><div><br></div><div>3. What is the recommended way to bring an offline brick back online on the affected server? I don’t want to use “gluster volume stop/start”, since the affected bricks are online on the other servers and there is no reason to take the whole volume down.</div><div><br></div><div>Thank you,</div><div>Jan</div></div>
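<div>A slightly stricter alternative to the grep ' N ' check shown above is to look only at the Online column of Brick rows. The Python sketch below is hypothetical (the function name offline_bricks is my own, not part of Gluster) and assumes each brick row fits on a single line in the layout from the excerpt: Brick, host:path, TCP port, RDMA port, Online, Pid. In real output a long brick path can wrap onto a second line, which this sketch does not handle.</div>

```python
from typing import List


def offline_bricks(status_text: str) -> List[str]:
    """Return the host:path of every brick whose Online column is 'N'.

    Assumption: each brick row sits on a single line, laid out as
    Brick <host:path> <TCP port> <RDMA port> <Online> <Pid>
    (long brick paths that wrap onto a second line are not handled).
    """
    offline = []
    for line in status_text.splitlines():
        # str.split() with no argument splits on any whitespace run,
        # including non-breaking spaces.
        fields = line.split()
        # Keep only brick rows; skip headers, separators and daemon rows
        # such as "Self-heal Daemon on localhost ...".
        if len(fields) != 6 or fields[0] != "Brick":
            continue
        if fields[4] == "N":  # the Online column
            offline.append(fields[1])
    return offline
```

<div>A possible usage, assuming the caller is allowed to run the gluster CLI: offline_bricks(subprocess.run(["gluster", "volume", "status"], capture_output=True, text=True).stdout). Unlike grep ' N ', this only reports Brick rows, so offline daemon rows (self-heal, NFS) are not mixed into the list.</div>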