<div dir="ltr"><div>Thank you, I created bug with all logs:</div><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1467050">https://bugzilla.redhat.com/show_bug.cgi?id=1467050</a><br><div><br></div><div>During testing I found second bug:</div><div><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1467057">https://bugzilla.redhat.com/show_bug.cgi?id=1467057</a></div><div>There something wrong with Ganesha when Gluster bricks are named "w0" or "sw0".</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 30, 2017 at 11:36 AM, Hari Gowtham <span dir="ltr"><<a href="mailto:hgowtham@redhat.com" target="_blank">hgowtham@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Jan, by multiple times I meant whether you were able to do the whole<br>
setup multiple times and hit the same issue,<br>
so that we have a consistent reproducer to work on.<br>
<br>
Since grepping shows that the process doesn't exist, the bug I<br>
mentioned doesn't apply here.<br>
It seems to be another issue, unrelated to the bug I mentioned (I have<br>
mentioned it now).<br>
<br>
When you say "too often", that implies there is a way to reproduce it.<br>
Please do let us know the steps you performed so that we can check, but<br>
this shouldn't happen if you try again.<br>
<br>
You shouldn't hit this issue often, and as Mani mentioned, do not write a<br>
script to force start it.<br>
If the issue persists and you have a proper reproducer, we will take a look at it.<br>
<br>
Sorry, forgot to provide the link for the fix:<br>
patch : <a href="https://review.gluster.org/#/c/17101/" rel="noreferrer" target="_blank">https://review.gluster.org/#/<wbr>c/17101/</a><br>
<br>
If you find a reproducer do file a bug at<br>
<a href="https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>enter_bug.cgi?product=<wbr>GlusterFS</a><br>
<div class="HOEnZb"><div class="h5"><br>
<br>
On Fri, Jun 30, 2017 at 3:33 PM, Manikandan Selvaganesh<br>
<<a href="mailto:manikandancs333@gmail.com">manikandancs333@gmail.com</a>> wrote:<br>
> Hi Jan,<br>
><br>
> It is not recommended to automate 'volume start force' with a script.<br>
> Bricks do not go offline just like that. There will be some genuine issue<br>
> which triggers this. Could you please attach the entire glusterd log and<br>
> the brick logs from around that time so that someone can take a look?<br>
><br>
> Just to make sure, please check whether you have any network outage (using<br>
> iperf or some standard tool).<br>
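> For example, something along these lines (just a rough sketch; glunode0<br>
> stands in for one of your nodes and iperf3 for whichever variant is<br>
> installed) would rule out basic connectivity or throughput problems<br>
> between two peers:<br>
><br>
> # on one node, e.g. glunode0, start an iperf3 server<br>
> iperf3 -s<br>
> # from another node, run a quick throughput test against it<br>
> iperf3 -c glunode0<br>
> # a plain ping between the peers also helps rule out packet loss<br>
> ping -c 10 glunode0<br>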
><br>
> @Hari, I think you forgot to provide the bug link; please provide it so<br>
> that Jan or someone else can check if it is related.<br>
><br>
><br>
> --<br>
> Thanks & Regards,<br>
> Manikandan Selvaganesan.<br>
> (@Manikandan Selvaganesh on Web)<br>
><br>
> On Fri, Jun 30, 2017 at 3:19 PM, Jan <<a href="mailto:jan.h.zak@gmail.com">jan.h.zak@gmail.com</a>> wrote:<br>
>><br>
>> Hi Hari,<br>
>><br>
>> thank you for your support!<br>
>><br>
>> Did I try to check offline bricks multiple times?<br>
>> Yes – I gave it enough time (at least 20 minutes) to recover but it stayed<br>
>> offline.<br>
>><br>
>> Version?<br>
>> All nodes are 100% identical – I tried a fresh installation several times<br>
>> during my testing. Every time it is a CentOS Minimal install with all updates<br>
>> and without any additional software:<br>
>><br>
>> uname -r<br>
>> 3.10.0-514.21.2.el7.x86_64<br>
>><br>
>> yum list installed | egrep 'gluster|ganesha'<br>
>> centos-release-gluster310.noarch   1.0-1.el7.centos   @extras<br>
>> glusterfs.x86_64                   3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-api.x86_64               3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-cli.x86_64               3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-client-xlators.x86_64    3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-fuse.x86_64              3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-ganesha.x86_64           3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-libs.x86_64              3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-server.x86_64            3.10.2-1.el7       @centos-gluster310<br>
>> libntirpc.x86_64                   1.4.3-1.el7        @centos-gluster310<br>
>> nfs-ganesha.x86_64                 2.4.5-1.el7        @centos-gluster310<br>
>> nfs-ganesha-gluster.x86_64         2.4.5-1.el7        @centos-gluster310<br>
>> userspace-rcu.x86_64               0.7.16-3.el7       @centos-gluster310<br>
>><br>
>> Grepping for the brick process?<br>
>> I’ve just tried it again. The process doesn’t exist when the brick is offline.<br>
>><br>
>> Force start command?<br>
>> sudo gluster volume start MyVolume force<br>
>><br>
>> That works! Thank you.<br>
>><br>
>> If I have this issue too often then I can create a simple script that checks<br>
>> all bricks on the local server and force starts the volume when one of them is<br>
>> offline. I could schedule such a script to run once, for example 5 minutes<br>
>> after boot.<br>
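>><br>
>> Something like this rough sketch (MyVolume is just my volume name here, and<br>
>> the hostname would have to match the one used in the brick definitions):<br>
>><br>
>> #!/bin/bash<br>
>> # rough sketch only, not meant to run unattended<br>
>> VOLUME=MyVolume<br>
>> HOST=$(hostname)<br>
>> # force start only if a local brick of this volume shows N in the Online column<br>
>> if gluster volume status "$VOLUME" | grep "^Brick $HOST:" | grep -q ' N '; then<br>
>>     gluster volume start "$VOLUME" force<br>
>> fi<br>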
>><br>
>> But I’m not sure if it’s a good idea to automate it. I’d be worried that I<br>
>> could force a brick up even when the node doesn’t “see” the other nodes and<br>
>> cause a split-brain issue.<br>
>><br>
>> Thank you!<br>
>><br>
>> Kind regards,<br>
>> Jan<br>
>><br>
>><br>
>> On Fri, Jun 30, 2017 at 8:01 AM, Hari Gowtham <<a href="mailto:hgowtham@redhat.com">hgowtham@redhat.com</a>> wrote:<br>
>>><br>
>>> Hi Jan,<br>
>>><br>
>>> comments inline.<br>
>>><br>
>>> On Fri, Jun 30, 2017 at 1:31 AM, Jan <<a href="mailto:jan.h.zak@gmail.com">jan.h.zak@gmail.com</a>> wrote:<br>
>>> > Hi all,<br>
>>> ><br>
>>> > Gluster and Ganesha are amazing. Thank you for this great work!<br>
>>> ><br>
>>> > I’m struggling with one issue and I think that you might be able to<br>
>>> > help me.<br>
>>> ><br>
>>> > I spent some time playing with Gluster and Ganesha, and after I gained<br>
>>> > some experience I decided to go into production, but I’m still<br>
>>> > struggling with one issue.<br>
>>> ><br>
>>> > I have 3 nodes running CentOS 7.3 with the most current Gluster and<br>
>>> > Ganesha from the centos-gluster310 repository (3.10.2-1.el7), with<br>
>>> > replicated bricks.<br>
>>> ><br>
>>> > Servers have a lot of resources and they run in a subnet on a stable<br>
>>> > network.<br>
>>> ><br>
>>> > I didn’t have any issues when I tested a single brick. But now I’d like<br>
>>> > to set up 17 replicated bricks, and I realized that when I restart one<br>
>>> > of the nodes the result looks like this:<br>
>>> ><br>
>>> > sudo gluster volume status | grep ' N '<br>
>>> ><br>
>>> > Brick glunode0:/st/brick3/dir N/A N/A N N/A<br>
>>> > Brick glunode1:/st/brick2/dir N/A N/A N N/A<br>
>>> ><br>
>>><br>
>>> did you try it multiple times?<br>
>>><br>
>>> > Some bricks just don’t go online. Sometimes it’s one brick, sometimes<br>
>>> > three, and it’s not the same brick – it’s a random issue.<br>
>>> ><br>
>>> > I checked the logs on the affected servers and this is an example:<br>
>>> ><br>
>>> > sudo tail /var/log/glusterfs/bricks/st-<wbr>brick3-0.log<br>
>>> ><br>
>>> > [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:<br>
>>> > readv on <a href="http://10.2.44.23:24007" rel="noreferrer" target="_blank">10.2.44.23:24007</a> failed (No data available)<br>
>>> > [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_<wbr>rpc_notify]<br>
>>> > 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No<br>
>>> > data<br>
>>> > available)<br>
>>> > [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_<wbr>rpc_notify]<br>
>>> > 0-glusterfsd-mgmt: Exhausted all volfile servers<br>
>>> > [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_<wbr>and_exit]<br>
>>> > (-->/lib64/libpthread.so.0(+<wbr>0x7dc5) [0x7f3158032dc5]<br>
>>> > -->/usr/sbin/glusterfsd(<wbr>glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]<br>
>>> > -->/usr/sbin/glusterfsd(<wbr>cleanup_and_exit+0x6b) [0x7f31596cbdfb] )<br>
>>> > 0-:received signum (15), shutting down<br>
>>> > [2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect]<br>
>>> > 0-glusterfs:<br>
>>> > connection attempt on <a href="http://10.2.44.23:24007" rel="noreferrer" target="_blank">10.2.44.23:24007</a> failed, (Network is unreachable)<br>
>>> > [2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_<wbr>request]<br>
>>> > 0-glusterfs: not connected (priv->connected = 0)<br>
>>> > [2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_<wbr>submit]<br>
>>> > 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster<br>
>>> > Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)<br>
>>> ><br>
>>> > I think the important message is “Network is unreachable”.<br>
>>> ><br>
>>> > Question<br>
>>> > 1. Could you please tell me, is that normal when you have many bricks?<br>
>>> > The network is definitely stable, other servers use it without problems,<br>
>>> > and all servers run on the same pair of switches. My assumption is that<br>
>>> > many bricks try to connect at the same time and that doesn’t work.<br>
>>><br>
>>> No, that shouldn't happen just because there are multiple bricks.<br>
>>> There was a bug related to this [1].<br>
>>> To verify whether that was the issue I need to know a few things:<br>
>>> 1) Are all the nodes on the same version?<br>
>>> 2) Did you check for the brick process using the ps command?<br>
>>> We need to verify whether the brick is still up but just not connected<br>
>>> to glusterd.<br>
>>><br>
>>><br>
>>> ><br>
>>> > 2. Is there an option to configure a brick to enable some kind of<br>
>>> > autoreconnect or add some timeout?<br>
>>> > gluster volume set brick123 option456 abc ??<br>
>>> If the brick process is not seen in ps aux | grep glusterfsd,<br>
>>> the way to start the brick is to use the volume start force command.<br>
>>> If the brick is not started there is no point configuring it, and to<br>
>>> start a brick we can't use the volume set (configure) command.<br>
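>>><br>
>>> For instance (reusing the brick path and volume name from your mails as<br>
>>> placeholders):<br>
>>><br>
>>> # check whether a glusterfsd process for that brick path is still running<br>
>>> ps aux | grep glusterfsd | grep '/st/brick3/dir'<br>
>>> # if it is not, this starts only the missing bricks; running ones are untouched<br>
>>> gluster volume start MyVolume force<br>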
>>><br>
>>> ><br>
>>> > 3. What is the recommended way to fix an offline brick on the affected<br>
>>> > server? I don’t want to use “gluster volume stop/start” since the<br>
>>> > affected bricks are online on the other servers and there is no reason<br>
>>> > to completely turn them off.<br>
>>> gluster volume start force will not bring down the bricks that are<br>
>>> already up and<br>
>>> running.<br>
>>><br>
>>> ><br>
>>> > Thank you,<br>
>>> > Jan<br>
>>> ><br>
>>> > ______________________________<wbr>_________________<br>
>>> > Gluster-users mailing list<br>
>>> > <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
>>> > <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
>>><br>
>>><br>
>>><br>
>>> --<br>
>>> Regards,<br>
>>> Hari Gowtham.<br>
>><br>
>><br>
>><br>
>> ______________________________<wbr>_________________<br>
>> Gluster-users mailing list<br>
>> <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
>> <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
><br>
><br>
<br>
<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
Regards,<br>
Hari Gowtham.<br>
</font></span></blockquote></div><br></div>