<div dir="ltr"><div>Thank you, I created bug with all logs:</div><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1467050">https://bugzilla.redhat.com/show_bug.cgi?id=1467050</a><br><div><br></div><div>During testing I found second bug:</div><div><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1467057">https://bugzilla.redhat.com/show_bug.cgi?id=1467057</a></div><div>There something wrong with Ganesha when Gluster bricks are named "w0" or "sw0".</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 30, 2017 at 11:36 AM, Hari Gowtham <span dir="ltr"><<a href="mailto:hgowtham@redhat.com" target="_blank">hgowtham@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Jan, by multiple times I meant whether you were able to do the whole<br>
setup multiple times and hit the same issue,<br>
so that we have a consistent reproducer to work on.<br>
<br>
Since grepping shows that the process doesn't exist, the bug I<br>
mentioned doesn't apply here.<br>
It seems to be another issue, unrelated to the bug I mentioned (I have<br>
mentioned it now).<br>
<br>
When you say "too often", that implies there is a way to reproduce it.<br>
Please do let us know the steps you performed so that we can check, but<br>
this shouldn't happen if you try again.<br>
<br>
You shouldn't hit this issue often, and as Mani mentioned, do not write a<br>
script to force start it.<br>
If the issue persists and you have a proper reproducer, we will take a look at it.<br>
<br>
Sorry, forgot to provide the link for the fix:<br>
patch : <a href="https://review.gluster.org/#/c/17101/" rel="noreferrer" target="_blank">https://review.gluster.org/#/<wbr>c/17101/</a><br>
<br>
If you find a reproducer do file a bug at<br>
<a href="https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>enter_bug.cgi?product=<wbr>GlusterFS</a><br>
<div class="HOEnZb"><div class="h5"><br>
<br>
On Fri, Jun 30, 2017 at 3:33 PM, Manikandan Selvaganesh<br>
<<a href="mailto:manikandancs333@gmail.com">manikandancs333@gmail.com</a>> wrote:<br>
> Hi Jan,<br>
><br>
> It is not recommended to automate 'volume start force' with a script.<br>
> Bricks do not go offline just like that. There will be some genuine issue<br>
> which triggers this. Could you please attach the entire glusterd log and<br>
> the brick logs from around that time so that someone can take a look?<br>
><br>
> Just to make sure, please check whether you have any network outage (using<br>
> iperf or some standard tool).<br>
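> For example, something along these lines (just a rough sketch; glunode0<br>
> stands in for one of your nodes and iperf3 for whichever variant is<br>
> installed) would rule out basic connectivity or throughput problems<br>
> between two peers:<br>
><br>
> # on one node, e.g. glunode0, start an iperf3 server<br>
> iperf3 -s<br>
> # from another node, run a quick throughput test against it<br>
> iperf3 -c glunode0<br>
> # a plain ping between the peers also helps rule out packet loss<br>
> ping -c 10 glunode0<br>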
><br>
> @Hari, I think you forgot to provide the bug link; please provide it so<br>
> that Jan or someone else can check if it is related.<br>
><br>
><br>
> --<br>
> Thanks & Regards,<br>
> Manikandan Selvaganesan.<br>
> (@Manikandan Selvaganesh on Web)<br>
><br>
> On Fri, Jun 30, 2017 at 3:19 PM, Jan <<a href="mailto:jan.h.zak@gmail.com">jan.h.zak@gmail.com</a>> wrote:<br>
>><br>
>> Hi Hari,<br>
>><br>
>> thank you for your support!<br>
>><br>
>> Did I try to check offline bricks multiple times?<br>
>> Yes – I gave it enough time (at least 20 minutes) to recover but it stayed<br>
>> offline.<br>
>><br>
>> Version?<br>
>> All nodes are 100% identical – I tried a fresh installation several times<br>
>> during my testing. Every time it is a CentOS Minimal install with all updates<br>
>> and without any additional software:<br>
>><br>
>> uname -r<br>
>> 3.10.0-514.21.2.el7.x86_64<br>
>><br>
>> yum list installed | egrep 'gluster|ganesha'<br>
>> centos-release-gluster310.noarch   1.0-1.el7.centos   @extras<br>
>> glusterfs.x86_64                   3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-api.x86_64               3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-cli.x86_64               3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-client-xlators.x86_64    3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-fuse.x86_64              3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-ganesha.x86_64           3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-libs.x86_64              3.10.2-1.el7       @centos-gluster310<br>
>> glusterfs-server.x86_64            3.10.2-1.el7       @centos-gluster310<br>
>> libntirpc.x86_64                   1.4.3-1.el7        @centos-gluster310<br>
>> nfs-ganesha.x86_64                 2.4.5-1.el7        @centos-gluster310<br>
>> nfs-ganesha-gluster.x86_64         2.4.5-1.el7        @centos-gluster310<br>
>> userspace-rcu.x86_64               0.7.16-3.el7       @centos-gluster310<br>
>><br>
>> Grepping for the brick process?<br>
>> I’ve just tried it again. The process doesn’t exist when the brick is offline.<br>
>><br>
>> Force start command?<br>
>> sudo gluster volume start MyVolume force<br>
>><br>
>> That works! Thank you.<br>
>><br>
>> If I have this issue too often then I can create a simple script that checks<br>
>> all bricks on the local server and force starts the volume when one of them is<br>
>> offline. I could schedule such a script to run once, for example 5 minutes<br>
>> after boot.<br>
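>><br>
>> Something like this rough sketch (MyVolume is just my volume name here, and<br>
>> the hostname would have to match the one used in the brick definitions):<br>
>><br>
>> #!/bin/bash<br>
>> # rough sketch only, not meant to run unattended<br>
>> VOLUME=MyVolume<br>
>> HOST=$(hostname)<br>
>> # force start only if a local brick of this volume shows N in the Online column<br>
>> if gluster volume status "$VOLUME" | grep "^Brick $HOST:" | grep -q ' N '; then<br>
>>     gluster volume start "$VOLUME" force<br>
>> fi<br>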
>><br>
>> But I’m not sure if it’s a good idea to automate it. I’d be worried that I<br>
>> could force a brick up even when the node doesn’t “see” the other nodes and<br>
>> cause a split-brain issue.<br>
>><br>
>> Thank you!<br>
>><br>
>> Kind regards,<br>
>> Jan<br>
>><br>
>><br>
>> On Fri, Jun 30, 2017 at 8:01 AM, Hari Gowtham <<a href="mailto:hgowtham@redhat.com">hgowtham@redhat.com</a>> wrote:<br>
>>><br>
>>> Hi Jan,<br>
>>><br>
>>> comments inline.<br>
>>><br>
>>> On Fri, Jun 30, 2017 at 1:31 AM, Jan <<a href="mailto:jan.h.zak@gmail.com">jan.h.zak@gmail.com</a>> wrote:<br>
>>> > Hi all,<br>
>>> ><br>
>>> > Gluster and Ganesha are amazing. Thank you for this great work!<br>
>>> ><br>
>>> > I’m struggling with one issue and I think that you might be able to<br>
>>> > help me.<br>
>>> ><br>
>>> > I spent some time playing with Gluster and Ganesha, and after I gained<br>
>>> > some experience I decided to go into production, but I’m still<br>
>>> > struggling with one issue.<br>
>>> ><br>
>>> > I have 3 nodes running CentOS 7.3 with the most current Gluster and<br>
>>> > Ganesha from the centos-gluster310 repository (3.10.2-1.el7), with<br>
>>> > replicated bricks.<br>
>>> ><br>
>>> > Servers have a lot of resources and they run in a subnet on a stable<br>
>>> > network.<br>
>>> ><br>
>>> > I didn’t have any issues when I tested a single brick. But now I’d like<br>
>>> > to set up 17 replicated bricks, and I realized that when I restart one<br>
>>> > of the nodes the result looks like this:<br>
>>> ><br>
>>> > sudo gluster volume status | grep ' N '<br>
>>> ><br>
>>> > Brick glunode0:/st/brick3/dir N/A N/A N N/A<br>
>>> > Brick glunode1:/st/brick2/dir N/A N/A N N/A<br>
>>> ><br>
>>><br>
>>> did you try it multiple times?<br>
>>><br>
>>> > Some bricks just don’t go online. Sometimes it’s one brick, sometimes<br>
>>> > three, and it’s not the same brick – it’s a random issue.<br>
>>> ><br>
>>> > I checked the logs on the affected servers and this is an example:<br>
>>> ><br>
>>> > sudo tail /var/log/glusterfs/bricks/st-<wbr>brick3-0.log<br>
>>> ><br>
>>> > [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:<br>
>>> > readv on <a href="http://10.2.44.23:24007" rel="noreferrer" target="_blank">10.2.44.23:24007</a> failed (No data available)<br>
>>> > [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_<wbr>rpc_notify]<br>
>>> > 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No<br>
>>> > data<br>
>>> > available)<br>
>>> > [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_<wbr>rpc_notify]<br>
>>> > 0-glusterfsd-mgmt: Exhausted all volfile servers<br>
>>> > [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_<wbr>and_exit]<br>
>>> > (-->/lib64/libpthread.so.0(+<wbr>0x7dc5) [0x7f3158032dc5]<br>
>>> > -->/usr/sbin/glusterfsd(<wbr>glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]<br>
>>> > -->/usr/sbin/glusterfsd(<wbr>cleanup_and_exit+0x6b) [0x7f31596cbdfb] )<br>
>>> > 0-:received signum (15), shutting down<br>
>>> > [2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect]<br>
>>> > 0-glusterfs:<br>
>>> > connection attempt on <a href="http://10.2.44.23:24007" rel="noreferrer" target="_blank">10.2.44.23:24007</a> failed, (Network is unreachable)<br>
>>> > [2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_<wbr>request]<br>
>>> > 0-glusterfs: not connected (priv->connected = 0)<br>
>>> > [2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_<wbr>submit]<br>
>>> > 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster<br>
>>> > Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)<br>
>>> ><br>
>>> > I think the important message is “Network is unreachable”.<br>
>>> ><br>
>>> > Question<br>
>>> > 1. Could you please tell me, is that normal when you have many bricks?<br>
>>> > The network is definitely stable, other servers use it without problems,<br>
>>> > and all servers run on the same pair of switches. My assumption is that<br>
>>> > many bricks try to connect at the same time and that doesn’t work.<br>
>>><br>
>>> No, that shouldn't happen just because there are multiple bricks.<br>
>>> There was a bug related to this [1].<br>
>>> To verify whether that was the issue I need to know a few things:<br>
>>> 1) Are all the nodes on the same version?<br>
>>> 2) Did you check for the brick process using the ps command?<br>
>>> We need to verify whether the brick is still up but just not connected<br>
>>> to glusterd.<br>
>>><br>
>>><br>
>>> ><br>
>>> > 2. Is there an option to configure a brick to enable some kind of<br>
>>> > autoreconnect or add some timeout?<br>
>>> > gluster volume set brick123 option456 abc ??<br>
>>> If the brick process is not seen in ps aux | grep glusterfsd,<br>
>>> the way to start the brick is to use the volume start force command.<br>
>>> If the brick is not started there is no point configuring it, and to<br>
>>> start a brick we can't use the volume set (configure) command.<br>
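>>><br>
>>> For instance (reusing the brick path and volume name from your mails as<br>
>>> placeholders):<br>
>>><br>
>>> # check whether a glusterfsd process for that brick path is still running<br>
>>> ps aux | grep glusterfsd | grep '/st/brick3/dir'<br>
>>> # if it is not, this starts only the missing bricks; running ones are untouched<br>
>>> gluster volume start MyVolume force<br>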
>>><br>
>>> ><br>
>>> > 3. What is the recommended way to fix an offline brick on the affected<br>
>>> > server? I don’t want to use “gluster volume stop/start” since the<br>
>>> > affected bricks are online on the other servers and there is no reason<br>
>>> > to completely turn them off.<br>
>>> gluster volume start force will not bring down the bricks that are<br>
>>> already up and<br>
>>> running.<br>
>>><br>
>>> ><br>
>>> > Thank you,<br>
>>> > Jan<br>
>>> ><br>
>>> > ______________________________<wbr>_________________<br>
>>> > Gluster-users mailing list<br>
>>> > <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
>>> > <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
>>><br>
>>><br>
>>><br>
>>> --<br>
>>> Regards,<br>
>>> Hari Gowtham.<br>
>><br>
>><br>
>><br>
>> ______________________________<wbr>_________________<br>
>> Gluster-users mailing list<br>
>> <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
>> <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
><br>
><br>
<br>
<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
Regards,<br>
Hari Gowtham.<br>
</font></span></blockquote></div><br></div>