<div class="moz-cite-prefix">On 09/04/2017 07:35 PM, Atin Mukherjee
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAGNCGH3f6S5uCs4Gti+Uc6QgBvxvZPuNS5invu_FbneCVpHjbg@mail.gmail.com">
<div dir="ltr">
<div>Ravi/Karthick,<br>
<br>
</div>
If one of the self heal process is down, will the statstics
heal-count command work?<br>
</div>
</blockquote>

No, it doesn't seem to: the glusterd stage-op phase fails because the
self-heal daemon (shd) was down on that node, so the command errors out.
FWIW, the error message "Gathering crawl statistics on volume GROUP-WORK has
been unsuccessful on bricks that are down. Please check if all brick
processes are running." is incorrect; once
https://review.gluster.org/#/c/15724/ gets merged, you will get the correct
error message, like so:

[root@vm2 glusterfs]# gluster v heal testvol statistics
Gathering crawl statistics on volume testvol has been unsuccessful:
Staging failed on vm1. Error: Self-heal daemon is not running. Check self-heal daemon log file.
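
Until that fix is in, a quick way to see up front which node's self-heal
daemon is down (just a rough sketch, not part of the patch; the volume name
and the default log path are taken from the examples in this thread):

# list only the self-heal daemon entries for the volume
gluster volume status testvol shd
# or, on the suspect node, look for the glustershd process and its log
ps aux | grep glustershd
less /var/log/glusterfs/glustershd.log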

-Ravi

<blockquote type="cite"
cite="mid:CAGNCGH3f6S5uCs4Gti+Uc6QgBvxvZPuNS5invu_FbneCVpHjbg@mail.gmail.com">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Sep 4, 2017 at 7:24 PM,
lejeczek <span dir="ltr"><<a
href="mailto:peljasz@yahoo.co.uk" target="_blank"
moz-do-not-send="true">peljasz@yahoo.co.uk</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">1) one
peer, out of four, got separated from the network, from the
rest of the cluster.<br>
2) that unavailable(while it was unavailable) peer got
detached with "gluster peer detach" command which succeeded,
so now cluster comprise of three peers<br>
3) Self-heal daemon (for some reason) does not start(with an
attempt to restart glusted) on the peer which probed that
fourth peer.<br>
4) fourth unavailable peer is still up & running but is
inaccessible to other peers for network is disconnected,
segmented. That peer's gluster status show peer is still in
the cluster.<br>
5) So, fourth peer's gluster(nor other processes) stack did
not fail nor crushed, just network got, is disconnected.<br>
6) peer status show ok & connected for current three
peers.<br>
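
For reference, the usual sequence to try to bring the self-heal daemon back
on that peer (only a sketch of standard gluster/systemd commands, not
something verified on this cluster; the volume name is taken from the output
below):

# on the peer whose self-heal daemon stays down
systemctl restart glusterd              # assumes a systemd-managed glusterd
gluster volume start QEMU-VMs force     # asks glusterd to respawn any missing shd/brick process
gluster volume status QEMU-VMs shd      # check whether the daemon now shows Online = Y
less /var/log/glusterfs/glustershd.log  # if it still won't start, the reason should be logged here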

This is the third time it has happened to me, in exactly the same way: each
time, once the net-disjointed peer was brought back online, statistics &
details worked again.

Can you not reproduce it?
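
Roughly, the scenario should be reproducible like this (only a sketch;
10.5.6.X is a placeholder for whichever peer gets isolated, and iptables is
just one way to fake the network split):

# on each remaining peer, cut traffic to the fourth peer
iptables -A INPUT -s 10.5.6.X -j DROP
iptables -A OUTPUT -d 10.5.6.X -j DROP
# detach the now-unreachable peer ("force" may be needed when it cannot be reached)
gluster peer detach 10.5.6.X force
# then try the commands that fail for me
gluster vol heal QEMU-VMs statistics heal-count
gluster vol status QEMU-VMs detail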

$ gluster vol info QEMU-VMs

Volume Name: QEMU-VMs
Type: Replicate
Volume ID: 8709782a-daa5-4434-a816-c4e0aef8fef2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Brick2: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Brick3: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
storage.owner-gid: 107
storage.owner-uid: 107
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

$ gluster vol status QEMU-VMs
Status of volume: QEMU-VMs
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-QEMU-VMs                       49156     0          Y       9302
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-QEMU-VMs                       49156     0          Y       7610
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
STERs/0GLUSTER-QEMU-VMs                      49156     0          Y       11013
Self-heal Daemon on localhost                N/A       N/A        Y       3069276
Self-heal Daemon on 10.5.6.32                N/A       N/A        Y       3315870
Self-heal Daemon on 10.5.6.49                N/A       N/A        N       N/A      <--- HERE
Self-heal Daemon on 10.5.6.17                N/A       N/A        Y       5163

Task Status of Volume QEMU-VMs
------------------------------------------------------------------------------
There are no active volume tasks

$ gluster vol heal QEMU-VMs statistics heal-count
Gathering count of entries to be healed on volume QEMU-VMs has been
unsuccessful on bricks that are down. Please check if all brick processes
are running.

On 04/09/17 11:47, Atin Mukherjee wrote:

<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">
Please provide the output of gluster volume info,
gluster volume status and gluster peer status.<br>
<br>
</span>
On Mon, Sep 4, 2017 at 4:07 PM, lejeczek <peljasz@yahoo.co.uk> wrote:

hi all

This:
$ gluster vol heal $_vol info
outputs OK and the exit code is 0.
But if I want to see statistics:
$ gluster vol heal $_vol statistics
Gathering crawl statistics on volume GROUP-WORK has been unsuccessful on
bricks that are down. Please check if all brick processes are running.
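
Since plain "heal info" still succeeds here, a rough workaround for a
per-brick count while statistics is broken (just a sketch that relies on the
summary lines heal info already prints):

gluster vol heal $_vol info | grep 'Number of entries'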

I suspect gluster's inability to cope with a situation where one peer (which
is not even a brick for a single vol on the cluster!) is inaccessible to the
rest of the cluster.
I have not played with any other variations of this case, e.g. more than one
peer going down, etc.
But I hope someone could try to replicate this simple test case.

Cluster and vols, when something like this happens, seem accessible and as
such "all" works, except when you want more details.
This also fails:
$ gluster vol status $_vol detail
Error : Request timed out

My gluster (3.10.5-1.el7.x86_64) exhibits these symptoms every time at least
one peer goes out of the others' reach.

maybe @devel can comment?

many thanks, L.

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users