[Gluster-users] heal info OK but statistics not working
Ravishankar N
ravishankar at redhat.com
Tue Sep 5 07:40:14 UTC 2017
On 09/04/2017 07:35 PM, Atin Mukherjee wrote:
> Ravi/Karthick,
>
> If one of the self-heal processes is down, will the statistics heal-count
> command work?
No, it doesn't seem to: the glusterd stage-op phase fails because shd was
down on that node, and we error out.
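As an aside, the usual way to get shd back, assuming it is merely not
running on that node, is a "start force", which respawns the auxiliary
daemons without restarting the bricks ("testvol" below is a placeholder
volume name):

$ gluster volume status testvol | grep -i self-heal
$ gluster volume start testvol force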
FWIW, the error message "Gathering crawl statistics on volume GROUP-WORK
has been unsuccessful on bricks that are down. Please check if all brick
processes are running." is incorrect; once
https://review.gluster.org/#/c/15724/ gets merged, you will get the
correct error message, like so:
[root@vm2 glusterfs]# gluster v heal testvol statistics
Gathering crawl statistics on volume testvol has been unsuccessful:
Staging failed on vm1. Error: Self-heal daemon is not running. Check
self-heal daemon log file./
-Ravi
>
> On Mon, Sep 4, 2017 at 7:24 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>
> 1) One peer, out of four, got separated from the network, from the
> rest of the cluster.
> 2) That peer, while it was unavailable, got detached with the
> "gluster peer detach" command, which succeeded, so the cluster now
> comprises three peers.
> 3) The self-heal daemon (for some reason) does not start, even after
> an attempt to restart glusterd, on the peer which had probed that
> fourth peer.
> 4) The fourth, unavailable peer is still up & running but is
> inaccessible to the other peers, for its network is disconnected,
> segmented. That peer's gluster status shows the peer is still in the
> cluster.
> 5) So the fourth peer's gluster stack (and its other processes) did
> not fail or crash; just its network got, and remains, disconnected.
> 6) Peer status shows ok & connected for the current three peers.
>
> This is the third time it has happened to me, in the very same way:
> each time the net-disjointed peer was brought back online, statistics
> & details worked again.
>
> Could you try to reproduce it?
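>
> A minimal way to simulate the network cut, assuming iptables is
> available on the peer to be isolated (the subnet below is taken from
> this cluster's addresses and is otherwise a placeholder):
>
> $ iptables -A INPUT -s 10.5.6.0/24 -j DROP
> $ iptables -A OUTPUT -d 10.5.6.0/24 -j DROP
>
> then, from one of the remaining peers:
>
> $ gluster peer detach <isolated-peer> force
> $ gluster vol heal QEMU-VMs statistics heal-count
>
> ("iptables -F" on the isolated peer flushes the rules and restores
> connectivity, assuming no other rules need preserving.)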
>
> $ gluster vol info QEMU-VMs
>
> Volume Name: QEMU-VMs
> Type: Replicate
> Volume ID: 8709782a-daa5-4434-a816-c4e0aef8fef2
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick2: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick3: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> storage.owner-gid: 107
> storage.owner-uid: 107
> performance.readdir-ahead: on
> geo-replication.indexing: on
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
>
> $ gluster vol status QEMU-VMs
> Status of volume: QEMU-VMs
> Gluster process TCP Port RDMA Port Online Pid
> ------------------------------------------------------------------------------
> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-QEMU-VMs 49156 0 Y 9302
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-QEMU-VMs 49156 0 Y 7610
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
> STERs/0GLUSTER-QEMU-VMs 49156 0 Y 11013
> Self-heal Daemon on localhost N/A N/A Y 3069276
> Self-heal Daemon on 10.5.6.32 N/A N/A Y 3315870
> Self-heal Daemon on 10.5.6.49 N/A N/A N N/A    <--- HERE
> Self-heal Daemon on 10.5.6.17 N/A N/A Y 5163
>
> Task Status of Volume QEMU-VMs
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> $ gluster vol heal QEMU-VMs statistics heal-count
> Gathering count of entries to be healed on volume QEMU-VMs has
> been unsuccessful on bricks that are down. Please check if all
> brick processes are running.
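>
> On the node where shd shows N (10.5.6.49 here), checking whether the
> process exists and what its log says might show why it did not start;
> the log path assumes the default location:
>
> $ ps aux | grep glustershd
> $ tail -n 20 /var/log/glusterfs/glustershd.log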
>
>
>
> On 04/09/17 11:47, Atin Mukherjee wrote:
>
> Please provide the output of gluster volume info, gluster
> volume status and gluster peer status.
>
> On Mon, Sep 4, 2017 at 4:07 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>
> hi all
>
> this:
> $ gluster vol heal $_vol info
> outputs ok and exit code is 0
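> (checked with something along the lines of:
> $ gluster vol heal $_vol info; echo $?
> where echo $? prints the command's exit status)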
> But if I want to see statistics:
> $ gluster vol heal $_vol statistics
> Gathering crawl statistics on volume GROUP-WORK has
> been unsuccessful on bricks that are down. Please
> check if all brick processes are running.
>
> I suspect gluster's inability to cope with a situation
> where one peer (which is not even a brick for a single
> vol in the cluster!) is inaccessible to the rest of the
> cluster.
> I have not played with any other variations of this
> case, e.g. more than one peer going down, etc.
> But I hope someone could try to replicate this simple
> test case.
>
> When something like this happens, the cluster and vols
> seem accessible, and as such "all" works, except when
> you want more details.
> This also fails:
> $ gluster vol status $_vol detail
> Error : Request timed out
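>
> When it times out, the CLI and glusterd logs on the node where
> the command ran would be the first place I'd look (paths assume
> the default log location):
>
> $ tail -n 50 /var/log/glusterfs/cli.log
> $ tail -n 50 /var/log/glusterfs/glusterd.log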
>
> My gluster (3.10.5-1.el7.x86_64) exhibits these
> symptoms every time at least one peer goes out of
> the others' reach.
>
> maybe @devel can comment?
>
> many thanks, L.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users