[Gluster-users] heal info OK but statistics not working
Atin Mukherjee
amukherj at redhat.com
Mon Sep 4 14:05:07 UTC 2017
Ravi/Karthick,
If one of the self-heal processes is down, will the statistics heal-count
command work?
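
A quick way to check this locally (a rough sketch, assuming a running
replica volume named test-vol; the pkill pattern relies on the shd
showing up in ps with "glustershd" in its command line, which is how it
appears on my EL7 setups):

$ # stop only the local self-heal daemon
$ pkill -f glustershd
$ # its Online column should now read N for this node
$ gluster volume status test-vol
$ # then see whether heal-count still answers
$ gluster volume heal test-vol statistics heal-count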
On Mon, Sep 4, 2017 at 7:24 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
> 1) One peer, out of four, got separated from the network, i.e. from the rest
> of the cluster.
> 2) While it was unavailable, that peer was detached with the "gluster peer
> detach" command, which succeeded, so the cluster now comprises three peers.
> 3) The self-heal daemon (for some reason) does not start, even after
> restarting glusterd, on the peer which originally probed that fourth peer
> (see the sketch after this list).
> 4) The fourth, unavailable peer is still up & running but is inaccessible to
> the other peers because its network is disconnected, segmented. That peer's
> own gluster status shows it still in the cluster.
> 5) So the fourth peer's gluster stack (and its other processes) did not fail
> or crash; only its network connection was lost.
> 6) "gluster peer status" shows ok & connected for the current three peers.
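>
> (For reference, a rough recovery sketch for point 3 -- "gluster volume
> start ... force" respawns missing brick and self-heal daemon processes
> without touching the running ones, and the log path is the EL7 default:)
>
> $ gluster volume start QEMU-VMs force
> $ # check whether glustershd came back, and if not, why
> $ gluster volume status QEMU-VMs
> $ tail -n 50 /var/log/glusterfs/glustershd.log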
>
> This is the third time this has happened to me, in exactly the same way:
> each time, once the net-disjointed peer was brought back online, statistics
> & details worked again.
>
> Can you reproduce it?
>
> $ gluster vol info QEMU-VMs
>
> Volume Name: QEMU-VMs
> Type: Replicate
> Volume ID: 8709782a-daa5-4434-a816-c4e0aef8fef2
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick2: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick3: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> storage.owner-gid: 107
> storage.owner-uid: 107
> performance.readdir-ahead: on
> geo-replication.indexing: on
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
>
> $ gluster vol status QEMU-VMs
> Status of volume: QEMU-VMs
> Gluster process                                                     TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs   49156     0          Y       9302
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs   49156     0          Y       7610
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs  49156     0          Y       11013
> Self-heal Daemon on localhost                                       N/A       N/A        Y       3069276
> Self-heal Daemon on 10.5.6.32                                       N/A       N/A        Y       3315870
> Self-heal Daemon on 10.5.6.49                                       N/A       N/A        N       N/A      <--- HERE
> Self-heal Daemon on 10.5.6.17                                       N/A       N/A        Y       5163
>
> Task Status of Volume QEMU-VMs
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> $ gluster vol heal QEMU-VMs statistics heal-count
> Gathering count of entries to be healed on volume QEMU-VMs has been
> unsuccessful on bricks that are down. Please check if all brick processes
> are running.
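>
> (Note the contradiction: the error blames bricks, but the volume status
> above shows all three bricks online -- only the self-heal daemon on
> 10.5.6.49 is down. A rough check of that daemon, the log path being the
> EL7 default:)
>
> $ ssh 10.5.6.49 'ps ax | grep -v grep | grep glustershd'
> $ ssh 10.5.6.49 'tail -n 50 /var/log/glusterfs/glustershd.log'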
>
>
>
> On 04/09/17 11:47, Atin Mukherjee wrote:
>
>> Please provide the output of gluster volume info, gluster volume status
>> and gluster peer status.
>>
>> On Mon, Sep 4, 2017 at 4:07 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>>
>> hi all
>>
>> this:
>> $ gluster vol heal $_vol info
>> outputs OK and the exit code is 0.
>> But if I want to see statistics:
>> $ gluster vol heal $_vol statistics
>> Gathering crawl statistics on volume GROUP-WORK has
>> been unsuccessful on bricks that are down. Please
>> check if all brick processes are running.
>>
>> I suspect gluster's inability to cope with a situation
>> where one peer (which does not even host a brick for a
>> single vol on the cluster!) is inaccessible to the rest
>> of the cluster.
>> I have not played with any other variations of this
>> case, e.g. more than one peer going down, etc.
>> But I hope someone could try to replicate this simple
>> test case (a rough reproduction sketch below).
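>>
>> (A minimal, unverified sketch of how I would simulate the
>> segmentation; ISOLATED_IP is a placeholder for the address
>> of the peer to be cut off, and the commands run as root on
>> each of the remaining peers:)
>>
>> $ # simulate the network split towards one peer
>> $ iptables -A INPUT -s ISOLATED_IP -j DROP
>> $ iptables -A OUTPUT -d ISOLATED_IP -j DROP
>> $ # heal info keeps working...
>> $ gluster vol heal $_vol info
>> $ # ...but statistics/detail should now fail or time out
>> $ gluster vol heal $_vol statistics
>> $ gluster vol status $_vol detail
>> $ # undo afterwards
>> $ iptables -D INPUT -s ISOLATED_IP -j DROP
>> $ iptables -D OUTPUT -d ISOLATED_IP -j DROP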
>>
>> When something like this happens, the cluster and vols
>> seem accessible, and as such "everything" works, except
>> when you want more details.
>> This also fails:
>> $ gluster vol status $_vol detail
>> Error : Request timed out
>>
>> My gluster (3.10.5-1.el7.x86_64) exhibits these
>> symptoms every time at least one peer goes out of
>> reach of the rest.
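>>
>> (If it helps with triage, the logs I would look at; these
>> paths are the EL7 RPM defaults -- on some builds the glusterd
>> log is named etc-glusterfs-glusterd.vol.log instead:)
>>
>> $ tail -n 100 /var/log/glusterfs/cli.log
>> $ tail -n 100 /var/log/glusterfs/glusterd.log
>> $ tail -n 100 /var/log/glusterfs/glustershd.log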
>>
>> maybe @devel can comment?
>>
>> many thanks, L.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>