[Gluster-users] heal info OK but statistics not working
Atin Mukherjee
amukherj at redhat.com
Mon Sep 4 14:05:07 UTC 2017
Ravi/Karthick,
If one of the self-heal processes is down, will the statistics heal-count
command work?
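
A quick way to check this locally (a rough sketch, assuming a running
replica volume named test-vol; the pkill pattern relies on the shd
showing up in ps with "glustershd" in its command line, which is how it
appears on my EL7 setups):

$ # stop only the local self-heal daemon
$ pkill -f glustershd
$ # its Online column should now read N for this node
$ gluster volume status test-vol
$ # then see whether heal-count still answers
$ gluster volume heal test-vol statistics heal-count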
On Mon, Sep 4, 2017 at 7:24 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
> 1) One peer, out of four, got separated from the network, i.e. from the rest
> of the cluster.
> 2) While it was unavailable, that peer was detached with the "gluster peer
> detach" command, which succeeded, so the cluster now comprises three peers.
> 3) The self-heal daemon (for some reason) does not start, even after
> restarting glusterd, on the peer which originally probed that fourth peer
> (see the sketch after this list).
> 4) The fourth, unavailable peer is still up & running but is inaccessible to
> the other peers because its network is disconnected, segmented. That peer's
> own gluster status shows it still in the cluster.
> 5) So the fourth peer's gluster stack (and its other processes) did not fail
> or crash; only its network connection was lost.
> 6) "gluster peer status" shows ok & connected for the current three peers.
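>
> (For reference, a rough recovery sketch for point 3 -- "gluster volume
> start ... force" respawns missing brick and self-heal daemon processes
> without touching the running ones, and the log path is the EL7 default:)
>
> $ gluster volume start QEMU-VMs force
> $ # check whether glustershd came back, and if not, why
> $ gluster volume status QEMU-VMs
> $ tail -n 50 /var/log/glusterfs/glustershd.log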
>
> This is the third time this has happened to me, in exactly the same way:
> each time, once the net-disjointed peer was brought back online, statistics
> & details worked again.
>
> Can you reproduce it?
>
> $ gluster vol info QEMU-VMs
>
> Volume Name: QEMU-VMs
> Type: Replicate
> Volume ID: 8709782a-daa5-4434-a816-c4e0aef8fef2
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick2: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick3: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> storage.owner-gid: 107
> storage.owner-uid: 107
> performance.readdir-ahead: on
> geo-replication.indexing: on
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
>
> $ gluster vol status QEMU-VMs
> Status of volume: QEMU-VMs
> Gluster process                                                     TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs   49156     0          Y       9302
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs   49156     0          Y       7610
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs  49156     0          Y       11013
> Self-heal Daemon on localhost                                       N/A       N/A        Y       3069276
> Self-heal Daemon on 10.5.6.32                                       N/A       N/A        Y       3315870
> Self-heal Daemon on 10.5.6.49                                       N/A       N/A        N       N/A      <--- HERE
> Self-heal Daemon on 10.5.6.17                                       N/A       N/A        Y       5163
>
> Task Status of Volume QEMU-VMs
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> $ gluster vol heal QEMU-VMs statistics heal-count
> Gathering count of entries to be healed on volume QEMU-VMs has been
> unsuccessful on bricks that are down. Please check if all brick processes
> are running.
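>
> (Note the contradiction: the error blames bricks, but the volume status
> above shows all three bricks online -- only the self-heal daemon on
> 10.5.6.49 is down. A rough check of that daemon, the log path being the
> EL7 default:)
>
> $ ssh 10.5.6.49 'ps ax | grep -v grep | grep glustershd'
> $ ssh 10.5.6.49 'tail -n 50 /var/log/glusterfs/glustershd.log'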
>
>
>
> On 04/09/17 11:47, Atin Mukherjee wrote:
>
>> Please provide the output of gluster volume info, gluster volume status
>> and gluster peer status.
>>
>> On Mon, Sep 4, 2017 at 4:07 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>>
>> hi all
>>
>> this:
>> $ gluster vol heal $_vol info
>> outputs OK and the exit code is 0.
>> But if I want to see statistics:
>> $ gluster vol heal $_vol statistics
>> Gathering crawl statistics on volume GROUP-WORK has
>> been unsuccessful on bricks that are down. Please
>> check if all brick processes are running.
>>
>> I suspect gluster's inability to cope with a situation
>> where one peer (which does not even host a brick for a
>> single vol on the cluster!) is inaccessible to the rest
>> of the cluster.
>> I have not played with any other variations of this
>> case, e.g. more than one peer going down, etc.
>> But I hope someone could try to replicate this simple
>> test case (a rough reproduction sketch below).
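>>
>> (A minimal, unverified sketch of how I would simulate the
>> segmentation; ISOLATED_IP is a placeholder for the address
>> of the peer to be cut off, and the commands run as root on
>> each of the remaining peers:)
>>
>> $ # simulate the network split towards one peer
>> $ iptables -A INPUT -s ISOLATED_IP -j DROP
>> $ iptables -A OUTPUT -d ISOLATED_IP -j DROP
>> $ # heal info keeps working...
>> $ gluster vol heal $_vol info
>> $ # ...but statistics/detail should now fail or time out
>> $ gluster vol heal $_vol statistics
>> $ gluster vol status $_vol detail
>> $ # undo afterwards
>> $ iptables -D INPUT -s ISOLATED_IP -j DROP
>> $ iptables -D OUTPUT -d ISOLATED_IP -j DROP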
>>
>> When something like this happens, the cluster and vols
>> seem accessible, and as such "everything" works, except
>> when you want more details.
>> This also fails:
>> $ gluster vol status $_vol detail
>> Error : Request timed out
>>
>> My gluster (3.10.5-1.el7.x86_64) exhibits these
>> symptoms every time at least one peer goes out of
>> reach of the rest.
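>>
>> (If it helps with triage, the logs I would look at; these
>> paths are the EL7 RPM defaults -- on some builds the glusterd
>> log is named etc-glusterfs-glusterd.vol.log instead:)
>>
>> $ tail -n 100 /var/log/glusterfs/cli.log
>> $ tail -n 100 /var/log/glusterfs/glusterd.log
>> $ tail -n 100 /var/log/glusterfs/glustershd.log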
>>
>> maybe @devel can comment?
>>
>> many thanks, L.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>