[Gluster-users] heal info OK but statistics not working
Ravishankar N
ravishankar at redhat.com
Tue Sep 5 07:40:14 UTC 2017
On 09/04/2017 07:35 PM, Atin Mukherjee wrote:
> Ravi/Karthick,
>
> If one of the self-heal processes is down, will the statistics heal-count
> command work?
No, it doesn't seem to: the glusterd stage-op phase fails because shd was
down on that node, and we error out.
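As an aside, the usual way to get shd back, assuming it is merely not
running on that node, is a "start force", which respawns the auxiliary
daemons without restarting the bricks ("testvol" below is a placeholder
volume name):

$ gluster volume status testvol | grep -i self-heal
$ gluster volume start testvol force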
FWIW, the error message "Gathering crawl statistics on volume GROUP-WORK
has been unsuccessful on bricks that are down. Please check if all brick
processes are running." is incorrect; once
https://review.gluster.org/#/c/15724/ gets merged, you will get the
correct error message, like so:
[root@vm2 glusterfs]# gluster v heal testvol statistics
Gathering crawl statistics on volume testvol has been unsuccessful:
Staging failed on vm1. Error: Self-heal daemon is not running. Check
self-heal daemon log file./
-Ravi
>
> On Mon, Sep 4, 2017 at 7:24 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>
> 1) One peer, out of four, got separated from the network, from the
> rest of the cluster.
> 2) That peer, while it was unavailable, got detached with the
> "gluster peer detach" command, which succeeded, so the cluster now
> comprises three peers.
> 3) The self-heal daemon (for some reason) does not start, even after
> an attempt to restart glusterd, on the peer which had probed that
> fourth peer.
> 4) The fourth, unavailable peer is still up & running but is
> inaccessible to the other peers, for its network is disconnected,
> segmented. That peer's gluster status shows the peer is still in the
> cluster.
> 5) So the fourth peer's gluster stack (and its other processes) did
> not fail or crash; just its network got, and remains, disconnected.
> 6) Peer status shows ok & connected for the current three peers.
>
> This is the third time it has happened to me, in the very same way:
> each time the net-disjointed peer was brought back online, statistics
> & details worked again.
>
> Could you try to reproduce it?
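>
> A minimal way to simulate the network cut, assuming iptables is
> available on the peer to be isolated (the subnet below is taken from
> this cluster's addresses and is otherwise a placeholder):
>
> $ iptables -A INPUT -s 10.5.6.0/24 -j DROP
> $ iptables -A OUTPUT -d 10.5.6.0/24 -j DROP
>
> then, from one of the remaining peers:
>
> $ gluster peer detach <isolated-peer> force
> $ gluster vol heal QEMU-VMs statistics heal-count
>
> ("iptables -F" on the isolated peer flushes the rules and restores
> connectivity, assuming no other rules need preserving.)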
>
> $ gluster vol info QEMU-VMs
>
> Volume Name: QEMU-VMs
> Type: Replicate
> Volume ID: 8709782a-daa5-4434-a816-c4e0aef8fef2
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick2: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Brick3: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> storage.owner-gid: 107
> storage.owner-uid: 107
> performance.readdir-ahead: on
> geo-replication.indexing: on
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
>
> $ gluster vol status QEMU-VMs
> Status of volume: QEMU-VMs
> Gluster process TCP Port RDMA Port Online Pid
> ------------------------------------------------------------------------------
> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-QEMU-VMs 49156 0 Y 9302
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-QEMU-VMs 49156 0 Y 7610
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
> STERs/0GLUSTER-QEMU-VMs 49156 0 Y 11013
> Self-heal Daemon on localhost N/A N/A Y 3069276
> Self-heal Daemon on 10.5.6.32 N/A N/A Y 3315870
> Self-heal Daemon on 10.5.6.49 N/A N/A N N/A    <--- HERE
> Self-heal Daemon on 10.5.6.17 N/A N/A Y 5163
>
> Task Status of Volume QEMU-VMs
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> $ gluster vol heal QEMU-VMs statistics heal-count
> Gathering count of entries to be healed on volume QEMU-VMs has
> been unsuccessful on bricks that are down. Please check if all
> brick processes are running.
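>
> On the node where shd shows N (10.5.6.49 here), checking whether the
> process exists and what its log says might show why it did not start;
> the log path assumes the default location:
>
> $ ps aux | grep glustershd
> $ tail -n 20 /var/log/glusterfs/glustershd.log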
>
>
>
> On 04/09/17 11:47, Atin Mukherjee wrote:
>
> Please provide the output of gluster volume info, gluster
> volume status and gluster peer status.
>
> On Mon, Sep 4, 2017 at 4:07 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>
> hi all
>
> this:
> $ gluster vol heal $_vol info
> outputs ok and exit code is 0
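> (checked with something along the lines of:
> $ gluster vol heal $_vol info; echo $?
> where echo $? prints the command's exit status)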
> But if I want to see statistics:
> $ gluster vol heal $_vol statistics
> Gathering crawl statistics on volume GROUP-WORK has
> been unsuccessful on bricks that are down. Please
> check if all brick processes are running.
>
> I suspect gluster's inability to cope with a situation
> where one peer (which is not even a brick for a single
> vol in the cluster!) is inaccessible to the rest of the
> cluster.
> I have not played with any other variations of this
> case, e.g. more than one peer going down, etc.
> But I hope someone could try to replicate this simple
> test case.
>
> When something like this happens, the cluster and vols
> seem accessible, and as such "all" works, except when
> you want more details.
> This also fails:
> $ gluster vol status $_vol detail
> Error : Request timed out
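>
> When it times out, the CLI and glusterd logs on the node where
> the command ran would be the first place I'd look (paths assume
> the default log location):
>
> $ tail -n 50 /var/log/glusterfs/cli.log
> $ tail -n 50 /var/log/glusterfs/glusterd.log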
>
> My gluster (3.10.5-1.el7.x86_64) exhibits these
> symptoms every time at least one peer goes out of
> the others' reach.
>
> maybe @devel can comment?
>
> many thanks, L.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users