[Gluster-users] self-heal trouble after changing arbiter brick

Karthik Subrahmanya ksubrahm at redhat.com
Fri Feb 9 06:16:02 UTC 2018


Hey,

Has the heal completed, and do you still have some entries pending heal?
If yes, then can you provide the following information to debug the issue:
1. Which version of gluster you are running
2. The output of "gluster volume heal <volname> info summary" or "gluster
volume heal <volname> info"
3. The "getfattr -d -e hex -m . <filepath-on-brick>" output, from all the
bricks, for any one of the files which is pending heal
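For convenience, the three items above can be gathered in one pass. A sketch, assuming "myvol" is your volume name and that /data/glusterfs/path/to/pending-file stands in for the actual on-brick path of a file reported as pending heal:

```shell
# 1. Gluster version
gluster --version

# 2. Per-brick summary of entries pending heal
#    (on older releases that lack "info summary", use
#    "gluster volume heal myvol info" instead)
gluster volume heal myvol info summary

# 3. AFR extended attributes of one file pending heal, as seen from a
#    brick. Run this on every brick of the replica set that holds a
#    copy of the file; the path is the file's location under the brick
#    root, not under the client mount point.
getfattr -d -e hex -m . /data/glusterfs/path/to/pending-file
```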

Regards,
Karthik

On Thu, Feb 8, 2018 at 12:48 PM, Seva Gluschenko <gvs at webkontrol.ru> wrote:

> Hi folks,
>
> I've run into trouble after moving an arbiter brick to another server
> because of I/O load issues. My setup is as follows:
>
> # gluster volume info
>
> Volume Name: myvol
> Type: Distributed-Replicate
> Volume ID: 43ba517a-ac09-461e-99da-a197759a7dc8
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x (2 + 1) = 9
> Transport-type: tcp
> Bricks:
> Brick1: gv0:/data/glusterfs
> Brick2: gv1:/data/glusterfs
> Brick3: gv4:/data/gv01-arbiter (arbiter)
> Brick4: gv2:/data/glusterfs
> Brick5: gv3:/data/glusterfs
> Brick6: gv1:/data/gv23-arbiter (arbiter)
> Brick7: gv4:/data/glusterfs
> Brick8: gv5:/data/glusterfs
> Brick9: pluto:/var/gv45-arbiter (arbiter)
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> storage.owner-gid: 1000
> storage.owner-uid: 1000
> cluster.self-heal-daemon: enable
>
> The gv23-arbiter is the brick that was recently moved from another server
> (chronos) using the following command:
>
> # gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter
> gv1:/data/gv23-arbiter commit force
> volume replace-brick: success: replace-brick commit force operation
> successful
>
> This is not the first time I have moved an arbiter brick, and the
> heal-count was zero for all the bricks before the change, so I didn't
> expect much trouble. What was probably wrong is that I then forced chronos
> out of the cluster with the gluster peer detach command. Ever since, over
> the course of the last 3 days, I have seen this:
>
> # gluster volume heal myvol statistics heal-count
> Gathering count of entries to be healed on volume myvol has been successful
>
> Brick gv0:/data/glusterfs
> Number of entries: 0
>
> Brick gv1:/data/glusterfs
> Number of entries: 0
>
> Brick gv4:/data/gv01-arbiter
> Number of entries: 0
>
> Brick gv2:/data/glusterfs
> Number of entries: 64999
>
> Brick gv3:/data/glusterfs
> Number of entries: 64999
>
> Brick gv1:/data/gv23-arbiter
> Number of entries: 0
>
> Brick gv4:/data/glusterfs
> Number of entries: 0
>
> Brick gv5:/data/glusterfs
> Number of entries: 0
>
> Brick pluto:/var/gv45-arbiter
> Number of entries: 0
>
> According to /var/log/glusterfs/glustershd.log, self-healing is in
> progress, so it might be worth just sitting and waiting, but I'm wondering
> why this heal-count of 64999 persists (a limit on the counter? The gv2 and
> gv3 bricks in fact contain roughly 30 million files), and I'm bothered by
> the following output:
>
> # gluster volume heal myvol info heal-failed
> Gathering list of heal failed entries on volume myvol has been
> unsuccessful on bricks that are down. Please check if all brick processes
> are running.
>
> I attached the chronos server back to the cluster, with no noticeable
> effect. Any comments and suggestions would be much appreciated.
>
> --
> Best Regards,
>
> Seva Gluschenko
> CTO @ http://webkontrol.ru
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
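For comparison, a more conservative way to move the arbiter brick would have been to let the heal triggered by replace-brick finish before detaching the old peer. A sketch using the hostnames and brick paths from the message above; not verified against this cluster:

```shell
# Move the arbiter brick to the new server
gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter \
    gv1:/data/gv23-arbiter commit force

# Wait until every brick reports "Number of entries: 0"
gluster volume heal myvol statistics heal-count

# Only then remove the old server from the trusted pool
gluster peer detach chronos
```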

