[Gluster-users] selfheal operation takes infinite to complete

John Strunk jstrunk at redhat.com
Tue Oct 23 12:23:37 UTC 2018


I'll leave it to others to help debug slow heal...

As for 'heal info' taking a long time, you can use `gluster vol heal gv1
info summary` to get just the counts. That will probably give you the stats
you are really interested in (i.e., whether heal is progressing).
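
For example (using the volume name gv1 from your output; if your release
does not have the summary subcommand, `statistics heal-count` gives a
similar per-brick count of pending entries):

  # per-brick summary of entries pending heal / in split-brain / healing
  gluster volume heal gv1 info summary

  # per-brick count of entries still needing heal
  gluster volume heal gv1 statistics heal-count

Re-running either of those every few minutes should show whether the
pending-entry count is actually going down.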

-John


On Tue, Oct 23, 2018 at 5:31 AM hsafe <hsafe at devopt.net> wrote:

> Hello all,
>
> Can somebody please respond to this? As of now, if I run "gluster volume
> heal gv1 info",
>
> it prints a seemingly endless list of gfids that never finishes. Usually,
> in a stable scenario, this ended with some counts and a status, but
> currently it never completes. Is this a bad sign? Is it a loop? Are there
> any actions required beyond Gluster itself?
>
> Appreciate any help...
>
> On 10/21/18 8:05 AM, hsafe wrote:
> > Hello Gluster community,
> >
> > I have hit a situation I have not seen in the past year of running a
> > 2-replica set on GlusterFS 3.10.12 servers, which are the storage
> > backend for an application that saves small images to them.
> >
> > Previously, whenever the replicas went out of sync or one server went
> > down, bringing it back up would start the self-heal and eventually the
> > volume would be back in sync. Now, however, if I run the volume heal
> > info, the list of gfids does not even finish after a couple of hours.
> > If I look at the heal log I can see that the process is ongoing, but at
> > a very small scale and speed!
> >
> > My question is: how long should I expect it to take to finish, and how
> > can I speed it up?
> >
> > Here is a bit of info:
> >
> > Status of volume: gv1
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick IMG-01:/images/storage/brick1         49152     0          Y       4176
> > Brick IMG-02:/images/storage/brick1         49152     0          Y       4095
> > Self-heal Daemon on localhost               N/A       N/A        Y       4067
> > Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146
> >
> > Task Status of Volume gv1
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> > Status of volume: gv2
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick IMG-01:/data/brick2                   49153     0          Y       4185
> > Brick IMG-02:/data/brick2                   49153     0          Y       4104
> > NFS Server on localhost                     N/A       N/A        N       N/A
> > Self-heal Daemon on localhost               N/A       N/A        Y       4067
> > NFS Server on IMG-01                        N/A       N/A        N       N/A
> > Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146
> >
> > Task Status of Volume gv2
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> >
> >
> > gluster> peer status
> > Number of Peers: 1
> >
> > Hostname: IMG-01
> > Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
> > State: Peer in Cluster (Connected)
> >
> > gluster> exit
> > root@NAS02:/var/log/glusterfs# gluster volume gv1 info
> > unrecognized word: gv1 (position 1)
> > root@NAS02:/var/log/glusterfs# gluster volume info
> >
> > Volume Name: gv1
> > Type: Replicate
> > Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: IMG-01:/images/storage/brick1
> > Brick2: IMG-02:/images/storage/brick1
> > Options Reconfigured:
> > server.event-threads: 4
> > performance.cache-invalidation: on
> > performance.stat-prefetch: on
> > features.cache-invalidation-timeout: 600
> > features.cache-invalidation: on
> > cluster.lookup-optimize: on
> > cluster.shd-max-threads: 4
> > cluster.readdir-optimize: on
> > performance.md-cache-timeout: 30
> > cluster.background-self-heal-count: 32
> > server.statedump-path: /tmp
> > performance.readdir-ahead: on
> > nfs.disable: true
> > network.inode-lru-limit: 50000
> > features.bitrot: off
> > features.scrub: Inactive
> > performance.cache-max-file-size: 16MB
> > client.event-threads: 8
> > cluster.eager-lock: on
> > cluster.self-heal-daemon: enable
> >
> >
> > Please do help me out...Thanks
> >
> >
> >
> --
> Hamid Safe
> www.devopt.net
> +989361491768
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users