[Gluster-users] selfheal operation takes infinite to complete

Sun Oct 21 04:35:46 UTC 2018

Hello all gluster community,

I am facing a situation I have not seen in the past year of running
GlusterFS 3.10.12 in a 2-way replica setup. The servers are the storage
backend for my application, which saves small images to them.

The problem is new to me. Until now, whenever the bricks went out of
sync or one server went down, bringing it back up would start the
self-heal, and eventually the replicated volume would be back in sync.
Now, however, if I run volume heal info, the list of gfids does not
finish even after a couple of hours. If I look at the heal log, I can
see that the process is ongoing, but at a very small scale and speed!

My question is: how can I tell when it will finish, and how can I speed
it up?

Here is a bit of info:

Status of volume: gv1
Gluster process                             TCP Port  RDMA Port Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/images/storage/brick1         49152     0         Y       4176
Brick IMG-02:/images/storage/brick1         49152     0         Y       4095
Self-heal Daemon on localhost               N/A       N/A       Y       4067
Self-heal Daemon on IMG-01                  N/A       N/A       Y       4146

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gv2
Gluster process                             TCP Port  RDMA Port Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/data/brick2                   49153     0         Y       4185
Brick IMG-02:/data/brick2                   49153     0         Y       4104
NFS Server on localhost                     N/A       N/A       N       N/A
Self-heal Daemon on localhost               N/A       N/A       Y       4067
NFS Server on IMG-01                        N/A       N/A       N       N/A
Self-heal Daemon on IMG-01                  N/A       N/A       Y       4146

Task Status of Volume gv2
------------------------------------------------------------------------------
There are no active volume tasks

gluster> peer status
Number of Peers: 1

Hostname: IMG-01
Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
State: Peer in Cluster (Connected)

gluster> exit
root@NAS02:/var/log/glusterfs# gluster volume gv1 info
unrecognized word: gv1 (position 1)
root@NAS02:/var/log/glusterfs# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
server.event-threads: 4
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.lookup-optimize: on
cluster.shd-max-threads: 4
cluster.readdir-optimize: on
performance.md-cache-timeout: 30
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 50000
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on
cluster.self-heal-daemon: enable

Please help me out. Thanks.

-- 
Hamid Safe
www.devopt.net
+989361491768