[Gluster-users] Self-heal operation takes forever to complete
hsafe
hsafe at devopt.net
Sun Oct 21 04:35:46 UTC 2018
Hello Gluster community,
I am facing a situation I have not seen in the past year of running
glusterfs as a 2-replica set on glusterfs 3.10.12 servers. They are the
storage backend of my application, which saves small images to them.
Until now, whenever the replicas went out of sync or one server went
down, bringing it back up would start the self-heal and eventually we
would see the clustered volume in sync. This time, if I run volume heal
info, the list of gfids does not finish even after a couple of hours. If
I look at the heal log I can see that the process is ongoing, but at a
very small scale and very slowly.
My question is: when can I expect it to finish, and how can I speed it up?
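To make the "when will it finish" part concrete, one rough approach is to sample the number of pending heal entries twice and extrapolate linearly. The following is only a minimal sketch: it assumes the two counts were read by hand from `gluster volume heal gv1 statistics heal-count`, the function name `estimate_heal_eta` is hypothetical, and the sample numbers are made up for illustration, not measured on this cluster.

```python
def estimate_heal_eta(count_before, count_after, interval_seconds):
    """Estimate seconds until the pending-heal count reaches zero.

    count_before / count_after are pending-entry counts taken
    interval_seconds apart. Returns None when the backlog is not
    shrinking, because a linear extrapolation is meaningless then.
    """
    healed = count_before - count_after
    if healed <= 0 or interval_seconds <= 0:
        return None
    rate = healed / interval_seconds  # entries healed per second
    return count_after / rate


# Hypothetical samples: 120000 entries pending, then 118200 ten minutes later.
eta_seconds = estimate_heal_eta(120000, 118200, 600)
if eta_seconds is None:
    print("backlog is not shrinking")
else:
    print(f"roughly {eta_seconds / 3600:.1f} hours remaining")
```

The estimate assumes the heal rate stays constant and that no new entries are queued meanwhile, which is unlikely to hold exactly on a volume that is still taking writes.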
Here is a bit of info:
Status of volume: gv1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/images/storage/brick1         49152     0          Y       4176
Brick IMG-02:/images/storage/brick1         49152     0          Y       4095
Self-heal Daemon on localhost               N/A       N/A        Y       4067
Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gv2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/data/brick2                   49153     0          Y       4185
Brick IMG-02:/data/brick2                   49153     0          Y       4104
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       4067
NFS Server on IMG-01                        N/A       N/A        N       N/A
Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146

Task Status of Volume gv2
------------------------------------------------------------------------------
There are no active volume tasks
gluster> peer status
Number of Peers: 1
Hostname: IMG-01
Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
State: Peer in Cluster (Connected)
gluster> exit
root@NAS02:/var/log/glusterfs# gluster volume gv1 info
unrecognized word: gv1 (position 1)
root@NAS02:/var/log/glusterfs# gluster volume info
Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
server.event-threads: 4
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.lookup-optimize: on
cluster.shd-max-threads: 4
cluster.readdir-optimize: on
performance.md-cache-timeout: 30
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 50000
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on
cluster.self-heal-daemon: enable
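Since only a few of the options above concern self-heal, here is a small sketch that pulls them out of saved `gluster volume info` output. It assumes the text layout shown above (an "Options Reconfigured:" line followed by "key: value" lines); the helper names `parse_reconfigured_options` and `heal_related` are hypothetical, and the sample text is a subset copied from the gv1 listing.

```python
def parse_reconfigured_options(info_text):
    """Collect the 'key: value' lines listed under 'Options Reconfigured:'."""
    options = {}
    in_options = False
    for raw_line in info_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Volume Name:"):
            in_options = False  # a new volume section begins
        elif line == "Options Reconfigured:":
            in_options = True
        elif in_options and ": " in line:
            key, value = line.split(": ", 1)
            options[key] = value
    return options


def heal_related(options):
    """Keep options whose name mentions heal or the self-heal daemon (shd)."""
    return {
        key: value
        for key, value in options.items()
        if "heal" in key or key.startswith("cluster.shd-")
    }


# Subset of the gv1 output above, used as sample input.
SAMPLE_INFO = """\
Volume Name: gv1
Type: Replicate
Options Reconfigured:
server.event-threads: 4
cluster.shd-max-threads: 4
cluster.background-self-heal-count: 32
cluster.eager-lock: on
cluster.self-heal-daemon: enable
"""

for name, value in heal_related(parse_reconfigured_options(SAMPLE_INFO)).items():
    print(f"{name} = {value}")
```

On this volume that leaves cluster.shd-max-threads, cluster.background-self-heal-count and cluster.self-heal-daemon as the settings directly tied to healing.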
Please do help me out. Thanks.
--
Hamid Safe
www.devopt.net
+989361491768