[Gluster-users] Gluster rebalance taking many years

shadowsocks飞飞 kiwizhang618 at gmail.com
Mon Apr 30 06:56:30 UTC 2018


I have run into a big problem: after adding a new node, the cluster rebalance
is estimated to take many years to complete.

gluster volume rebalance web status
        Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
   ---------  ----------------  ------  -------  --------  -------  -----------  -----------------
   localhost               900  43.5MB     2232         0       69  in progress            0:36:49
    gluster2              1052  39.3MB     4393         0     1052  in progress            0:36:49
Estimated time left for rebalance to complete :     9919:44:34
volume rebalance: web: success
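
For context, here is a rough back-of-the-envelope reading of those numbers
(my assumption: the estimator extrapolates linearly from the current
migration rate, so a huge figure implies a huge estimated file count):

    # 900 + 1052 = 1952 files processed in 0:36:49 (2209 s) -> ~0.88 files/s
    echo "scale=2; (900 + 1052) / 2209" | bc
    # 9919:44:34 is about 35,711,074 s; at ~0.88 files/s that implies
    # roughly 31.5 million files still left to scan and migrate
    echo "(900 + 1052) * 35711074 / 2209" | bc

So either the volume really holds tens of millions of files, or the
early-stage estimate is still very noisy.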

The rebalance log:
[glusterfsd.c:2511:main] 0-/usr/sbin/glusterfs: Started running
/usr/sbin/glusterfs version 3.12.8 (args: /usr/sbin/glusterfs -s localhost
--volfile-id rebalance/web --xlator-option *dht.use-readdirp=yes
--xlator-option *dht.lookup-unhashed=yes --xlator-option
*dht.assert-no-child-down=yes --xlator-option
*replicate*.data-self-heal=off --xlator-option
*replicate*.metadata-self-heal=off --xlator-option
*replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on
--xlator-option *dht.rebalance-cmd=1 --xlator-option
*dht.node-uuid=d47ad89d-7979-4ede-9aba-e04f020bb4f0 --xlator-option
*dht.commit-hash=3610561770 --socket-file
/var/run/gluster/gluster-rebalance-bdef10eb-1c83-410c-8ad3-fe286450004b.sock
--pid-file
/var/lib/glusterd/vols/web/rebalance/d47ad89d-7979-4ede-9aba-e04f020bb4f0.pid
-l /var/log/glusterfs/web-rebalance.log)
[2018-04-30 04:20:45.100902] W [MSGID: 101002]
[options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is
deprecated, preferred is 'transport.address-family', continuing with
correction
[2018-04-30 04:20:45.103927] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2018-04-30 04:20:55.191261] E [MSGID: 109039]
[dht-common.c:3113:dht_find_local_subvol_cbk] 0-web-dht: getxattr err for
dir [No data available]
[2018-04-30 04:21:19.783469] E [MSGID: 109023]
[dht-rebalance.c:2669:gf_defrag_migrate_single_file] 0-web-dht: Migrate
file failed: /2018/02/x187f6596-36ac-45e6-bd7a-019804dfe427.jpg, lookup
failed [Stale file handle]
The message "E [MSGID: 109039]
[dht-common.c:3113:dht_find_local_subvol_cbk] 0-web-dht: getxattr err for
dir [No data available]" repeated 2 times between [2018-04-30
04:20:55.191261] and [2018-04-30 04:20:55.193615]
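
The two errors above are about a directory missing its DHT layout xattr and a
file whose lookup returned a stale handle. If it helps with diagnosis, these
are the checks I would run directly on the bricks (standard getfattr/stat
usage; the directory path is only an example from this volume, so treat this
as a sketch):

    # inspect the DHT layout xattr of a directory straight on a brick
    getfattr -n trusted.glusterfs.dht -e hex /home/export/md3/brick/2018/02
    # see whether the file that failed to migrate still resolves on the brick
    stat /home/export/md3/brick/2018/02/x187f6596-36ac-45e6-bd7a-019804dfe427.jpg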

The gluster volume info:
Volume Name: web
Type: Distribute
Volume ID: bdef10eb-1c83-410c-8ad3-fe286450004b
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/home/export/md3/brick
Brick2: gluster1:/export/md2/brick
Brick3: gluster2:/home/export/md3/brick
Options Reconfigured:
nfs.trusted-sync: on
nfs.trusted-write: on
cluster.rebal-throttle: aggressive
features.inode-quota: off
features.quota: off
cluster.shd-wait-qlength: 1024
transport.address-family: inet
cluster.lookup-unhashed: auto
performance.cache-size: 1GB
performance.client-io-threads: on
performance.write-behind-window-size: 4MB
performance.io-thread-count: 8
performance.force-readdirp: on
performance.readdir-ahead: on
cluster.readdir-optimize: on
performance.high-prio-threads: 8
performance.flush-behind: on
performance.write-behind: on
performance.quick-read: off
performance.io-cache: on
performance.read-ahead: off
server.event-threads: 8
cluster.lookup-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: off
performance.md-cache-timeout: 60
network.inode-lru-limit: 90000
diagnostics.brick-log-level: ERROR
diagnostics.brick-sys-log-level: ERROR
diagnostics.client-log-level: ERROR
diagnostics.client-sys-log-level: ERROR
cluster.min-free-disk: 20%
cluster.self-heal-window-size: 16
cluster.self-heal-readdir-size: 1024
cluster.background-self-heal-count: 4
cluster.heal-wait-queue-length: 128
client.event-threads: 8
performance.cache-invalidation: on
nfs.disable: off
nfs.acl: off
cluster.brick-multiplex: disable
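
For completeness, cluster.rebal-throttle is already set to aggressive above.
This is what I do to track whether the estimate converges (plain gluster CLI;
a sketch of my workflow rather than a recommendation):

    # re-check the status periodically; the estimate should stabilize
    # as more of the volume has been scanned
    watch -n 600 'gluster volume rebalance web status'
    # confirm the throttle setting actually in effect
    gluster volume get web cluster.rebal-throttle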