<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
</head>
<body>
<style>
font{
line-height: 1.6;
}
ul,ol{
padding-left: 20px;
list-style-position: inside;
}
</style>
<div style = 'font-family:"微软雅黑"; line-height:1.6;'>
<div>I have run into a serious problem: after adding a new node, the cluster rebalance is taking a very long time. About 37 minutes in, the status output estimates more than 9,900 hours remaining.</div>
<div><br></div>
<div>gluster volume rebalance web status</div>
<pre>
Node       Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost  900               43.5MB  2232     0         69       in progress  0:36:49
gluster2   1052              39.3MB  4393     0         1052     in progress  0:36:49
Estimated time left for rebalance to complete : 9919:44:34
volume rebalance: web: success
</pre>
<div><br></div>
<div>The rebalance log:</div>
<div>
<div>[glusterfsd.c:2511:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.8 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/web --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=d47ad89d-7979-4ede-9aba-e04f020bb4f0 --xlator-option *dht.commit-hash=3610561770 --socket-file /var/run/gluster/gluster-rebalance-bdef10eb-1c83-410c-8ad3-fe286450004b.sock --pid-file /var/lib/glusterd/vols/web/rebalance/d47ad89d-7979-4ede-9aba-e04f020bb4f0.pid -l /var/log/glusterfs/web-rebalance.log)</div>
<div>[2018-04-30 04:20:45.100902] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction</div>
<div>[2018-04-30 04:20:45.103927] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1</div>
<div>[2018-04-30 04:20:55.191261] E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-web-dht: getxattr err for dir [No data available]</div>
<div>[2018-04-30 04:21:19.783469] E [MSGID: 109023] [dht-rebalance.c:2669:gf_defrag_migrate_single_file] 0-web-dht: Migrate file failed: /2018/02/x187f6596-36ac-45e6-bd7a-019804dfe427.jpg, lookup failed [Stale file handle]</div>
<div>The message "E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk] 0-web-dht: getxattr err for dir [No data available]" repeated 2 times between [2018-04-30 04:20:55.191261] and [2018-04-30 04:20:55.193615]</div>
</div>
<div><br></div>
<div>The volume info:</div>
<div>
<div>Volume Name: web</div>
<div>Type: Distribute</div>
<div>Volume ID: bdef10eb-1c83-410c-8ad3-fe286450004b</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 3</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: gluster1:/home/export/md3/brick</div>
<div>Brick2: gluster1:/export/md2/brick</div>
<div>Brick3: gluster2:/home/export/md3/brick</div>
<div>Options Reconfigured:</div>
<div>nfs.trusted-sync: on</div>
<div>nfs.trusted-write: on</div>
<div>cluster.rebal-throttle: aggressive</div>
<div>features.inode-quota: off</div>
<div>features.quota: off</div>
<div>cluster.shd-wait-qlength: 1024</div>
<div>transport.address-family: inet</div>
<div>cluster.lookup-unhashed: auto</div>
<div>performance.cache-size: 1GB</div>
<div>performance.client-io-threads: on</div>
<div>performance.write-behind-window-size: 4MB</div>
<div>performance.io-thread-count: 8</div>
<div>performance.force-readdirp: on</div>
<div>performance.readdir-ahead: on</div>
<div>cluster.readdir-optimize: on</div>
<div>performance.high-prio-threads: 8</div>
<div>performance.flush-behind: on</div>
<div>performance.write-behind: on</div>
<div>performance.quick-read: off</div>
<div>performance.io-cache: on</div>
<div>performance.read-ahead: off</div>
<div>server.event-threads: 8</div>
<div>cluster.lookup-optimize: on</div>
<div>features.cache-invalidation: on</div>
<div>features.cache-invalidation-timeout: 600</div>
<div>performance.stat-prefetch: off</div>
<div>performance.md-cache-timeout: 60</div>
<div>network.inode-lru-limit: 90000</div>
<div>diagnostics.brick-log-level: ERROR</div>
<div>diagnostics.brick-sys-log-level: ERROR</div>
<div>diagnostics.client-log-level: ERROR</div>
<div>diagnostics.client-sys-log-level: ERROR</div>
<div>cluster.min-free-disk: 20%</div>
<div>cluster.self-heal-window-size: 16</div>
<div>cluster.self-heal-readdir-size: 1024</div>
<div>cluster.background-self-heal-count: 4</div>
<div>cluster.heal-wait-queue-length: 128</div>
<div>client.event-threads: 8</div>
<div>performance.cache-invalidation: on</div>
<div>nfs.disable: off</div>
<div>nfs.acl: off</div>
<div>cluster.brick-multiplex: disable</div>
</div>
<div><br></div>
</div>
</body>
</html>