[Gluster-users] Gluster rebalance taking many years
shadowsocks飞飞
kiwizhang618 at gmail.com
Mon Apr 30 07:40:19 UTC 2018
I cannot count the number of files by normal means. From df -i, the approximate number of files is 63694442:
[root@CentOS-73-64-minimal ~]# df -i
Filesystem        Inodes     IUsed      IFree IUse% Mounted on
/dev/md2       131981312  30901030  101080282   24% /
devtmpfs         8192893       435    8192458    1% /dev
tmpfs            8199799      8029    8191770    1% /dev/shm
tmpfs            8199799      1415    8198384    1% /run
tmpfs            8199799        16    8199783    1% /sys/fs/cgroup
/dev/md3       110067712  29199861   80867851   27% /home
/dev/md1          131072       363     130709    1% /boot
gluster1:/web 2559860992  63694442 2496166550    3% /web
tmpfs            8199799         1    8199798    1% /run/user/0
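df -i on the mount only gives an aggregate, so if an exact per-brick count is ever needed, something like the following could be run directly on each brick of gluster1 (a slow walk over tens of millions of inodes; the .glusterfs directory is excluded so its hard links are not double-counted):

find /home/export/md3/brick -type f -not -path '*/.glusterfs/*' | wc -l
find /export/md2/brick -type f -not -path '*/.glusterfs/*' | wc -l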
The rebalance log is attached.
The cluster information:
gluster volume status web detail
Status of volume: web
------------------------------------------------------------------------------
Brick : Brick gluster1:/home/export/md3/brick
TCP Port : 49154
RDMA Port : 0
Online : Y
Pid : 16730
File System : ext4
Device : /dev/md3
Mount Options : rw,noatime,nodiratime,nobarrier,data=ordered
Inode Size : 256
Disk Space Free : 239.4GB
Total Disk Space : 1.6TB
Inode Count : 110067712
Free Inodes : 80867992
------------------------------------------------------------------------------
Brick : Brick gluster1:/export/md2/brick
TCP Port : 49155
RDMA Port : 0
Online : Y
Pid : 16758
File System : ext4
Device : /dev/md2
Mount Options : rw,noatime,nodiratime,nobarrier,data=ordered
Inode Size : 256
Disk Space Free : 589.4GB
Total Disk Space : 1.9TB
Inode Count : 131981312
Free Inodes : 101080484
------------------------------------------------------------------------------
Brick : Brick gluster2:/home/export/md3/brick
TCP Port : 49152
RDMA Port : 0
Online : Y
Pid : 12556
File System : xfs
Device : /dev/md3
Mount Options : rw,noatime,nodiratime,attr2,inode64,sunit=1024,swidth=3072,noquota
Inode Size : 256
Disk Space Free : 10.7TB
Total Disk Space : 10.8TB
Inode Count : 2317811968
Free Inodes : 2314218207
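As a rough cross-check, the used inodes per brick (Inode Count minus Free Inodes) sum to almost exactly the df -i figure, which suggests the 63694442 above is simply the aggregate inode usage of the brick filesystems rather than an exact file count:

echo $(( (110067712 - 80867992) + (131981312 - 101080484) + (2317811968 - 2314218207) ))
# 29199720 + 30900828 + 3593761 = 63694309, versus 63694442 reported for gluster1:/web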
Most of the files in the cluster are images smaller than 1 MB.
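For what it is worth, a back-of-the-envelope extrapolation (assuming the estimate simply projects the current per-node scan rate over all files, which is only a guess at how it is computed) lands in the same order of magnitude as the 9919:44:34 shown in the status output quoted below:

echo $(( 4393 * 60 / 37 ))     # ~7123 entries scanned per hour on gluster2 (4393 in ~37 minutes)
echo $(( 63694442 / 7123 ))    # ~8942 hours, i.e. roughly a year at this rate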
2018-04-30 15:16 GMT+08:00 Nithya Balachandran <nbalacha at redhat.com>:
> Hi,
>
>
> This value is an ongoing rough estimate based on the amount of data
> rebalance has migrated since it started. The values will change as the
> rebalance progresses.
> A few questions:
>
> 1. How many files/dirs do you have on this volume?
> 2. What is the average size of the files?
> 3. What is the total size of the data on the volume?
>
>
> Can you send us the rebalance log?
>
>
> Thanks,
> Nithya
>
> On 30 April 2018 at 10:33, kiwizhang618 <kiwizhang618 at gmail.com> wrote:
>
>> I ran into a big problem: the cluster rebalance takes a very long time
>> after adding a new node.
>>
>> gluster volume rebalance web status
>>       Node   Rebalanced-files     size   scanned   failures   skipped        status   run time in h:m:s
>>  ---------   ----------------   ------   -------   --------   -------   -----------   -----------------
>>  localhost                900   43.5MB      2232          0        69   in progress             0:36:49
>>   gluster2               1052   39.3MB      4393          0      1052   in progress             0:36:49
>> Estimated time left for rebalance to complete : 9919:44:34
>> volume rebalance: web: success
>>
>> The rebalance log:
>> [glusterfsd.c:2511:main] 0-/usr/sbin/glusterfs: Started running
>> /usr/sbin/glusterfs version 3.12.8 (args: /usr/sbin/glusterfs -s localhost
>> --volfile-id rebalance/web --xlator-option *dht.use-readdirp=yes
>> --xlator-option *dht.lookup-unhashed=yes --xlator-option
>> *dht.assert-no-child-down=yes --xlator-option
>> *replicate*.data-self-heal=off --xlator-option
>> *replicate*.metadata-self-heal=off --xlator-option
>> *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on
>> --xlator-option *dht.rebalance-cmd=1 --xlator-option
>> *dht.node-uuid=d47ad89d-7979-4ede-9aba-e04f020bb4f0 --xlator-option
>> *dht.commit-hash=3610561770 --socket-file /var/run/gluster/gluster-rebal
>> ance-bdef10eb-1c83-410c-8ad3-fe286450004b.sock --pid-file
>> /var/lib/glusterd/vols/web/rebalance/d47ad89d-7979-4ede-9aba-e04f020bb4f0.pid
>> -l /var/log/glusterfs/web-rebalance.log)
>> [2018-04-30 04:20:45.100902] W [MSGID: 101002]
>> [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is
>> deprecated, preferred is 'transport.address-family', continuing with
>> correction
>> [2018-04-30 04:20:45.103927] I [MSGID: 101190]
>> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> [2018-04-30 04:20:55.191261] E [MSGID: 109039]
>> [dht-common.c:3113:dht_find_local_subvol_cbk] 0-web-dht: getxattr err
>> for dir [No data available]
>> [2018-04-30 04:21:19.783469] E [MSGID: 109023]
>> [dht-rebalance.c:2669:gf_defrag_migrate_single_file] 0-web-dht: Migrate
>> file failed: /2018/02/x187f6596-36ac-45e6-bd7a-019804dfe427.jpg, lookup
>> failed [Stale file handle]
>> The message "E [MSGID: 109039] [dht-common.c:3113:dht_find_local_subvol_cbk]
>> 0-web-dht: getxattr err for dir [No data available]" repeated 2 times
>> between [2018-04-30 04:20:55.191261] and [2018-04-30 04:20:55.193615]
>>
>> The gluster volume info:
>> Volume Name: web
>> Type: Distribute
>> Volume ID: bdef10eb-1c83-410c-8ad3-fe286450004b
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster1:/home/export/md3/brick
>> Brick2: gluster1:/export/md2/brick
>> Brick3: gluster2:/home/export/md3/brick
>> Options Reconfigured:
>> nfs.trusted-sync: on
>> nfs.trusted-write: on
>> cluster.rebal-throttle: aggressive
>> features.inode-quota: off
>> features.quota: off
>> cluster.shd-wait-qlength: 1024
>> transport.address-family: inet
>> cluster.lookup-unhashed: auto
>> performance.cache-size: 1GB
>> performance.client-io-threads: on
>> performance.write-behind-window-size: 4MB
>> performance.io-thread-count: 8
>> performance.force-readdirp: on
>> performance.readdir-ahead: on
>> cluster.readdir-optimize: on
>> performance.high-prio-threads: 8
>> performance.flush-behind: on
>> performance.write-behind: on
>> performance.quick-read: off
>> performance.io-cache: on
>> performance.read-ahead: off
>> server.event-threads: 8
>> cluster.lookup-optimize: on
>> features.cache-invalidation: on
>> features.cache-invalidation-timeout: 600
>> performance.stat-prefetch: off
>> performance.md-cache-timeout: 60
>> network.inode-lru-limit: 90000
>> diagnostics.brick-log-level: ERROR
>> diagnostics.brick-sys-log-level: ERROR
>> diagnostics.client-log-level: ERROR
>> diagnostics.client-sys-log-level: ERROR
>> cluster.min-free-disk: 20%
>> cluster.self-heal-window-size: 16
>> cluster.self-heal-readdir-size: 1024
>> cluster.background-self-heal-count: 4
>> cluster.heal-wait-queue-length: 128
>> client.event-threads: 8
>> performance.cache-invalidation: on
>> nfs.disable: off
>> nfs.acl: off
>> cluster.brick-multiplex: disable
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: web-rebalance.log
Type: application/octet-stream
Size: 8306 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180430/8ba1729c/attachment.obj>