[Gluster-users] Rebalance taking > 2 months

Rusty Bower rusty at rustybower.com
Mon Jul 16 16:36:04 UTC 2018


Thanks for the reply Nithya.

1. glusterfs 4.1.1

2. Volume Name: data
Type: Distribute
Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: datanode01:/mnt/data/bricks/data
Brick2: datanode02:/mnt/data/bricks/data
Brick3: datanode03:/mnt/data/bricks/data
Options Reconfigured:
performance.readdir-ahead: on

3.
      Node  Rebalanced-files      size  scanned  failures  skipped       status  run time in h:m:s
----------  ----------------  --------  -------  --------  -------  -----------  -----------------
 localhost             36822    11.3GB    50715         0        0  in progress           26:46:17
datanode02                 0    0Bytes     2852         0        0  in progress           26:46:16
datanode03              3128   513.7MB    11442         0     3128  in progress           26:46:17
Estimated time left for rebalance to complete : > 2 months. Please try again later.
volume rebalance: data: success

4. The directory structure is basically rsync backups of some old systems
plus all of my personal media. I can elaborate if needed, but it's a pretty
standard filesystem layout.

5. Some folders go 12-15 directory levels deep (especially the backups).

6. I'm honestly not sure; I can try to dig this number up.

7. My guess would be > 100k

8. Most files are pretty large (media files), but there are also a lot of
small files (metadata and configuration files).
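For items 5 to 7, the per-level and total counts could be gathered with a short script run against the mounted volume. This is only a rough sketch: the function names are made up for illustration, the mount path in the comment is hypothetical, and walking a large tree on a busy volume may itself take a while.

```python
import os
from collections import Counter


def count_by_depth(root):
    """Count directories and files at each depth below root.

    Depth 0 is root itself, so files[0] is the number of files directly
    inside root and dirs[0] the number of its immediate subdirectories.
    """
    dirs, files = Counter(), Counter()
    for path, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(path, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        dirs[depth] += len(dirnames)
        files[depth] += len(filenames)
    return dirs, files


def summarize(root):
    """Print per-level counts plus totals for the tree under root."""
    dirs, files = count_by_depth(root)
    for depth in sorted(set(dirs) | set(files)):
        print(f"level {depth}: {dirs[depth]} dirs, {files[depth]} files")
    print(f"total: {sum(dirs.values())} dirs, {sum(files.values())} files")


# Example call (hypothetical mount point, adjust to your setup):
# summarize("/mnt/gluster/data")
```

The deepest level printed answers item 5, the per-level lines answer item 6, and the final line gives the rough totals asked for in item 7.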

I've also appended a (moderately sanitized) snippet of the rebalance log.
Let me know if you need more.

[2018-07-16 17:37:59.979003] I [MSGID: 0]
[dht-rebalance.c:1799:dht_migrate_file] 0-data-dht: destination for file -
/this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml is changed to
- data-client-2
[2018-07-16 17:38:00.004262] I [MSGID: 109022]
[dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of
/this/is/a/file/path/that/exists/wz/wz/Npc.wz/2112002.img.xml from
subvolume data-client-0 to data-client-2
[2018-07-16 17:38:00.725582] I
[dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs:
TIME: (size) total_processed=43108305980 tmp_cnt =
55419279917056,rate_processed=446597.869797, elapsed = 96526.000000
[2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get]
0-glusterfs: TIME: Estimated total time to complete (size)= 124092127
seconds, seconds left = 123995601
[2018-07-16 17:38:00.725709] I [MSGID: 109028]
[dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in
progress. Time taken is 96526.00 secs
[2018-07-16 17:38:00.725738] I [MSGID: 109028]
[dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated:
36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
[2018-07-16 17:38:02.769121] I
[dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs:
TIME: (size) total_processed=43108305980 tmp_cnt =
55419279917056,rate_processed=446588.616567, elapsed = 96528.000000
[2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defrag_status_get]
0-glusterfs: TIME: Estimated total time to complete (size)= 124094698
seconds, seconds left = 123998170
[2018-07-16 17:38:02.769263] I [MSGID: 109028]
[dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in
progress. Time taken is 96528.00 secs
[2018-07-16 17:38:02.769286] I [MSGID: 109028]
[dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated:
36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
[2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file]
0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml:
attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:03.416127] I [MSGID: 109022]
[dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of
/this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml from
subvolume data-client-0 to data-client-2
[2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file]
0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml:
attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:04.745722] I [MSGID: 109022]
[dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of
/this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml from
subvolume data-client-0 to data-client-2
[2018-07-16 17:38:04.812368] I
[dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs:
TIME: (size) total_processed=43108308134 tmp_cnt =
55419279917056,rate_processed=446579.386035, elapsed = 96530.000000
[2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get]
0-glusterfs: TIME: Estimated total time to complete (size)= 124097263
seconds, seconds left = 124000733
[2018-07-16 17:38:04.812465] I [MSGID: 109028]
[dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in
progress. Time taken is 96530.00 secs
[2018-07-16 17:38:04.812489] I [MSGID: 109028]
[dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated:
36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0
[2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file]
0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml:
attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:04.994122] I [MSGID: 109022]
[dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of
/this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml from
subvolume data-client-0 to data-client-2
[2018-07-16 17:38:06.855618] I
[dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs:
TIME: (size) total_processed=43108318798 tmp_cnt =
55419279917056,rate_processed=446570.244043, elapsed = 96532.000000
[2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get]
0-glusterfs: TIME: Estimated total time to complete (size)= 124099804
seconds, seconds left = 124003272
[2018-07-16 17:38:06.855770] I [MSGID: 109028]
[dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in
progress. Time taken is 96532.00 secs
[2018-07-16 17:38:06.855793] I [MSGID: 109028]
[dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated:
36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
[2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file]
0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201055.img.xml:
attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:08.533029] I [MSGID: 109022]
[dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of
/this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml from
subvolume data-client-0 to data-client-2
[2018-07-16 17:38:08.899708] I
[dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs:
TIME: (size) total_processed=43108318798 tmp_cnt =
55419279917056,rate_processed=446560.991961, elapsed = 96534.000000
[2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get]
0-glusterfs: TIME: Estimated total time to complete (size)= 124102375
seconds, seconds left = 124005841
[2018-07-16 17:38:08.899842] I [MSGID: 109028]
[dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in
progress. Time taken is 96534.00 secs
[2018-07-16 17:38:08.899865] I [MSGID: 109028]
[dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated:
36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
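The estimate lines in the log above appear to follow from simple arithmetic on the values they print. Here is a minimal Python sketch of that arithmetic, assuming `tmp_cnt` is the total number of bytes the rebalance expects to process and that the rate is simply bytes processed divided by elapsed seconds. The function and parameter names are illustrative and not taken from the Gluster source.

```python
def rebalance_estimate(total_processed, total_size, elapsed):
    """Reproduce the size-based estimate printed in the rebalance log.

    total_processed: bytes handled so far (log field total_processed)
    total_size:      bytes expected overall (assumed meaning of tmp_cnt)
    elapsed:         seconds since the rebalance started
    """
    rate = total_processed / elapsed          # bytes per second
    total_seconds = int(total_size / rate)    # whole seconds, as in the log
    seconds_left = total_seconds - int(elapsed)
    return rate, total_seconds, seconds_left


# Values copied from the 17:38:00.725582 log entry.
rate, total_seconds, seconds_left = rebalance_estimate(
    total_processed=43108305980,
    total_size=55419279917056,
    elapsed=96526.0,
)
print(f"rate: {rate:.6f} bytes/s")
print(f"estimated total: {total_seconds} s (~{total_seconds / 86400:.0f} days)")
print(f"seconds left: {seconds_left}")
```

With these inputs the rate works out to roughly 447 kB/s against about 55 TB of expected data, which would explain why the estimate lands in the range of years rather than days.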


On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran <nbalacha at redhat.com>
wrote:

> If possible, please send the rebalance logs as well.
>
>
> On 16 July 2018 at 10:14, Nithya Balachandran <nbalacha at redhat.com> wrote:
>
>> Hi Rusty,
>>
>> We need the following information:
>>
>>    1. The exact gluster version you are running
>>    2. gluster volume info <volname>
>>    3. gluster rebalance status
>>    4. Information on the directory structure and file locations on your
>>    volume.
>>    5. How many levels of directories
>>    6. How many files and directories in each level
>>    7. How many directories and files in total (a rough estimate)
>>    8. Average file size
>>
>> Please note that having a rebalance running in the background should not
>> affect your volume access in any way. However I would like to know why only
>> 6000 files have been scanned in 6 hours.
>>
>> Regards,
>> Nithya
>>
>>
>> On 16 July 2018 at 06:13, Rusty Bower <rusty at rustybower.com> wrote:
>>
>>> Hey folks,
>>>
>>> I just added a new brick to my existing gluster volume, but *gluster
>>> volume rebalance data status* is telling me the following: Estimated
>>> time left for rebalance to complete : > 2 months. Please try again later.
>>>
>>> I already did a fix-layout, but this thing is absolutely crawling
>>> trying to rebalance everything (last estimate was ~40 years)
>>>
>>> Any thoughts on if this is a bug, or ways to speed this up? It's taking
>>> ~6 hours to scan 6000 files, which seems unreasonably slow.
>>>
>>> Thanks
>>> Rusty