[Gluster-users] Rebalance taking > 2 months

Nithya Balachandran nbalacha at redhat.com
Wed Aug 1 08:17:31 UTC 2018


On 31 July 2018 at 22:17, Rusty Bower <rusty at rustybower.com> wrote:

> Is it possible to pause the rebalance to get those numbers? I'm hesitant to
> stop the rebalance and have to redo the entire thing again.
>

I'm afraid not. Rebalance will start from the beginning if you do so.




> On Tue, Jul 31, 2018 at 11:40 AM, Nithya Balachandran <nbalacha at redhat.com>
> wrote:
>
>>
>>
>> On 31 July 2018 at 19:44, Rusty Bower <rusty at rustybower.com> wrote:
>>
>>> I'll figure out what hasn't been rebalanced yet and run the script.
>>>
>>> There's only a single client accessing this gluster volume, and while
>>> the rebalance is taking place, I am only able to read/write to the
>>> volume at around 3MB/s. If I log onto one of the bricks, I can read/write
>>> to the physical volumes at speeds greater than 100MB/s (which is what I
>>> would expect).
>>>
>>
>> What are the numbers when you access the volume while rebalance is
>> not running?
>> Regards,
>> Nithya
>>
>>>
>>> Thanks!
>>> Rusty
>>>
>>> On Tue, Jul 31, 2018 at 3:28 AM, Nithya Balachandran <
>>> nbalacha at redhat.com> wrote:
>>>
>>>> Hi Rusty,
>>>>
>>>> A rebalance involves 2 steps:
>>>>
>>>>    1. Setting a new layout on a directory
>>>>    2. Migrating any files inside that directory that hash to a
>>>>    different subvol based on the new layout set in step 1.
>>>>
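>>>> If you want to see what step 1 actually changes, you can read the
>>>> layout xattr for a directory straight off a brick (run this on a
>>>> server node; the brick path is taken from your volume info below,
>>>> and <some_dir> is a placeholder):
>>>>
>>>>    # trusted.glusterfs.dht holds the hash range this subvolume
>>>>    # owns for that directory
>>>>    getfattr -e hex -n trusted.glusterfs.dht /mnt/data/bricks/data/<some_dir>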
>>>>
>>>> A few things to keep in mind:
>>>>
>>>>    - Any new content created on this volume will currently go to the
>>>>    newly added brick.
>>>>    - Having a more equitable file distribution is beneficial, but you
>>>>    might not need a complete rebalance to get there. You can run the
>>>>    script on just enough directories to free up space on your older bricks.
>>>>    To speed this up, choose directories which contain large files.
>>>>
>>>> Do the following on one of the server nodes:
>>>>
>>>>    - Create a tmp mount point and mount the volume using the rebalance
>>>>    volfile
>>>>    - mkdir /mnt/rebal
>>>>       - glusterfs -s localhost --volfile-id rebalance/data /mnt/rebal
>>>>    - Select a directory in the volume which contains a lot of large
>>>>    files and which has not been processed by the rebalance yet - the lower
>>>>    down in the tree the better. Check the rebalance logs to figure out which
>>>>    dirs have not been processed yet.
>>>>       - cd /mnt/rebal/<chosen_dir>
>>>>       - find . -type d -print0 | xargs -0 -n1 -P10 bash process_dir.sh
>>>>       (find -print0 with xargs -0 handles unusual directory names
>>>>       safely, and -P10 processes 10 directories in parallel)
>>>>    - You can run this for different values of <chosen_dir> and on
>>>>    multiple server nodes in parallel as long as the directory trees for the
>>>>    different <chosen_dirs> don't overlap.
>>>>    - Do this for multiple directories until the disk space used
>>>>    reduces on the older bricks.
>>>>
>>>> This is a very simple script. Let me know how it works - we can always
>>>> tweak it for your particular data set.
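>>>>
>>>> The general idea, as a minimal sketch (it relies on DHT's virtual
>>>> xattrs, distribute.fix.layout and trusted.distribute.migrate-data;
>>>> treat the exact xattr values here as assumptions until you have the
>>>> real script):
>>>>
>>>>    #!/bin/bash
>>>>    # process_dir.sh (sketch): re-layout one directory, then ask DHT
>>>>    # to migrate the regular files directly inside it.
>>>>    # Usage: bash process_dir.sh <dir>   (run against the rebal mount)
>>>>    dir="$1"
>>>>    # Step 1: set a fresh layout on the directory.
>>>>    setfattr -n distribute.fix.layout -v "yes" "$dir"
>>>>    # Step 2: trigger migration of each file to its hashed subvol
>>>>    # ("force" is an assumed value).
>>>>    find "$dir" -maxdepth 1 -type f -print0 |
>>>>    while IFS= read -r -d '' f; do
>>>>        setfattr -n trusted.distribute.migrate-data -v force "$f"
>>>>    done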
>>>>
>>>>
>>>> > and performance is basically garbage while it rebalances
>>>> Can you provide more detail on this? What kind of effects are you
>>>> seeing?
>>>> How many clients access this volume?
>>>>
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> On 30 July 2018 at 22:18, Nithya Balachandran <nbalacha at redhat.com>
>>>> wrote:
>>>>
>>>>> I have not documented this yet - I will send you the steps tomorrow.
>>>>>
>>>>> Regards,
>>>>> Nithya
>>>>>
>>>>> On 30 July 2018 at 20:23, Rusty Bower <rusty at rustybower.com> wrote:
>>>>>
>>>>>> That would be awesome. Where can I find these?
>>>>>>
>>>>>> Rusty
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Jul 30, 2018, at 03:40, Nithya Balachandran <nbalacha at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Rusty,
>>>>>>
>>>>>> Sorry for the delay getting back to you. I had a quick look at the
>>>>>> rebalance logs - it looks like the estimates are based on the time taken to
>>>>>> rebalance the smaller files.
>>>>>>
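>>>>>> For reference, the arithmetic behind that estimate, using the numbers
>>>>>> from the log snippet you sent:
>>>>>>
>>>>>>    total data to rebalance (tmp_cnt)  ~ 55,419,279,917,056 bytes (~55.4 TB)
>>>>>>    rate so far (rate_processed)       ~ 446,598 bytes/sec (~0.45 MB/s)
>>>>>>    estimated total time               = 55419279917056 / 446598
>>>>>>                                       ~ 124,092,127 sec (~1,436 days)
>>>>>>
>>>>>> Anything beyond two months is reported by the CLI simply as "> 2
>>>>>> months". The rate is dominated by the many small files migrated so
>>>>>> far, which is why the estimate is so pessimistic.
>>>>>>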
>>>>>> We do have a scripting option where we can use virtual xattrs to
>>>>>> trigger file migration from a mount point. That would speed things up.
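>>>>>>
>>>>>> As a one-line sketch (assuming the same virtual xattr interface as
>>>>>> the script outline earlier in this thread), from a mount of the
>>>>>> volume something like
>>>>>>
>>>>>>    setfattr -n trusted.distribute.migrate-data -v force <file>
>>>>>>
>>>>>> asks DHT to move that single file to its hashed subvolume right away,
>>>>>> instead of waiting for the rebalance crawl to reach it.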
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Nithya
>>>>>>
>>>>>> On 28 July 2018 at 07:11, Rusty Bower <rusty at rustybower.com> wrote:
>>>>>>
>>>>>>> Just wanted to ping this to see if you guys had any thoughts, or
>>>>>>> other scripts I can run for this stuff. It's still predicting another 90
>>>>>>> days to rebalance this, and performance is basically garbage while it
>>>>>>> rebalances.
>>>>>>>
>>>>>>> Rusty
>>>>>>>
>>>>>>> On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <rusty at rustybower.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> datanode03 is the newest brick
>>>>>>>>
>>>>>>>> the bricks had gotten pretty full, which I think might be part of
>>>>>>>> the issue:
>>>>>>>> - datanode01 /dev/sda1   51T   48T  3.3T   94%  /mnt/data
>>>>>>>> - datanode02 /dev/sda1   51T   48T  3.4T   94%  /mnt/data
>>>>>>>> - datanode03 /dev/md0   128T  4.6T  123T    4%  /mnt/data
>>>>>>>>
>>>>>>>> each of the bricks are on a completely separate disk from the OS
>>>>>>>>
>>>>>>>> I'll shoot you the log files offline :)
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Rusty
>>>>>>>>
>>>>>>>> On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran <
>>>>>>>> nbalacha at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Rusty,
>>>>>>>>>
>>>>>>>>> Sorry I took so long to get back to you.
>>>>>>>>>
>>>>>>>>> Which is the newly added brick? I see datanode02 has not picked
>>>>>>>>> up any files for migration, which is odd.
>>>>>>>>> How full are the individual bricks? Please include the df -h output.
>>>>>>>>> Is each of your bricks in a separate partition?
>>>>>>>>> Can you send me the rebalance logs from all 3 nodes (offline if
>>>>>>>>> you prefer)?
>>>>>>>>>
>>>>>>>>> We can try using scripts to speed up the rebalance if you prefer.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Nithya
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 16 July 2018 at 22:06, Rusty Bower <rusty at rustybower.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the reply Nithya.
>>>>>>>>>>
>>>>>>>>>> 1. glusterfs 4.1.1
>>>>>>>>>>
>>>>>>>>>> 2. Volume Name: data
>>>>>>>>>> Type: Distribute
>>>>>>>>>> Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba
>>>>>>>>>> Status: Started
>>>>>>>>>> Snapshot Count: 0
>>>>>>>>>> Number of Bricks: 3
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: datanode01:/mnt/data/bricks/data
>>>>>>>>>> Brick2: datanode02:/mnt/data/bricks/data
>>>>>>>>>> Brick3: datanode03:/mnt/data/bricks/data
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>
>>>>>>>>>> 3.
>>>>>>>>>>       Node  Rebalanced-files     size  scanned  failures  skipped       status  run time in h:m:s
>>>>>>>>>> ----------  ----------------  -------  -------  --------  -------  -----------  -----------------
>>>>>>>>>>  localhost             36822   11.3GB    50715         0        0  in progress           26:46:17
>>>>>>>>>> datanode02                 0   0Bytes     2852         0        0  in progress           26:46:16
>>>>>>>>>> datanode03              3128  513.7MB    11442         0     3128  in progress           26:46:17
>>>>>>>>>> Estimated time left for rebalance to complete : > 2 months.
>>>>>>>>>> Please try again later.
>>>>>>>>>> volume rebalance: data: success
>>>>>>>>>>
>>>>>>>>>> 4. Directory structure is basically an rsync backup of some old
>>>>>>>>>> systems as well as all of my personal media. I can elaborate more, but it's
>>>>>>>>>> a pretty standard filesystem.
>>>>>>>>>>
>>>>>>>>>> 5. In some folders there might be up to 12-15 levels of
>>>>>>>>>> directories (especially the backups)
>>>>>>>>>>
>>>>>>>>>> 6. I'm honestly not sure, I can try to scrounge this number up
>>>>>>>>>>
>>>>>>>>>> 7. My guess would be > 100k
>>>>>>>>>>
>>>>>>>>>> 8. Most files are pretty large (media files), but there's a lot
>>>>>>>>>> of small files (metadata and configuration files) as well
>>>>>>>>>>
>>>>>>>>>> I've also appended a (moderately sanitized) snippet of the rebalance
>>>>>>>>>> log (let me know if you need more)
>>>>>>>>>>
>>>>>>>>>> [2018-07-16 17:37:59.979003] I [MSGID: 0] [dht-rebalance.c:1799:dht_migrate_file] 0-data-dht: destination for file - /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml is changed to - data-client-2
>>>>>>>>>> [2018-07-16 17:38:00.004262] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2112002.img.xml from subvolume data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056,rate_processed=446597.869797, elapsed = 96526.000000
>>>>>>>>>> [2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 seconds, seconds left = 123995601
>>>>>>>>>> [2018-07-16 17:38:00.725709] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96526.00 secs
>>>>>>>>>> [2018-07-16 17:38:00.725738] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>>> [2018-07-16 17:38:02.769121] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056,rate_processed=446588.616567, elapsed = 96528.000000
>>>>>>>>>> [2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124094698 seconds, seconds left = 123998170
>>>>>>>>>> [2018-07-16 17:38:02.769263] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96528.00 secs
>>>>>>>>>> [2018-07-16 17:38:02.769286] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>>> [2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml: attempting to move from data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:03.416127] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml from subvolume data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml: attempting to move from data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:04.745722] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml from subvolume data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:04.812368] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108308134 tmp_cnt = 55419279917056,rate_processed=446579.386035, elapsed = 96530.000000
>>>>>>>>>> [2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124097263 seconds, seconds left = 124000733
>>>>>>>>>> [2018-07-16 17:38:04.812465] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96530.00 secs
>>>>>>>>>> [2018-07-16 17:38:04.812489] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>>> [2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml: attempting to move from data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:04.994122] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml from subvolume data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:06.855618] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056,rate_processed=446570.244043, elapsed = 96532.000000
>>>>>>>>>> [2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124099804 seconds, seconds left = 124003272
>>>>>>>>>> [2018-07-16 17:38:06.855770] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96532.00 secs
>>>>>>>>>> [2018-07-16 17:38:06.855793] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>>> [2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201055.img.xml: attempting to move from data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:08.533029] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml from subvolume data-client-0 to data-client-2
>>>>>>>>>> [2018-07-16 17:38:08.899708] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056,rate_processed=446560.991961, elapsed = 96534.000000
>>>>>>>>>> [2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124102375 seconds, seconds left = 124005841
>>>>>>>>>> [2018-07-16 17:38:08.899842] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96534.00 secs
>>>>>>>>>> [2018-07-16 17:38:08.899865] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran <
>>>>>>>>>> nbalacha at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> If possible, please send the rebalance logs as well.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 16 July 2018 at 10:14, Nithya Balachandran <
>>>>>>>>>>> nbalacha at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Rusty,
>>>>>>>>>>>>
>>>>>>>>>>>> We need the following information:
>>>>>>>>>>>>
>>>>>>>>>>>>    1. The exact gluster version you are running
>>>>>>>>>>>>    2. gluster volume info <volname>
>>>>>>>>>>>>    3. gluster rebalance status
>>>>>>>>>>>>    4. Information on the directory structure and file
>>>>>>>>>>>>    locations on your volume.
>>>>>>>>>>>>    5. How many levels of directories
>>>>>>>>>>>>    6. How many files and directories in each level
>>>>>>>>>>>>    7. How many directories and files in total (a rough
>>>>>>>>>>>>    estimate)
>>>>>>>>>>>>    8. Average file size
>>>>>>>>>>>>
>>>>>>>>>>>> Please note that having a rebalance running in the background
>>>>>>>>>>>> should not affect your volume access in any way. However, I would
>>>>>>>>>>>> like to know why only 6000 files have been scanned in 6 hours.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Nithya
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 16 July 2018 at 06:13, Rusty Bower <rusty at rustybower.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I just added a new brick to my existing gluster volume, but *gluster
>>>>>>>>>>>>> volume rebalance data status* is telling me the
>>>>>>>>>>>>> following: Estimated time left for rebalance to complete : > 2 months.
>>>>>>>>>>>>> Please try again later.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I already did a fix-layout, but this thing is absolutely
>>>>>>>>>>>>> crawling trying to rebalance everything (last estimate was ~40 years)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any thoughts on if this is a bug, or ways to speed this up?
>>>>>>>>>>>>> It's taking ~6 hours to scan 6000 files, which seems unreasonably slow.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Rusty
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>