<div dir="ltr">Hi Rusty,<div><br></div><div>Sorry for the delay getting back to you. I had a quick look at the rebalance logs - it looks like the estimates are based on the time taken to rebalance the smaller files.</div><div><br></div><div>We do have a scripting option where we can use virtual xattrs to trigger file migration from a mount point. That would speed things up.</div><div><br></div><div><br></div><div>Regards,</div><div>Nithya</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 28 July 2018 at 07:11, Rusty Bower <span dir="ltr"><<a href="mailto:rusty@rustybower.com" target="_blank">rusty@rustybower.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Just wanted to ping this to see if you guys had any thoughts, or other scripts I can run for this stuff. It's still predicting another 90 days to rebalance this, and performance is basically garbage while it rebalances.<span class="HOEnZb"><font color="#888888"><div><br></div><div>Rusty</div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <span dir="ltr"><<a href="mailto:rusty@rustybower.com" target="_blank">rusty@rustybower.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">datanode03 is the newest brick<div><br><div>the bricks had gotten pretty full, which I think might be part of the issue:</div><div>- datanode01 /dev/sda1 51T 48T 3.3T 94% /mnt/data</div><div>- datanode02 /dev/sda1 51T 48T 3.4T 94% /mnt/data</div><div><div>- datanode03 /dev/md0 128T 4.6T 123T 4% /mnt/data</div><div><br></div><div>each of the bricks are on a completely separate disk from the OS</div><div><br></div><div>I'll shoot you the log files offline :)</div></div></div><div><br></div><div>Thanks!</div><span class="m_-2879606836176594600HOEnZb"><font color="#888888"><div>Rusty</div></font></span></div><div class="m_-2879606836176594600HOEnZb"><div class="m_-2879606836176594600h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran <span dir="ltr"><<a href="mailto:nbalacha@redhat.com" target="_blank">nbalacha@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Rusty,<div><br></div><div>Sorry I took so long to get back to you.</div><div><br></div><div>Which is the newly added brick? 
<span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">I see </span><span style="text-decoration-style:initial;text-decoration-color:initial;font-size:12.8px;background-color:rgb(255,255,255);float:none;display:inline">datanode02 has not picked up any files for migration which is odd.</span></div><div><span style="text-decoration-style:initial;text-decoration-color:initial;font-size:12.8px;background-color:rgb(255,255,255);float:none;display:inline">How full are the individual bricks (df -h ) output.</span></div><div><span style="font-size:12.8px">Is each of your bricks in a separate partition?</span><br></div><div>Can you send me the rebalance logs from all 3 nodes (offline if you prefer)?</div><div><br></div><div>We can try using scripts to speed up the rebalance if you prefer.</div><div><br></div><div>Regards,</div><div>Nithya</div><div><br></div><div><br></div></div><div class="m_-2879606836176594600m_7048199701035966160HOEnZb"><div class="m_-2879606836176594600m_7048199701035966160h5"><div class="gmail_extra"><br><div class="gmail_quote">On 16 July 2018 at 22:06, Rusty Bower <span dir="ltr"><<a href="mailto:rusty@rustybower.com" target="_blank">rusty@rustybower.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks for the reply Nithya.<div><br></div><div>1. glusterfs 4.1.1</div><div><br></div><div>2. Volume Name: data</div><div>Type: Distribute</div><div>Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc5<wbr>0442ba</div><div>Status: Started</div><div>Snapshot Count: 0</div><div>Number of Bricks: 3</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: datanode01:/mnt/data/bricks/da<wbr>ta</div><div>Brick2: datanode02:/mnt/data/bricks/da<wbr>ta</div><div>Brick3: datanode03:/mnt/data/bricks/da<wbr>ta</div><div>Options Reconfigured:</div><div>performance.readdir-ahead: on</div><div><br></div><div>3.</div><div><div> Node Rebalanced-files size scanned failures skipped status run time in h:m:s</div><div> --------- ----------- ----------- ----------- ----------- ----------- ------------ --------------</div><div> localhost 36822 11.3GB 50715 0 0 in progress 26:46:17</div><div> datanode02 0 0Bytes 2852 0 0 in progress 26:46:16</div><div> datanode03 3128 513.7MB 11442 0 3128 in progress 26:46:17</div><span><div>Estimated time left for rebalance to complete : > 2 months. Please try again later.</div></span><div>volume rebalance: data: success</div></div><div><br></div><div>4. Directory structure is basically an rsync backup of some old systems as well as all of my personal media. I can elaborate more, but it's a pretty standard filesystem.</div><div><br></div><div>5. In some folders there might be up to like 12-15 levels of directories (especially the backups)</div><div><br></div><div>6. I'm honestly not sure, I can try to scrounge this number up</div><div><br></div><div>7. My guess would be > 100k</div><div><br></div><div>8. Most files are pretty large (media files), but there's a lot of small files (metadata and configuration files) as well</div><div><br></div><div>I've also appended a (moderately sanitized)
<span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">snippet of the </span>rebalance log (let me know if you need more)</div><div><br></div><div><div>[2018-07-16 17:37:59.979003] I [MSGID: 0] [dht-rebalance.c:1799:dht_migr<wbr>ate_file] 0-data-dht: destination for file - /this/is/a/file/path/that/exis<wbr>ts/wz/wz/Npc.wz/2040036.img.xm<wbr>l is changed to - data-client-2</div><div>[2018-07-16 17:38:00.004262] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migr<wbr>ate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exis<wbr>ts/wz/wz/Npc.wz/2112002.img.xm<wbr>l from subvolume data-client-0 to data-client-2</div><div>[2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defra<wbr>g_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056,rate_processed=<wbr>446597.869797, elapsed = 96526.000000</div><div>[2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defra<wbr>g_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 seconds, seconds left = 123995601</div><div>[2018-07-16 17:38:00.725709] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defra<wbr>g_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96526.00 secs</div><div>[2018-07-16 17:38:00.725738] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defra<wbr>g_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0</div><div>[2018-07-16 17:38:02.769121] I [dht-rebalance.c:4982:gf_defra<wbr>g_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056,rate_processed=<wbr>446588.616567, elapsed = 96528.000000</div><div>[2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defra<wbr>g_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124094698 seconds, seconds left = 123998170</div><div>[2018-07-16 17:38:02.769263] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defra<wbr>g_status_get] 0-glusterfs: Rebalance is in progress. 
[2018-07-16 17:38:02.769286] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
[2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml: attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:03.416127] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml from subvolume data-client-0 to data-client-2
[2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml: attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:04.745722] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml from subvolume data-client-0 to data-client-2
[2018-07-16 17:38:04.812368] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108308134 tmp_cnt = 55419279917056,rate_processed=446579.386035, elapsed = 96530.000000
[2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124097263 seconds, seconds left = 124000733
[2018-07-16 17:38:04.812465] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96530.00 secs
[2018-07-16 17:38:04.812489] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0
[2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml: attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:04.994122] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml from subvolume data-client-0 to data-client-2
[2018-07-16 17:38:06.855618] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056,rate_processed=446570.244043, elapsed = 96532.000000
[2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124099804 seconds, seconds left = 124003272
[2018-07-16 17:38:06.855770] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96532.00 secs
[2018-07-16 17:38:06.855793] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
[2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201055.img.xml: attempting to move from data-client-0 to data-client-2
[2018-07-16 17:38:08.533029] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml from subvolume data-client-0 to data-client-2
[2018-07-16 17:38:08.899708] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056,rate_processed=446560.991961, elapsed = 96534.000000
[2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124102375 seconds, seconds left = 124005841
[2018-07-16 17:38:08.899842] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96534.00 secs
[2018-07-16 17:38:08.899865] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
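The "Estimated total time" figures above can be reproduced by dividing the total data the estimator expects to migrate (tmp_cnt) by the byte rate observed so far; a small sketch of that assumed arithmetic, using the values from the first estimate entry in the log:

awk 'BEGIN {
    total_processed = 43108305980;      # bytes migrated so far (from the log)
    elapsed         = 96526;            # seconds the rebalance has been running
    tmp_cnt         = 55419279917056;   # total bytes the estimator expects to migrate (~55 TB)

    rate  = total_processed / elapsed;  # ~446597.87 B/s, matches rate_processed
    total = tmp_cnt / rate;             # ~124092127 s, matches "Estimated total time"
    left  = total - elapsed;            # ~123995601 s, matches "seconds left"
    printf "%.0f seconds left (~%.0f days)\n", left, left / 86400;
}'

That works out to roughly 1435 days, which is the "> 2 months" message in the status output; since the migration so far has been dominated by small XML files, the observed byte rate (and therefore the estimate) says little about how fast the large media files will move.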
On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran <nbalacha@redhat.com> wrote:

If possible, please send the rebalance logs as well.

On 16 July 2018 at 10:14, Nithya Balachandran <nbalacha@redhat.com> wrote:

Hi Rusty,

We need the following information:

1. The exact gluster version you are running
2. gluster volume info <volname>
3. gluster volume rebalance <volname> status
4. Information on the directory structure and file locations on your volume
5. How many levels of directories
6. How many files and directories in each level
7. How many directories and files in total (a rough estimate)
8. Average file size

Please note that having a rebalance running in the background should not affect your volume access in any way. However, I would like to know why only 6000 files have been scanned in 6 hours.

Regards,
Nithya

On 16 July 2018 at 06:13, Rusty Bower <rusty@rustybower.com> wrote:
<span style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Hey folks,</span><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial">I just added a new brick to my existing gluster volume, but<span> </span><b>gluster volume rebalance data status</b> is telling me the following: Estimated time left for rebalance to complete : > 2 months. Please try again later.</div><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial">I already did a fix-mapping, but this thing is absolutely crawling trying to rebalance everything (last estimate was ~40 years)</div><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial">Any thoughts on if this is a bug, or ways to speed this up? It's taking ~6 hours to scan 6000 files, which seems unreasonably slow.</div><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial">Thanks</div><span class="m_-2879606836176594600m_7048199701035966160m_2232620498796572015m_1591340225502248373m_1938400960318977386m_-2558423889939789208HOEnZb"><font color="#888888"><div style="font-size:12.8px;text-decoration-style:initial;text-decoration-color:initial">Rusty</div></font></span></div>
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users