[Gluster-devel] Rebalance improvement design

Susant Palai spalai at redhat.com
Tue May 5 06:48:12 UTC 2015


Comments inline

----- Original Message -----
> From: "Benjamin Turner" <bennyturns at gmail.com>
> To: "Susant Palai" <spalai at redhat.com>
> Cc: "Vijay Bellur" <vbellur at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Monday, May 4, 2015 8:58:13 PM
> Subject: Re: [Gluster-devel] Rebalance improvement design
> 
> I see:
> 
> #define GF_DECIDE_DEFRAG_THROTTLE_COUNT(throttle_count, conf) {         \
>                                                                         \
>                 throttle_count = MAX ((get_nprocs() - 4), 4);           \
>                                                                         \
>                 if (!strcmp (conf->dthrottle, "lazy"))                  \
>                         conf->defrag->rthcount = 1;                     \
>                                                                         \
>                 if (!strcmp (conf->dthrottle, "normal"))                \
>                         conf->defrag->rthcount = (throttle_count / 2);  \
>                                                                         \
>                 if (!strcmp (conf->dthrottle, "aggressive"))            \
>                         conf->defrag->rthcount = throttle_count;        \
> 
> So aggressive will give us the default of (20 + 16), normal is that divided
The 16 you mentioned here are syncop threads, which scale with the workload independently of migration. The 20 is the number of dedicated threads that carry out migration. We are planning to cap the maximum number of migration threads at the number of processing units available, or 4, whichever is larger [MAX (get_nprocs(), 4)].
> by 2, and lazy is 1, is that correct?  If so that is what I was looking to
> see.  The only other thing I can think of here is making the tunable a
> number like event threads, but I like this.  I don't know if I saw it
> documented, but if it's not we should note this in help.
Sure, it will be documented.
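
For illustration, here is a minimal sketch (not taken from the patch, with invented function and variable names) of how the three throttle settings could map to migrator thread counts, assuming the proposed MAX(get_nprocs(), 4) cap:

#include <stdio.h>
#include <string.h>
#include <sys/sysinfo.h>   /* get_nprocs() */

#define MAX(a, b) (((a) > (b)) ? (a) : (b))

static int
throttle_to_thread_count (const char *mode)
{
        int throttle_count = MAX (get_nprocs (), 4);

        if (!strcmp (mode, "lazy"))
                return 1;
        if (!strcmp (mode, "normal"))
                return throttle_count / 2;
        /* "aggressive" uses the full count */
        return throttle_count;
}

int
main (void)
{
        /* On a hypothetical 24-core node this prints 1, 12 and 24. */
        printf ("lazy=%d normal=%d aggressive=%d\n",
                throttle_to_thread_count ("lazy"),
                throttle_to_thread_count ("normal"),
                throttle_to_thread_count ("aggressive"));
        return 0;
}
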
> 
> Also to note, the old time was 98500.00 and the new one is 55088.00; that is
> a 44% improvement!
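
For reference, the arithmetic behind that figure:

    (98500.00 - 55088.00) / 98500.00 = 43412.00 / 98500.00 ≈ 0.44, i.e. about a 44% reduction in run time.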
> 
> -b
> 
> 
> On Mon, May 4, 2015 at 9:06 AM, Susant Palai <spalai at redhat.com> wrote:
> 
> > Ben,
> >     On no. of threads:
> >      Sent a throttle patch here: http://review.gluster.org/#/c/10526/ to
> > limit the number of threads [not merged yet]. The rebalance process in the
> > current model spawns 20 threads, and in addition to that there will be a
> > maximum of 16 syncop threads.
> >
> >     Crash:
> >      The crash should be fixed by this:
> > http://review.gluster.org/#/c/10459/.
> >
> >      Rebalance time is a function of the number of files and their size.
> > The higher the rate at which files get added to the global queue [on which
> > the migrator threads act], the faster the rebalance. I guess here we are
> > mostly seeing the effect of the local crawl, as only 81GB was migrated out
> > of 500GB.
> >
> > Thanks,
> > Susant
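
As a rough illustration of the crawl/migrate split described above, the sketch below models a crawler pushing paths onto a shared global queue while a fixed pool of migrator threads drains it. This is not the DHT rebalance code; every name in it is invented for the example.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NR_MIGRATORS 4   /* stands in for the rthcount discussed above */

struct entry {
        char          path[256];
        struct entry *next;
};

static struct entry   *queue_head;
static int             crawl_done;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;

/* Crawler side: add one file to the global queue and wake a migrator. */
static void
queue_push (const char *path)
{
        struct entry *e = calloc (1, sizeof (*e));

        strncpy (e->path, path, sizeof (e->path) - 1);
        pthread_mutex_lock (&queue_lock);
        e->next    = queue_head;
        queue_head = e;
        pthread_cond_signal (&queue_cond);
        pthread_mutex_unlock (&queue_lock);
}

/* Migrator side: each thread pops entries until the crawl is finished
 * and the queue is drained. */
static void *
migrator (void *arg)
{
        (void) arg;
        for (;;) {
                pthread_mutex_lock (&queue_lock);
                while (!queue_head && !crawl_done)
                        pthread_cond_wait (&queue_cond, &queue_lock);
                if (!queue_head && crawl_done) {
                        pthread_mutex_unlock (&queue_lock);
                        break;
                }
                struct entry *e = queue_head;
                queue_head = e->next;
                pthread_mutex_unlock (&queue_lock);

                /* A real migrator would copy data and fix layout here. */
                printf ("migrating %s\n", e->path);
                free (e);
        }
        return NULL;
}

int
main (void)
{
        pthread_t tids[NR_MIGRATORS];
        char      path[256];

        for (int i = 0; i < NR_MIGRATORS; i++)
                pthread_create (&tids[i], NULL, migrator, NULL);

        /* The "crawl": feed a handful of fake file names. */
        for (int i = 0; i < 10; i++) {
                snprintf (path, sizeof (path), "/brick/file-%d", i);
                queue_push (path);
        }

        pthread_mutex_lock (&queue_lock);
        crawl_done = 1;
        pthread_cond_broadcast (&queue_cond);
        pthread_mutex_unlock (&queue_lock);

        for (int i = 0; i < NR_MIGRATORS; i++)
                pthread_join (tids[i], NULL);
        return 0;
}
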
> >
> > ----- Original Message -----
> > > From: "Benjamin Turner" <bennyturns at gmail.com>
> > > To: "Vijay Bellur" <vbellur at redhat.com>
> > > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > > Sent: Monday, May 4, 2015 5:18:13 PM
> > > Subject: Re: [Gluster-devel] Rebalance improvement design
> > >
> > > Thanks Vijay! I forgot to upgrade the kernel (thinp 6.6 perf bug, gah)
> > > before I created this data set, so it's a bit smaller:
> > >
> > > total threads = 16
> > > total files = 7,060,700 (64 kb files, 100 files per dir)
> > > total data = 430.951 GB
> > > 88.26% of requested files processed, minimum is 70.00
> > > 10101.355737 sec elapsed time
> > > 698.985382 files/sec
> > > 698.985382 IOPS
> > > 43.686586 MB/sec
> > >
> > > I updated everything and ran the rebalance on
> > > glusterfs-3.8dev-0.107.git275f724.el6.x86_64:
> > >
> > > [root at gqas001 ~]# gluster v rebalance testvol status
> > > Node                                Rebalanced-files    size     scanned  failures  skipped     status  run time in secs
> > > ---------                           ----------------    ------   -------  --------  -------  ---------  ----------------
> > > localhost                                    1327346    81.0GB   3999140         0        0  completed          55088.00
> > > gqas013.sbu.lab.eng.bos.redhat.com                 0    0Bytes         1         0        0  completed          26070.00
> > > gqas011.sbu.lab.eng.bos.redhat.com                 0    0Bytes         0         0        0     failed              0.00
> > > gqas014.sbu.lab.eng.bos.redhat.com                 0    0Bytes         0         0        0     failed              0.00
> > > gqas016.sbu.lab.eng.bos.redhat.com           1325857    80.9GB   4000865         0        0  completed          55088.00
> > > gqas015.sbu.lab.eng.bos.redhat.com                 0    0Bytes         0         0        0     failed              0.00
> > > volume rebalance: testvol: success:
> > >
> > >
> > > A couple observations:
> > >
> > > I am seeing lots of threads / processes running:
> > >
> > > [root at gqas001 ~]# ps -eLf | grep glu | wc -l
> > > 96 <- 96 gluster threads
> > > [root at gqas001 ~]# ps -eLf | grep rebal | wc -l
> > > 36 <- 36 rebal threads.
> > >
> > > Is this tunable? Is there a use case where we would need to limit this?
> > > Just curious, how did we arrive at 36 rebal threads?
> > >
> > > # cat /var/log/glusterfs/testvol-rebalance.log | wc -l
> > > 4,577,583
> > > [root at gqas001 ~]# ll /var/log/glusterfs/testvol-rebalance.log -h
> > > -rw------- 1 root root 1.6G May 3 12:29
> > > /var/log/glusterfs/testvol-rebalance.log
> > >
> > > :) How big is this going to get when I do the 10-20 TB? I'll keep tabs on
> > > this, my default test setup only has:
> > >
> > > [root at gqas001 ~]# df -h
> > > Filesystem Size Used Avail Use% Mounted on
> > > /dev/mapper/vg_gqas001-lv_root 50G 4.8G 42G 11% /
> > > tmpfs 24G 0 24G 0% /dev/shm
> > > /dev/sda1 477M 65M 387M 15% /boot
> > > /dev/mapper/vg_gqas001-lv_home 385G 71M 366G 1% /home
> > > /dev/mapper/gluster_vg-lv_bricks 9.5T 219G 9.3T 3% /bricks
> > >
> > > Next run I want to fill up a 10TB cluster and double the # of bricks to
> > > simulate running out of space and doubling capacity. Any other fixes or
> > > changes that need to go in before I try a larger data set? Before that I
> > > may run my performance regression suite against a system while a rebal is
> > > in progress and check how it affects performance. I'll turn both these
> > > cases into perf regression tests that I run with iozone, smallfile, and
> > > such; any other use cases I should add? Should I add hard / soft links /
> > > whatever else to the data set?
> > >
> > > -b
> > >
> > >
> > > On Sun, May 3, 2015 at 11:48 AM, Vijay Bellur < vbellur at redhat.com >
> > wrote:
> > >
> > >
> > > On 05/01/2015 10:23 AM, Benjamin Turner wrote:
> > >
> > >
> > > Ok I have all my data created and I just started the rebalance. One
> > > thing to note: in the client log I see the following spamming:
> > >
> > > [root at gqac006 ~]# cat /var/log/glusterfs/gluster-mount-.log | wc -l
> > > 394042
> > >
> > > [2015-05-01 00:47:55.591150] I [MSGID: 109036]
> > > [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:
> > > Setting layout of
> > > /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006
> > > with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop:
> > > 2141429669 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start:
> > > 2141429670 , Stop: 4294967295 ],
> > > [2015-05-01 00:47:55.596147] I
> > > [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht:
> > > chunk size = 0xffffffff / 19920276 = 0xd7
> > > [2015-05-01 00:47:55.596177] I
> > > [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht:
> > > assigning range size 0x7fa39fa6 to testvol-replicate-1
> > >
> > >
> > > I also noticed the same set of excessive logs in my tests. Have sent
> > > across a patch [1] to address this problem.
> > >
> > > -Vijay
> > >
> > > [1] http://review.gluster.org/10281
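
The chunk-size line in the log excerpt above can be reproduced with a little arithmetic. The sketch below is illustrative only: it assumes two equally weighted subvolumes and is not the dht_selfheal code itself.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int
main (void)
{
        /* Values taken from the log lines above; what 19920276 represents
         * internally is not shown in the log, so treat it simply as the
         * divisor used to size one layout chunk. */
        uint32_t hash_max = 0xffffffff;
        uint32_t divisor  = 19920276;
        uint32_t chunk    = hash_max / divisor;      /* 0xd7 == 215 */

        /* Assume two subvolumes sharing the divisor equally. */
        uint32_t per_subvol = divisor / 2;
        uint32_t start      = 0;

        for (int i = 0; i < 2; i++) {
                uint32_t size = chunk * per_subvol;  /* 0x7fa39fa6, as logged */
                printf ("subvol-%d: start = 0x%08" PRIx32
                        "  size = 0x%08" PRIx32 "\n", i, start, size);
                start += size;
        }
        /* In the actual layout the last range is extended so that the two
         * ranges together cover the full 32-bit hash space (Stop:
         * 4294967295 in the log). */
        return 0;
}
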
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> >
> 

