[Gluster-users] Rebalancing after adding larger bricks

Nithya Balachandran nbalacha at redhat.com
Fri Oct 14 16:54:11 UTC 2016


On 11 October 2016 at 22:32, Jackie Tung <jackie at drive.ai> wrote:

> Joe,
>
> Thanks for that, that was educational.  Gluster docs claim that since 3.7,
> DHT hash ranges are weighted based on brick sizes by default:
>
> $ gluster volume get <volname> cluster.weighted-rebalance
> Option                                  Value
> ------                                  -----
> cluster.weighted-rebalance              on
>
>
> When running rebalance with force, I see this in the rebalance log:
>
> ...
> [2016-10-11 16:38:37.655144] I [MSGID: 109045]
> [dht-selfheal.c:1751:dht_fix_layout_of_directory] 0-cronut-dht: subvolume
> 10 (cronut-replicate-10): *5721127* chunks
> [2016-10-11 16:38:37.655154] I [MSGID: 109045]
> [dht-selfheal.c:1751:dht_fix_layout_of_directory] 0-cronut-dht: subvolume
> 11 (cronut-replicate-11): *7628846* chunks
>
> Subvolumes >= 11 are 8TB; subvolumes <= 10 are 6TB.
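>
> (As a sanity check, those chunk counts do match the capacity weighting:
> 5721127 / 7628846 ≈ 0.75 ≈ 6TB / 8TB, so the larger subvolumes are being
> assigned proportionally larger hash ranges.)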
>
> Do you think it is possible to even out usage on all bricks by %
> utilized now?  That would be the case if gluster rebalanced purely
> according to the weighted DHT layout, performing all the required data
> migrations.
>
>
Can you please send the following:

1. The rebalance logs (/var/log/glusterfs/<volname>-rebalance.log) from each
node
2. The output of the following for the root of each brick (a decoding
sketch follows after this list):
  getfattr -e hex -m . -d <path to brick>
3. gluster volume info
4. The version of glusterfs that you are running.
5. gluster volume rebalance <volname> status
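
For item 2, here is a minimal sketch for decoding the trusted.glusterfs.dht
value. It assumes the common on-disk format of four 32-bit big-endian words,
with the last two being the start and end of the brick's hash range; the
example value below is hypothetical, not taken from this thread:

    import struct

    def decode_dht(hex_value):
        """Decode a trusted.glusterfs.dht xattr value given as a hex string."""
        if hex_value.startswith("0x"):
            hex_value = hex_value[2:]
        # Assumed layout: count, type, range start, range end (big-endian).
        cnt, typ, start, end = struct.unpack(">IIII", bytes.fromhex(hex_value)[:16])
        share = (end - start + 1) / 2**32   # fraction of the hash space
        return start, end, share

    # Hypothetical value covering the first quarter of the hash space:
    s, e, share = decode_dht("0x0000000100000000000000003fffffff")
    print("range 0x%08x - 0x%08x (%.1f%% of hash space)" % (s, e, share * 100))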

Are the file sizes more or less the same or are there large variations in
them?


Thanks,
Nithya


> It would be preferable for us to avoid depending on
> cluster.min-free-disk to manage overflow later on, as that introduces an
> extra read of the dht link file before the actual IOP.
>
> Thanks,
> Jackie
>
> On Oct 10, 2016, at 11:13 AM, Joe Julian <joe at julianfamily.org> wrote:
>
> I've written an example of how gluster's dht works on my blog at
> https://joejulian.name/blog/dht-misses-are-expensive/ which might make it
> clear why the end result is not what you expected.
>
> By setting cluster.min-free-disk (defaults to 10%) you can, at least,
> ensure that your new bricks are utilized as needed to prevent over filling
> your smaller bricks.
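> For example (illustrative; check the option documentation for your
> version):
>
>   gluster volume set <volname> cluster.min-free-disk 10%
>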
> On 10/10/2016 10:13 AM, Jackie Tung wrote:
>
> Hi,
>
> We have a 2-node, distributed-replicated setup (11 bricks on each node).
> Each of these bricks is 6TB in size.
>
> node_A:/brick1 replicates node_B:/brick1
> node_A:/brick2 replicates node_B:/brick2
> node_A:/brick3 replicates node_B:/brick3
> ...
> node_A:/brick11 replicates node_B:/brick11
>
> We recently added 5 more bricks to make it 16 bricks on each node in
> total.  Each of these new bricks is 8TB in size.
>
> We completed a full rebalance operation (status says “completed”).
>
> However the end result is somewhat unexpected:
> */dev/sdl1 7.3T 2.2T 5.2T 29%*
> */dev/sdk1 7.3T 2.0T 5.3T 28%*
> */dev/sdj1 7.3T 2.0T 5.3T 28%*
> */dev/sdn1 7.3T 2.2T 5.2T 30%*
> */dev/sdp1 7.3T 2.2T 5.2T 30%*
> /dev/sdc1 5.5T 2.3T 3.2T 42%
> /dev/sdf1 5.5T 2.3T 3.2T 43%
> /dev/sdo1 5.5T 2.3T 3.2T 42%
> /dev/sda1 5.5T 2.3T 3.2T 43%
> /dev/sdi1 5.5T 2.3T 3.2T 42%
> /dev/sdh1 5.5T 2.3T 3.2T 43%
> /dev/sde1 5.5T 2.3T 3.2T 42%
> /dev/sdb1 5.5T 2.3T 3.2T 42%
> /dev/sdm1 5.5T 2.3T 3.2T 42%
> /dev/sdg1 5.5T 2.3T 3.2T 42%
> /dev/sdd1 5.5T 2.3T 3.2T 42%
>
> The df lines in *bold* are the new 8TB drives.
> Was I wrong to expect the % usage to be roughly equal?  Is there some
> parameter I need to tweak to make rebalance account for disk sizes properly?
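>
> (For reference, a rough sketch of what evened-out usage would look like,
> using the rounded per-node numbers from the df output above:
>
>   # Per-node capacity and usage in TB, as reported by df.
>   capacity = 5 * 7.3 + 11 * 5.5      # new 8TB bricks + old 6TB bricks
>   used = 5 * 2.1 + 11 * 2.3          # ~2.1TB avg on new, 2.3TB on old
>   print("uniform utilization: %.0f%%" % (100 * used / capacity))
>
> This prints roughly 37%, versus the observed ~29% on the new bricks and
> ~42% on the old ones.)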
>
> I’m using Gluster 3.8 on Ubuntu.
>
> Thanks,
> Jackie
>
> The information in this email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful.
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>