[Gluster-users] Replica 3 scale out and ZFS bricks

Wed Sep 16 19:53:19 UTC 2020

On Wednesday, 16 September 2020, 11:54:57 GMT+3, Alexander Iliev <ailiev+gluster at mamul.org> wrote:

From what I understood, in order to be able to scale it one node at a 
time, I need to set up the initial nodes with a number of bricks that is 
a multiple of 3 (e.g., 3, 6, 9, etc. bricks). The initial cluster will 
be able to export a volume as large as the storage of a single node and 
adding one more node will grow the volume by 1/3 (assuming homogeneous 
nodes.)

    You can't add a single node to a replica 3 volume, so no - you won't get an extra 1/3 from that node.
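
To illustrate the arithmetic: Gluster only accepts new bricks on a replicated volume in multiples of the replica count, and each replica set stores one full copy per brick. Below is a minimal sketch of that rule. The function name `usable_capacity` is made up for illustration, and the model is simplified: it assumes bricks are grouped into replica sets in the order listed and that each set is limited by its smallest brick.

```python
def usable_capacity(brick_sizes, replica=3):
    """Rough usable size of a distributed-replicate volume.

    Assumption (simplified model): bricks form replica sets in the order
    given, and each set contributes the size of its smallest brick.
    """
    if len(brick_sizes) % replica != 0:
        raise ValueError(
            f"{len(brick_sizes)} bricks is not a multiple of replica {replica}"
        )
    # Split the flat brick list into consecutive groups of `replica` bricks.
    sets = [
        brick_sizes[i:i + replica]
        for i in range(0, len(brick_sizes), replica)
    ]
    return sum(min(s) for s in sets)


# Sizes in TB, purely illustrative.
print(usable_capacity([10, 10, 10]))   # one replica set  -> 10
print(usable_capacity([10] * 6))       # two replica sets -> 20
```

Passing four bricks raises `ValueError`, which mirrors why one extra brick cannot grow a replica 3 volume. Note that the sketch says nothing about placement: the bricks of one replica set should still sit on different nodes.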

My plan is to use ZFS as the underlying system for the bricks. Now I'm 
wondering - if I join the disks on each node in a, say, RAIDZ2 pool and 
then create a dataset within the pool for each brick, the GlusterFS 
volume would report the volume size 3x$brick_size, because each brick 
shares the same pool and the size/free space is reported according to 
the ZFS pool size/free space.

I'm not sure about ZFS (never played with it on Linux), but on my systems I set up a thin pool consisting of all HDDs in a striped way (when no hardware RAID controller is available) and then set up a thin LV for each brick.
In thin LVM you can define a virtual size, and this size is reported as the volume size (assuming that all bricks are the same size). If you have 1 RAIDZ2 pool per Gluster TSP node, then that pool's size is the maximum size of your volume. If you plan to use snapshots, then you should set a quota on the volume to control the usage.
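
A rough sketch of the reported-size versus real-size arithmetic, under loudly stated assumptions: every node has one pool of the same size, each brick is a dataset in that pool, a dataset without a quota reports the whole pool's space, and the volume adds up what each replica set reports. The helper name `reported_volume_size` is hypothetical and this is not how Gluster computes it internally.

```python
def reported_volume_size(pool_size, bricks_per_node, dataset_quota=None):
    """Size a client might see when all bricks of a node share one pool.

    Assumptions (illustrative only): replica sets span the nodes, so there
    are `bricks_per_node` replica sets; a dataset without a quota reports
    the full pool size, a dataset with a quota reports the quota.
    """
    if dataset_quota is None:
        per_brick = pool_size
    else:
        # A quota cannot make a dataset larger than the pool itself.
        per_brick = min(dataset_quota, pool_size)
    return bricks_per_node * per_brick


# Sizes in TB, purely illustrative.
print(reported_volume_size(30, 3))                    # no quota -> 90
print(reported_volume_size(30, 3, dataset_quota=10))  # quota    -> 30
```

In this model a 30 TB pool holding three unquota'd datasets shows up as 90 TB, while the real ceiling stays at 30 TB. A per-dataset quota of pool size divided by the number of bricks brings the reported figure back in line with the pool.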

How should I go about this? Should I create a ZFS pool per brick (this 
seems to have a negative impact on performance)? Should I set a quota 
for each dataset?

I would go with 1 RAIDZ2 pool with 1 dataset of type 'filesystem' per Gluster node. A quota is always good to have.

P.S.: Any reason to use ZFS? It uses a lot of memory.

Best Regards,
Strahil Nikolov