[Gluster-users] 4 node cluster (best performance + redundancy setup?)

K. de Jong kees.dejong+lst at neobits.nl
Thu Aug 20 21:23:03 UTC 2020


On Thu, 2020-08-13 at 05:49 -0400, Ashish Pandey wrote:
> > With 4 nodes, yes, it is possible to use a disperse volume.
> > A redundancy count of 2 is not the best, but it is the one most often
> > used, as far as my interaction with users goes.
> > A disperse volume with 4 bricks is also possible, but it might not be
> > the best configuration.
> > I would suggest having 6 bricks in a 4+2 configuration,
> > where 4 are data bricks
> > and 2 are redundant bricks, in other words the maximum number of
> > bricks that can go bad while you can still use the disperse volume.
> >
> > If you have a number of disks on the 4 nodes, you can create the 4+2
> > disperse volume in a different way while still meeting the
> > requirements of EC (disperse volume).
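
To see why a 4+2 layout spread over 4 nodes can still survive a node
failure, here is a minimal sketch. It assumes one brick per disk, a
simple round-robin placement, and hypothetical node names (pi1..pi4);
none of this is taken from the GlusterFS tooling itself.

```python
from collections import Counter


def place_round_robin(nodes, bricks):
    """Assign bricks to nodes in round-robin order; return bricks per node."""
    return dict(Counter(nodes[i % len(nodes)] for i in range(bricks)))


def survives_single_node_loss(placement, redundancy):
    """A node failure takes out all of its bricks at once, so the busiest
    node must hold no more bricks than the redundancy count."""
    return max(placement.values()) <= redundancy


if __name__ == "__main__":
    nodes = ["pi1", "pi2", "pi3", "pi4"]  # hypothetical node names
    placement = place_round_robin(nodes, bricks=6)
    print(placement)
    print("4+2 survives one node:", survives_single_node_loss(placement, 2))
```

With 6 bricks on 4 nodes, two nodes carry 2 bricks each, so a redundancy
of 2 is exactly what is needed to lose any single node.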

Thank you for your reply. I finally received my 4th disk and I started 
to experiment with different modes.

But it seems like I can't do much with 4 bricks while using all of them.
My idea was a 3+1 setup, so that one node (brick) can fail and
everything still works without losing the minimum quorum of 3.

But disperse with redundancy doesn't accept this. At least one brick
needs to be set aside for redundancy, but then the RMW
(Read-Modify-Write) cycle is not efficient: 512 * (4-1) = 1536 bytes,
which is not a power of 2. Setting 2 disks for redundancy is not
recommended because of split-brain scenarios. An odd number needs to be
configured, i.e. not 2 or 4.
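
The arithmetic above can be checked with a short sketch. It only uses
the formula quoted in this thread, 512 * (bricks - redundancy), and the
documented rule that the brick count must be greater than
2 * redundancy. The function names are hypothetical helpers, not part
of any GlusterFS API.

```python
CHUNK_BYTES = 512  # per-brick chunk size, taken from the formula in this thread


def stripe_bytes(bricks: int, redundancy: int) -> int:
    """Return the stripe width in bytes for a dispersed layout."""
    if redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    # Rule quoted from the docs: bricks must be greater than 2 * redundancy.
    if bricks <= 2 * redundancy:
        raise ValueError("bricks must be greater than 2 * redundancy")
    return CHUNK_BYTES * (bricks - redundancy)


def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (512, 1024, 2048, ...)."""
    return n > 0 and n & (n - 1) == 0


if __name__ == "__main__":
    for bricks, redundancy in [(3, 1), (4, 1), (5, 1), (6, 2)]:
        width = stripe_bytes(bricks, redundancy)
        print(f"{bricks - redundancy}+{redundancy}: {width} bytes, "
              f"power of two: {is_power_of_two(width)}")
```

Note that under the quoted rule, 4 bricks with a redundancy of 2 would
be rejected outright, since 4 is not greater than 2 * 2.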

A replica set of 4 is also not allowed, since there has to be a
majority in the quorum. So an odd number is required, which 4 is not.
Using arbiters makes no difference in this context (of course).
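
The majority argument can be made concrete with generic strict-majority
arithmetic. This is only a sketch: `majority` and `tolerated_failures`
are hypothetical helper names, and the actual GlusterFS quorum options
may behave differently, for example when arbiters are involved.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that is strictly more than half."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a strict majority remains."""
    return voters - majority(voters)


if __name__ == "__main__":
    for voters in (2, 3, 4, 5):
        print(f"{voters} voters: majority {majority(voters)}, "
              f"tolerates {tolerated_failures(voters)} failure(s)")
```

Under this simple model, 3 and 4 voters both tolerate exactly one
failure, so the fourth copy costs capacity without adding fault
tolerance.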

How would I best achieve a 3+1 setup? To keep the system running
without split-brain, I need at least 3 nodes, so with 4, one should be
able to fail. But the modes I've explored so far do not seem to support
that. Maybe there is an option to keep one disk in standby?

Performance and disk efficiency are of course always nice too. But I'm
now wondering whether a 4-disk setup is possible at all.
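
For comparison, the usable capacity of the layouts discussed in this
thread can be estimated as follows. This is a rough sketch that assumes
four equal 4TB bricks and ignores arbiter bricks and filesystem
overhead; the helper names are hypothetical.

```python
BRICK_TB = 4  # size of each disk in this cluster


def replica_usable_tb(bricks: int, replica: int, brick_tb: int = BRICK_TB) -> int:
    """Usable space when bricks are grouped into replica sets."""
    if bricks % replica != 0:
        raise ValueError("bricks must be a multiple of the replica count")
    return (bricks // replica) * brick_tb


def disperse_usable_tb(bricks: int, redundancy: int, brick_tb: int = BRICK_TB) -> int:
    """Usable space of a dispersed volume: data bricks times brick size."""
    if bricks <= 2 * redundancy:
        raise ValueError("bricks must be greater than 2 * redundancy")
    return (bricks - redundancy) * brick_tb


if __name__ == "__main__":
    print("replica 4 (1 set):    ", replica_usable_tb(4, 4), "TB")
    print("replica 2 (2 pairs):  ", replica_usable_tb(4, 2), "TB")
    print("disperse 3+1:         ", disperse_usable_tb(4, 1), "TB")
```

The two replica pairs and the 3+1 disperse layout each survive the loss
of one node, but they differ in how much of the raw 16TB stays usable.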

On Thu, Aug 13, 2020 at 05:49, Ashish Pandey <aspandey at redhat.com> wrote:
> 
> 
> *From:* "K. de Jong" <kees.dejong+lst at neobits.nl>
> *To:* gluster-users at gluster.org
> *Sent:* Thursday, August 13, 2020 11:43:03 AM
> *Subject:* [Gluster-users] 4 node cluster (best performance + redundancy setup?)
> 
> I posted something in the subreddit [1], but I saw the suggestion
> elsewhere that the mailing list is more active.
> 
> I've been reading the docs, and from this overview [2] the distributed
> replicated [3] and dispersed + redundancy [4] modes sound the most
> interesting.
> 
> Each node (Raspberry Pi 4, 2x 8GB and 2x 4GB versions) has a 4TB hard
> disk attached via a docking station. I'm still waiting for the 4th
> Raspberry Pi, so I can't really experiment with the intended setup.
> But the setup of 2 replicas and 1 arbiter was quite disappointing. I
> got between 6 MB/s and 60 MB/s, depending on the test (I did a broad
> range of tests with bonnie++ and plain dd). Without GlusterFS, a simple
> dd of a 1GB file runs at 100+ MB/s. 100 MB/s is okay for this cluster.
> 
> My goal is the following:
> * Run a HA environment with Pacemaker (services like Nextcloud,
> Dovecot, Apache).
> * One node should be able to fail without downtime.
> * Performance and storage efficiency should be reasonable with the
> given hardware. By that I mean: when everything is a replica, storage
> is stuck at 4TB. I would prefer to have more than that, but still
> with redundancy.
> 
> However, when reading the docs about disperse, I see some interesting
> points. A big pro is "providing space-efficient protection against
> disk or server failures". But the following is interesting as well:
> "The total number of bricks must be greater than 2 * redundancy". So,
> I want the cluster to stay available when one node fails, and to be
> able to recreate the data on a new disk on that fourth node. I also
> read about the RMW efficiency. I guess 2 sets of 2 is the only thing
> that will work with that performance and disk efficiency in mind,
> because a redundancy of 1 would mess up the RMW cycle.
> 
> My questions:
> * With 4 nodes, is it possible to use disperse and redundancy? And is
> a redundancy count of 2 the best (and only) choice when dealing with 4
> disks?
> 
> With 4 nodes, yes, it is possible to use a disperse volume.
> A redundancy count of 2 is not the best, but it is the one most often
> used, as far as my interaction with users goes.
> A disperse volume with 4 bricks is also possible, but it might not be
> the best configuration.
> I would suggest having 6 bricks in a 4+2 configuration,
> where 4 are data bricks
> and 2 are redundant bricks, in other words the maximum number of
> bricks that can go bad while you can still use the disperse volume.
> 
> If you have a number of disks on the 4 nodes, you can create the 4+2
> disperse volume in a different way while still meeting the
> requirements of EC (disperse volume).
> 
> 
>   * The example does show a 4 node disperse command, but has as output
> `There isn't an optimal redundancy value for this configuration. Do
> you want to create the volume with redundancy 1 ? (y/n)`. I'm not sure
> if it's okay to simply select 'y' as an answer. The output is a bit
> vague, because it says it's not optimal, so it will be just slow, but
> will work I guess?
> 
> 
> It will not be optimal from the point of view of the calculation we
> make.
> You want the best configuration, where you get maximum redundancy
> (failure tolerance) and also maximum storage capacity.
> In that regard, it will not be an optimal solution. Performance can
> also be a factor.
> 
>   * The RMW (Read-Modify-Write) cycle is probably what's meant. 512 *
> (#Bricks - redundancy) would in my case be 512 * (4-1) = 1536 bytes,
> which doesn't seem optimal, because it's a weird number: it's not a
> power of 2 (512, 1024, 2048, etc.). Choosing a redundancy of 2 would
> translate to 1024, which would seem more "okay". But I don't know for
> sure.
> 
> Yes, you are right.
> 
> * Or am I better off simply creating 2 pairs of replicas (so no
> disperse)? In that case I would have 8TB available, and one node
> can fail. This would provide some read performance benefits.
> * What would be a good way to integrate this with Pacemaker? By that
> I mean, should I manage the gluster resource with Pacemaker? Or simply
> try to mount the glusterfs and, if it's not available, accept that
> dependent resources can't start anyway? In other words, let glusterfs
> handle failover itself.
> 
> 
> gluster can handle failover at the replica or disperse level, as per
> its implementation.
> Even if you want to go for replica, replica 2 does not look like the
> best option; you should go for replica 3 or an arbiter volume to have
> the best fault tolerance.
> However, that will cost you a lot of storage capacity.
> 
> 
> Any advice/tips?
> 
> 
> 
> 
> [1] <https://www.reddit.com/r/gluster/comments/i8ifdd/4_node_cluster_best_performance_redundancy_setup/>
> [2] <https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/>
> [3] <https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes>
> [4] <https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-dispersed-volumes>
> 
