[Gluster-users] 4 node cluster (best performance + redundancy setup?)

K. de Jong kees.dejong+lst at neobits.nl
Thu Aug 13 06:13:03 UTC 2020


I posted something in the subreddit [1], but I saw the suggestion
elsewhere that the mailing list is more active.

I've been reading the docs, and from the overview [2] the distributed
replicated [3] and dispersed + redundancy [4] volume types sound the
most interesting.

Each node (Raspberry Pi 4, 2x 8 GB and 2x 4 GB version) has a 4 TB HDD
attached via a docking station. I'm still waiting for the 4th
Raspberry Pi, so I can't really experiment with the intended setup yet.
But a trial setup with 2 replicas and 1 arbiter was quite
disappointing: I got between 6 MB/s and 60 MB/s, depending on the test
(I ran a broad range of tests with bonnie++ and plain dd). Without
GlusterFS a simple dd of a 1 GB file reaches 100+ MB/s, and around
100 MB/s would be fine for this cluster.
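
For reference, the untuned write test I used looked roughly like this
(the mount point and file size are just examples; conv=fdatasync is
there so the result isn't just page cache):

  # run on the GlusterFS mount, and on the bare disk for comparison
  dd if=/dev/zero of=/mnt/gv0/testfile bs=1M count=1024 conv=fdatasync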

My goals are the following:
* Run an HA environment with Pacemaker (services like Nextcloud,
Dovecot, Apache).
* One node should be able to fail without downtime.
* Performance and storage efficiency should be reasonable for the
given hardware. By that I mean: if everything is a full replica,
usable storage is stuck at 4 TB, and I would prefer to get more than
that while still keeping redundancy.

However, when reading the docs about disperse, I see some interesting
points. A big pro is "providing space-efficient protection against disk
or server failures". But the following constraint is interesting as
well: "The total number of bricks must be greater than 2 * redundancy".
I want the cluster to stay available when one node fails, and I want to
be able to rebuild the data onto a new disk on that fourth node. I also
read about the RMW efficiency; I guess 2 sets of 2 is the only layout
that works with both performance and disk efficiency in mind, because a
redundancy of 1 would mess up the RMW cycle.
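
If I work through that constraint for 4 bricks (my own arithmetic, so
please correct me if I read the inequality too strictly):

  bricks > 2 * redundancy
  redundancy 1:  4 > 2 * 1 = 2  -> allowed
  redundancy 2:  4 > 2 * 2 = 4  -> false, so seemingly not allowed
                                   with only 4 bricks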

My questions:
* With 4 nodes: is it possible to use disperse and redundancy? And is a
redundancy count of 2 the best (and only) choice when dealing with 4
disks?
  * The docs do show a 4-node disperse command (see the sketch after
this list), but it produces the prompt `There isn't an optimal
redundancy value for this configuration. Do you want to create the
volume with redundancy 1 ? (y/n)`. I'm not sure whether it's okay to
simply answer 'y'. The prompt is a bit vague: it only says the value
is not optimal, so I assume the volume will work but just be slower?
  * The RMW (Read-Modify-Write) cycle is probably what that prompt
refers to. 512 * (#Bricks - redundancy) would in my case be
512 * (4 - 1) = 1536 bytes, which doesn't seem optimal because it's
not a power of 2 (512, 1024, 2048, etc.). Choosing a redundancy of 2
would translate to 512 * (4 - 2) = 1024 bytes, which seems more
"okay". But I don't know for sure.
* Or am I better off simply creating 2 pairs of replicas (so no
disperse, also sketched below)? That way I would have 8 TB available
and one node could fail, and it should give some read performance
benefit as well.
* What would be a good way to integrate this with Pacemaker? By that I
mean: should I manage the Gluster mount as a Pacemaker resource (also
sketched below), or simply try to mount the GlusterFS volume and, if
it isn't available, let the dependent resources fail to start anyway?
In other words, let GlusterFS handle failover itself.
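
For concreteness, the two layouts I'm weighing would be created
roughly like this. Host names, volume name and brick paths are
placeholders, and the Pacemaker part is untested; this is just a
sketch, not something I have run on the final cluster:

  # Option A: dispersed, 4 bricks, redundancy 1
  # (~12 TB usable, any single node may fail)
  gluster volume create gv0 disperse 4 redundancy 1 \
      pi1:/data/brick1 pi2:/data/brick1 pi3:/data/brick1 pi4:/data/brick1

  # Option B: 2x2 distributed-replicated
  # (~8 TB usable, one node per replica pair may fail;
  #  newer versions warn that replica 2 is prone to split-brain)
  gluster volume create gv0 replica 2 \
      pi1:/data/brick1 pi2:/data/brick1 pi3:/data/brick1 pi4:/data/brick1

  # Clients mount the volume; any peer can serve as the mount target
  mount -t glusterfs pi1:/gv0 /mnt/gv0

  # If Pacemaker should own the mount, something like a cloned
  # Filesystem resource might work (untested, placeholder names):
  pcs resource create gv0_mount ocf:heartbeat:Filesystem \
      device="pi1:/gv0" directory="/mnt/gv0" fstype="glusterfs" clone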

Any advice/tips?




[1]
<https://www.reddit.com/r/gluster/comments/i8ifdd/4_node_cluster_best_performance_redundancy_setup/>
[2]
<https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/>
[3]
<https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes>
[4]
<https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-dispersed-volumes>
