<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><div><br></div><div><br></div><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"K. de Jong" &lt;kees.dejong+lst@neobits.nl&gt;<br><b>To: </b>gluster-users@gluster.org<br><b>Sent: </b>Thursday, August 13, 2020 11:43:03 AM<br><b>Subject: </b>[Gluster-users] 4 node cluster (best performance + redundancy setup?)<br><div><br></div><div id="geary-body" dir="auto"><div><div>I posted something in the subreddit [1], but I saw the suggestion</div><div>elsewhere that the mailing list is more active.</div><div><br></div><div>I've been reading the docs. And from this [2] overview the distributed</div><div>replicated [3] and dispersed + redundancy [4] sound the most</div><div>interesting.</div><div><br></div><div>Each node (Raspberry Pi 4, 2x 8GB and 2x 4GB version) has a 4TB hard</div><div>disk attached via a docking station. I'm still waiting for the 4th</div><div>Raspberry Pi, so I can't really experiment with the intended setup. But</div><div>the setup of 2 replicas and 1 arbiter was quite disappointing. I got</div><div>between 6 MB/s and 60 MB/s, depending on the test (I did a broad range</div><div>of tests with bonnie++ and simply dd). Without GlusterFS a simple dd of</div><div>a 1GB file is about 100+ MB/s. 100 MB/s is okay for this cluster.</div><div><br></div><div>My goal is the following:</div><div>* Run an HA environment with Pacemaker (services like Nextcloud,</div><div>Dovecot, Apache).</div><div>* One node should be able to fail without downtime.</div><div>* Performance and storage efficiency should be reasonable with the</div><div>given hardware. By that I mean, when everything is a replica then</div><div>storage is stuck at 4TB. 
And I would prefer to have some more than that</div><div>limitation, but with redundancy.</div><div><br></div><div>However, when reading the docs about disperse, I see some interesting</div><div>points. A big pro is "providing space-efficient protection against disk</div><div>or server failures". But the following is interesting as well: "The</div><div>total number of bricks must be greater than 2 * redundancy". So, I want</div><div>the cluster to be available when one node fails. And be able to</div><div>recreate the data on a new disk, on that fourth node. I also read about</div><div>the RMW efficiency, I guess 2 sets of 2 is the only thing that will</div><div>work with that performance and disk efficiency in mind. Because 1</div><div>redundancy would mess up the RMW cycle.</div><div><br></div><div>My questions:</div><div>* With 4 nodes, is it possible to use disperse and redundancy? And is a</div><div>redundancy count of 2 the best (and only) choice when dealing with 4</div><div>disks?</div><div><br></div><div>With 4 nodes, yes, it is possible to use a disperse volume.<br></div><div>A redundancy count of 2 is not necessarily the best, but it is the one I see used most often in my interactions with users.<br></div><div>A disperse volume with 4 bricks is also possible, but it might not be the best configuration.</div><div>I would suggest having 6 bricks in a 4+2 configuration,<br></div><div>where 4 is the number of data bricks<br></div><div>and 2 is the number of redundant bricks, in other words the maximum number of bricks that can go bad while you can still use the disperse volume.<br></div><div><br></div><div>If you have several disks across the 4 nodes, you can lay out the 4+2 disperse volume in different ways while still meeting the requirements of EC (disperse volume).<br></div><div><br></div><div><br></div><div>&nbsp; * The example does show a 4 node disperse command, but has as output</div><div>`There isn't an optimal redundancy value for this configuration. Do you</div><div>want to create the volume with redundancy 1 ? (y/n)`. 
I'm not sure if</div><div>it's okay to simply select 'y' as an answer. The output is a bit vague,</div><div>because it says it's not optimal, so it will be just slow, but will</div><div>work I guess?</div><div><br></div><div><br></div><div>It will not be optimal from the point of view of the calculations we make.<br></div><div>You want the best configuration, where you have maximum redundancy (failure tolerance) and also maximum storage capacity.<br></div><div>In that regard, it will not be an optimal solution. Performance can also be a factor.<br></div><div><br></div><div>&nbsp; * The RMW (Read-Modify-Write) cycle is probably what's meant. 512 *</div><div>(#Bricks - redundancy) would be in this case for me 512 * (4-1) = 1536</div><div>bytes, which doesn't seem optimal, because it's a weird number, it's not</div><div>a power of 2 (512, 1024, 2048, etc.). Choosing a redundancy of 2 would</div><div>translate to 1024, which would seem more "okay". But I don't know for</div><div>sure.</div><div><br></div><div>Yes, you are right.<br></div><div><br></div><div>* Or am I better off by simply creating 2 pairs of replicas (so no</div><div>disperse)? So in that sense I would have 8TB available, and one node</div><div>can fail. This would provide some read performance benefits.</div><div>* What would be a good way to integrate this with Pacemaker? With that</div><div>I mean, should I manage the gluster resource with Pacemaker? Or simply</div><div>try to mount the glusterfs, if it's not available, then depending</div><div>resources can't start anyway. 
So in other words, let glusterfs handle</div><div>failover itself.</div><div><br></div><div><br></div><div>Gluster can handle failover at the replica or disperse level, as per its implementation.<br></div><div>Even if you want to go for replica, replica 2 does not look like the best option; you should<br></div><div>go for replica 3 or an arbiter volume to get the best fault tolerance.<br></div><div>However, that will cost you a lot of storage capacity.<br></div><div><br></div><div><br></div><div>Any advice/tips?</div><div><br></div><div><br></div><div><br></div><div><br></div><div>[1]&nbsp;</div><div><a href="https://www.reddit.com/r/gluster/comments/i8ifdd/4_node_cluster_best_performance_redundancy_setup/" target="_blank">https://www.reddit.com/r/gluster/comments/i8ifdd/4_node_cluster_best_performance_redundancy_setup/</a><br data-mce-bogus="1"></div><div>[2]&nbsp;</div><div><a href="https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/" target="_blank">https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/</a><br data-mce-bogus="1"></div><div>[3]&nbsp;</div><div><a href="https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes" target="_blank">https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes</a><br data-mce-bogus="1"></div><div>[4]&nbsp;</div><div><a href="https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-dispersed-volumes" target="_blank">https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-dispersed-volumes</a><br data-mce-bogus="1"></div></div></div><br>________<br><div><br></div><br><div><br></div>Community Meeting Calendar:<br><div><br></div>Schedule -<br>Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC<br>Bridge: 
https://bluejeans.com/441850968<br><div><br></div>Gluster-users mailing list<br>Gluster-users@gluster.org<br>https://lists.gluster.org/mailman/listinfo/gluster-users<br></div><div><br></div></div></body></html>