<div id="geary-body" dir="auto"><div><div data-evo-paragraph="" class="" id="-x-evo-input-start" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;">On Thu, 2020-08-13 at 05:49 -0400, Ashish Pandey wrote:</div><blockquote type="cite" style="font-family: monospace; font-size: 13.333333015441895px;"><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>With 4 nodes, yes, it is possible to use a disperse volume.&nbsp;</div><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>A redundancy count of 2 is not the best, but it is the most often used, as far as my<span data-hidden-space=""></span><br class="-x-evo-wrap-br"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>interaction with users goes.&nbsp;</div><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>A disperse volume with 4 bricks is also possible, but it might not be the<span data-hidden-space=""></span><br class="-x-evo-wrap-br"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>best configuration.&nbsp;</div><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>I would suggest having 6 bricks in a 4+2 configuration,&nbsp;</div><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>where 4 = data bricks&nbsp;</div><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>and 2 = redundant bricks, in other words the maximum number of bricks which<span data-hidden-space=""></span><br class="-x-evo-wrap-br"><span
class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>can go bad while you can still use the disperse volume.&nbsp;</div><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span><br></div><div data-evo-paragraph="" class="" style="width: 71ch;"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>If you have a number of disks on 4 nodes, you can create the 4+2<span data-hidden-space=""></span><br class="-x-evo-wrap-br"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>disperse volume in a different way while maintaining the requirement<span data-hidden-space=""></span><br class="-x-evo-wrap-br"><span class="-x-evo-quoted" style="-webkit-user-select: none;">&gt;&nbsp;</span>of EC (disperse volume)</div></blockquote><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;"><br></div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;">Thank you for your reply. I finally received my 4th disk and started to experiment with different modes.</div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;"><br></div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;">But it seems like I can't do much with 4 bricks (while using them all). My idea was to have a 3+1 setup.
So that one node (brick) can fail and everything still works without losing the minimum quorum of 3.</div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;"><br></div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;">But using disperse with redundancy doesn't accept this. At least one needs to be set for redundancy. But then the&nbsp;RMW (Read-Modify-Write) cycle is not efficient:&nbsp;512 * (4-1) = 1536 bytes, which is not a power of 2. Setting 2 disks for redundancy is not recommended in terms of split-brain scenarios. An odd number needs to be configured, i.e. not 2 or 4.</div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;"><br></div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;">A replica set of 4 is also not allowed, since there has to be a majority in the quorum. So an odd number is required, which 4 is not. Using arbiters makes no difference in this context (of course).</div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;"><br></div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;">How would I best achieve a 3+1 setup? To maintain a running system without split-brain, I need at least 3 nodes. With 4, one should be able to fail. But the modes I've explored here do not seem to support that.
So maybe there is an option to have a disk in standby?</div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;"><br></div><div data-evo-paragraph="" class="" style="caret-color: rgb(46, 52, 54); color: rgb(46, 52, 54); font-family: monospace; font-size: 13.333333015441895px; width: 71ch;">Performance and disk efficiency are of course always nice too. But I'm wondering now whether a 4-disk setup is even possible at all.</div></div></div><div id="geary-quote" dir="auto"><br>On Thu, Aug 13, 2020 at 05:49, Ashish Pandey &lt;aspandey@redhat.com&gt; wrote:<br><blockquote type="cite"><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><div><br></div><div><br></div><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"K. de Jong" &lt;kees.dejong+lst@neobits.nl&gt;<br><b>To: </b>gluster-users@gluster.org<br><b>Sent: </b>Thursday, August 13, 2020 11:43:03 AM<br><b>Subject: </b>[Gluster-users] 4 node cluster (best performance + redundancy setup?)<br><div><br></div><div id="geary-body" dir="auto"><div><div>I posted something in the subreddit [1], but I saw the suggestion</div><div>elsewhere that the mailing list is more active.</div><div><br></div><div>I've been reading the docs, and from this overview [2] the distributed</div><div>replicated [3] and dispersed + redundancy [4] setups sound the most</div><div>interesting.</div><div><br></div><div>Each node (Raspberry Pi 4, 2x 8GB and 2x 4GB version) has a 4TB hard</div><div>disk attached via a docking station. I'm still waiting for the 4th</div><div>Raspberry Pi, so I can't really experiment with the intended setup. But</div><div>the setup of 2 replicas and 1 arbiter was quite disappointing.
I got</div><div>between 6 MB/s and 60 MB/s, depending on the test (I did a broad range</div><div>of tests with bonnie++ and simply dd). Without GlusterFS a simple dd of</div><div>a 1GB file is about 100+ MB/s. 100 MB/s is okay for this cluster.</div><div><br></div><div>My goal is the following:</div><div>* Run an HA environment with Pacemaker (services like Nextcloud,</div><div>Dovecot, Apache).</div><div>* One node should be able to fail without downtime.</div><div>* Performance and storage efficiency should be reasonable with the</div><div>given hardware. By that I mean: when everything is a replica,</div><div>storage is stuck at 4TB, and I would prefer to have more than that</div><div>limit, but with redundancy.</div><div><br></div><div>However, when reading the docs about disperse, I see some interesting</div><div>points. A big pro is "providing space-efficient protection against disk</div><div>or server failures". But the following is interesting as well: "The</div><div>total number of bricks must be greater than 2 * redundancy". So, I want</div><div>the cluster to be available when one node fails, and to be able to</div><div>recreate the data on a new disk, on that fourth node. I also read about</div><div>the RMW efficiency; I guess 2 sets of 2 is the only thing that will</div><div>work with that performance and disk efficiency in mind, because a</div><div>redundancy of 1 would mess up the RMW cycle.</div><div><br></div><div>My questions:</div><div>* With 4 nodes, is it possible to use disperse and redundancy?
And is a</div><div>redundancy count of 2 the best (and only) choice when dealing with 4</div><div>disks?</div><div><br></div><div>With 4 nodes, yes, it is possible to use a disperse volume.<br></div><div>A redundancy count of 2 is not the best, but it is the most often used, as far as my interaction with users goes.<br></div><div>A disperse volume with 4 bricks is also possible, but it might not be the best configuration.</div><div>I would suggest having 6 bricks in a 4+2 configuration,<br></div><div>where 4 = data bricks<br></div><div>and 2 = redundant bricks, in other words the maximum number of bricks which can go bad while you can still use the disperse volume.<br></div><div><br></div><div>If you have a number of disks on 4 nodes, you can create the 4+2 disperse volume in a different way while maintaining the requirement of EC (disperse volume).<br></div><div><br></div><div><br></div><div>&nbsp; * The example does show a 4-node disperse command, but gives as output</div><div>`There isn't an optimal redundancy value for this configuration. Do you</div><div>want to create the volume with redundancy 1 ? (y/n)`. I'm not sure if</div><div>it's okay to simply select 'y' as an answer. The output is a bit vague,</div><div>because it says it's not optimal, so it will just be slow, but will</div><div>work I guess?</div><div><br></div><div><br></div><div>It will not be optimal from the point of view of the calculation which we make.<br></div><div>You want the best configuration, where you can have maximum redundancy (failure tolerance) and also maximum storage capacity.<br></div><div>In that regard, it will not be an optimal solution. Performance can also be a factor.<br></div><div><br></div><div>&nbsp; * The RMW (Read-Modify-Write) cycle is probably what's meant. 512 *</div><div>(#Bricks - redundancy) would be in this case for me 512 * (4-1) = 1536</div><div>bytes, which doesn't seem optimal, because it's a weird number: it's not</div><div>a power of 2 (512, 1024, 2048, etc.).
Choosing a replica of 2 would</div><div>translate to 1024, which would seem more "okay". But I don't know for</div><div>sure.</div><div><br></div><div>Yes, you are right.<br></div><div><br></div><div>* Or am I better off simply creating 2 pairs of replicas (so no</div><div>disperse)? In that case I would have 8TB available, and one node</div><div>can fail. This would provide some read performance benefits.</div><div>* What would be a good way to integrate this with Pacemaker? By that</div><div>I mean: should I manage the gluster resource with Pacemaker? Or simply</div><div>try to mount the glusterfs; if it's not available, then dependent</div><div>resources can't start anyway. So in other words, let glusterfs handle</div><div>failover itself.</div><div><br></div><div><br></div><div>Gluster can handle failover at the replica or disperse level as per its implementation.<br></div><div>Even if you want to go for replica, replica 2 does not look like the best option; you should<br></div><div>go for replica 3 or an arbiter volume to have the best fault tolerance.<br></div><div>However, that will cost you a lot of storage capacity.<br></div><div><br></div><div><br></div><div>Any advice/tips?</div><div><br></div><div><br></div><div><br></div><div><br></div><div>[1]&nbsp;</div><div><a href="https://www.reddit.com/r/gluster/comments/i8ifdd/4_node_cluster_best_performance_redundancy_setup/" target="_blank">https://www.reddit.com/r/gluster/comments/i8ifdd/4_node_cluster_best_performance_redundancy_setup/</a><br data-mce-bogus="1"></div><div>[2]&nbsp;</div><div><a href="https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/" target="_blank">https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/</a><br data-mce-bogus="1"></div><div>[3]&nbsp;</div><div><a href="https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes"
target="_blank">https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes</a><br data-mce-bogus="1"></div><div>[4]&nbsp;</div><div><a href="https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-dispersed-volumes" target="_blank">https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-dispersed-volumes</a><br data-mce-bogus="1"></div></div></div></div><div><br></div></div></blockquote></div>