[Gluster-users] How many disks can fail before a catastrophic failure occurs?

Andreas Schwibbe a.schwibbe at gmx.net
Sun Oct 20 11:29:53 UTC 2024


Gilberto,

this totally depends on your setup.

With replica 2 you always have 2 copies of the same file.
So when you add bricks to your volume, you'll want to add
Server1/disco1TB-0 and Server2/disco1TB-0 as a pair,
meaning each file is written to one disk on each server.
Thus your system can survive the failure of one disk in any pair OR one
whole server and still be up.
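
For example, a brick pair could be added like this (a sketch; the
volume name VMS and the /brick sub-directory are assumptions, adjust to
your naming):

  # add one disk from each server as a new replica pair;
  # bricks must be added in multiples of the replica count
  gluster volume add-brick VMS \
      pve01:/disco1TB-0/brick pve02:/disco1TB-0/brick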

However, I recommend against replica 2, as you'll run into split-brain
problems when 1 server is down.
When it comes back up, you might have 2 versions of the same file,
and you need a strategy to figure out which of the two copies is
the current one.
You can, however, make the volume read-only while 1 server is down;
then you cannot get any split-brains, but depending on your use case
this may mean downtime.
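
Client-side quorum is one way to get that behaviour (a sketch; VMS is
an assumed volume name). Roughly, with quorum-type auto on a replica 2
volume, writes are refused when quorum for a replica set is lost:

  # refuse writes when quorum is lost, trading availability
  # for split-brain protection
  gluster volume set VMS cluster.quorum-type auto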

Hence you should use at least replica 2 + 1 arbiter.
The arbiter holds a metadata copy of each file (so the hardware
requirements for this server are low, and it doesn't need huge
disks), making it easy to find the valid file copy and heal the invalid
one. (I once had a NUC as arbiter, running totally fine.) [When using
an arbiter, be sure to create the XFS filesystem with imaxpct=75 on the
arbiter, as its bricks will hold metadata only, not file data.]
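
A sketch of the arbiter setup (hostnames, device name, and volume name
are made up):

  # on the arbiter node: reserve more space for inodes,
  # since the brick stores metadata only
  mkfs.xfs -i maxpct=75 /dev/sdb

  # replica volume with one arbiter per replica set
  gluster volume create VMS replica 3 arbiter 1 \
      pve01:/disco1TB-0/brick pve02:/disco1TB-0/brick \
      arb01:/disco1TB-0/brick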

If you've got enough resources for 3 servers, replica 3 is best.
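
That would look like this (again, names are examples):

  # three full data copies, one per server
  gluster volume create VMS replica 3 \
      pve01:/disco1TB-0/brick pve02:/disco1TB-0/brick \
      pve03:/disco1TB-0/brick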

When you run
gluster v status
on a replica 2 volume, the first two brick rows form a pair;
with replica 3, the first three rows are grouped and hold copies of the
same file.
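
The grouping is also visible in gluster v info; a trimmed, made-up
example for a replica 2 distributed-replicate volume (the arrows on the
right are my annotations, not part of the output):

  Volume Name: VMS
  Type: Distributed-Replicate
  Number of Bricks: 2 x 2 = 4
  Bricks:
  Brick1: pve01:/disco2TB-0/brick   <- replica pair 1
  Brick2: pve02:/disco2TB-0/brick   <- replica pair 1
  Brick3: pve01:/disco1TB-0/brick   <- replica pair 2
  Brick4: pve02:/disco1TB-0/brick   <- replica pair 2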

Cheers,
A.

Am Samstag, dem 19.10.2024 um 12:25 -0300 schrieb Gilberto Ferreira:
> Hi there.
> I have 2 servers with this set of disks on each side:
> 
> pve01:~# df | grep disco
> /dev/sdd          1.0T  9.4G 1015G   1% /disco1TB-0
> /dev/sdh          1.0T  9.3G 1015G   1% /disco1TB-3
> /dev/sde          1.0T  9.5G 1015G   1% /disco1TB-1
> /dev/sdf          1.0T  9.4G 1015G   1% /disco1TB-2
> /dev/sdg          2.0T   19G  2.0T   1% /disco2TB-1
> /dev/sdc          2.0T   19G  2.0T   1% /disco2TB-0
> /dev/sdj          1.0T  9.2G 1015G   1% /disco1TB-4
> 
> I have a Type: Distributed-Replicate gluster
> So my question is: how many disks can fail before I lose data?
> 
> Thanks in advance
> 
> ---
> 
> 
> Gilberto Nunes Ferreira
