[Gluster-users] usage of harddisks: each hdd a brick? raid?

Wed Feb 6 08:47:41 UTC 2019

Hey there,
just a little update...

This week we switched from our 3 "old" gluster servers to 3 new ones,
and with that we threw some hardware at the problem...

old: 3 servers, each has 4 * 10 TB disks; each disk is used as a brick
-> 4 x 3 = 12 distribute-replicate
new: 3 servers, each has 10 * 10 TB disks; we built 2 RAID10 (6 disks
and 4 disks), each RAID10 is a brick -> we split our data into 2
volumes, 1 x 3 = 3 replicate; as filesystem we use XFS (instead of
ext4) with mount options inode64,noatime,nodiratime now.
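
To make the two layouts concrete, here is a small Python sketch of the brick arithmetic, plus what a create command for one of the new volumes could look like. The hostnames (gluster1..3), the brick path and the volume name "vol1" are made up for illustration; only the brick counts and the replica count of 3 come from the setup described above.

```python
def layout(total_bricks, replica):
    """Return (distribute_count, replica) for a replicated volume."""
    if total_bricks % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return total_bricks // replica, replica


def create_command(volume, replica, servers, brick_paths):
    """Build a 'gluster volume create' command line as a string.

    Bricks are listed path by path across all servers, so every group of
    `replica` consecutive bricks (one replica set) spans different servers.
    """
    bricks = [f"{server}:{path}" for path in brick_paths for server in servers]
    return f"gluster volume create {volume} replica {replica} " + " ".join(bricks)


servers = ["gluster1", "gluster2", "gluster3"]  # hypothetical hostnames

old = layout(total_bricks=12, replica=3)  # 4 disks per server, 1 brick each
new = layout(total_bricks=3, replica=3)   # 1 RAID10 brick per server
print("old: %d x %d = %d" % (old[0], old[1], old[0] * old[1]))
print("new: %d x %d = %d" % (new[0], new[1], new[0] * new[1]))

# Hypothetical volume name and brick path for one of the two new volumes.
print(create_command("vol1", 3, servers, ["/bricks/raid10-a/vol1"]))
```

The command string is only printed, not executed, so the sketch can be tried without a gluster installation.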

What we've seen so far: the volumes are independent - if one volume is
under load, the other one isn't affected by that. Throughput, latency
etc. seem to be better now.

Of course you waste a lot of disk space when combining RAID10 with a
replicate setup: 100TB per server (so 300TB in total) results in ~50TB
of volume size. But during the last year we had problems due to hard
disk errors and the resulting brick restores (reset-brick), which took
a very long time. It was a hard time... :-/
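
The space calculation can be spelled out as a short Python sketch. It uses nominal sizes only (no filesystem or RAID metadata overhead, which is an assumption on my side), and the variable names are just for illustration.

```python
DISK_TB = 10
SERVERS = 3
RAID10_SETS = [6, 4]  # disks per RAID10 array on each server


def raid10_usable_tb(disks, disk_tb=DISK_TB):
    """RAID10 mirrors every disk once, so half of the raw space is usable."""
    return disks * disk_tb / 2


# One brick per RAID10 array; with replica 3 on 3 servers each volume
# is exactly as large as a single brick.
volume_sizes_tb = [raid10_usable_tb(d) for d in RAID10_SETS]
raw_total_tb = sum(RAID10_SETS) * DISK_TB * SERVERS
usable_total_tb = sum(volume_sizes_tb)

print("volumes:", volume_sizes_tb, "TB")
print("raw: %d TB, usable: %d TB" % (raw_total_tb, usable_total_tb))
```

RAID10 halves the raw space and replica 3 divides it by three again, so only about one sixth of the raw capacity ends up as volume size.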

So our conclusion was: as a heal can be really painful, take very long
and hurt performance badly -> set things up so that you never have to
do a "big" heal at all. That's why we chose RAID10: under normal
circumstances (a disk failing from time to time) there may be a RAID
resync, but that may be faster and cause fewer performance issues
than having to restore a complete brick.
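
For a rough sense of scale, here is a back-of-the-envelope Python sketch based on the 7-8 days for a heal with 10TB disks mentioned in the quoted message further down. It assumes the healed brick was a completely full 10TB disk, which nobody in the thread stated, and the 100 MB/s resync rate is purely a placeholder to be replaced with a value measured on your own hardware.

```python
def implied_rate_mb_s(data_tb, days):
    """Average rate in MB/s needed to move data_tb terabytes in `days` days."""
    seconds = days * 24 * 3600
    return data_tb * 1e12 / seconds / 1e6


def duration_days(data_tb, rate_mb_s):
    """Days needed to move data_tb terabytes at a sustained rate in MB/s."""
    return data_tb * 1e12 / (rate_mb_s * 1e6) / (24 * 3600)


BRICK_TB = 10  # assumption: the healed brick was a completely full 10TB disk

for days in (7, 8):
    rate = implied_rate_mb_s(BRICK_TB, days)
    print("%d days -> %.1f MB/s on average" % (days, rate))

# Plug in a resync rate measured on your own hardware to compare.
# The 100 MB/s here is only a placeholder, not a measured value.
print("resync at 100 MB/s: %.1f days" % duration_days(BRICK_TB, 100))
```

If the brick held less data, the implied heal rate would be even lower, so this is an upper bound under the stated assumption.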

Or, more generally: if you have big, slow disks and quite high I/O ->
think about not using single disks as bricks. If you have the hardware
(and the money), think about using RAID1 or RAID10. The smaller and/or
faster the disks are (e.g. you have a lot of 1TB SSDs/NVMes), the better
using them as bricks might work, as (in case of a disk failure) the heal
should be much faster. I can't say anything about RAID5/6, as we didn't
take it into consideration... just my 2 €cents from (still) a gluster
amateur :-)

Best regards,
Hubert

On Tue, Jan 22, 2019 at 07:11, Amar Tumballi Suryanarayan
<atumball at redhat.com> wrote:
>
>
>
> On Thu, Jan 10, 2019 at 1:56 PM Hu Bert <revirii at googlemail.com> wrote:
>>
>> Hi,
>>
>> > > We are also using 10TB disks; a heal takes 7-8 days.
>> > > You can play with the "cluster.shd-max-threads" setting. The default is 1, I
>> > > think. I am using it with 4.
>> > > Below you can find more info:
>> > > https://access.redhat.com/solutions/882233
>> > cluster.shd-max-threads: 8
>> > cluster.shd-wait-qlength: 10000
>>
>> Our setup:
>> cluster.shd-max-threads: 2
>> cluster.shd-wait-qlength: 10000
>>
>> > >> Volume Name: shared
>> > >> Type: Distributed-Replicate
>> > Ah, you have a distributed-replicated volume, but I chose only replicated
>> > (for simplicity in the beginning :)
>> > Maybe replicated volumes heal faster?
>>
>> Well, maybe our setup with 3 servers and 4 disks=bricks == 12 bricks,
>> resulting in a distributed-replicate volume (all /dev/sd{a,b,c,d}
>> identical) , isn't optimal? And it would be better to create a
>> replicate 3 volume with only 1 (big) brick per server (with 4 disks:
>> either a logical volume or sw/hw raid)?
>>
>> But it would be interesting to know if a replicate volume is healing
>> faster than a distributed-replicate volume - even if there was only 1
>> faulty brick.
>>
>
> We don't have any data points to confirm this, but it may be true, especially as the crawl can get a little slower when DHT (i.e., distribute) is involved, which means the healing would get slower too.
>
> We are experimenting with a few performance enhancement patches (like https://review.gluster.org/20636); it would be great to see how things work with a newer base. We will keep the list updated with performance numbers once we have more data on them.
>
> -Amar
>
>>
>>
>> Thx
>> Hubert
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>
>
> --
> Amar Tumballi (amarts)