[Gluster-users] EC planning
Serkan Çoban
cobanserkan at gmail.com
Wed Oct 14 13:13:38 UTC 2015
Hi Xavier,
>I'm not sure if I understand you. Are you saying you will create two separate gluster volumes or you will add both bricks to the same distributed-dispersed volume ?
Is adding more than one brick from the same host to a disperse gluster
volume recommended? I meant two different gluster volumes.
If I add two bricks from the same server to the same dispersed volume
and, let's say, it is an 8+1 configuration, then losing one host will
bring down the volume, right?
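For example (just how I picture it; server and brick names are
placeholders): if one 8+1 disperse set ended up as
server1:/brick1 server1:/brick2 server2:/brick1 ... server8:/brick1
then server1 would hold 2 of the 9 fragments, so losing that single
host would already exceed the redundancy of 1 for that set.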
>One possibility is to get rid of the server RAID and use each disk as a single brick. This way you can create 26 bricks per server and assign each one to a different disperse set. A big distributed-dispersed volume balances I/O load between bricks better. Note that RAID configurations have a reduction in the available number of IOPS. For sequential writes, this is not so bad, but if you have many clients accessing the same bricks, you will see many random accesses even if clients are doing sequential writes. Caching can alleviate this, but if you want to sustain a throughput of 2-3 GB/s, caching effects are not so evident.
I can create 26 JBOD disks and use them as bricks, but is this
recommended? With 50 servers the brick count will be 1300; is this not
a problem?
Can you explain the configuration a bit more? For example, using 16+2
with 26 bricks per server and 54 servers in total. In the end I only
want one gluster volume and protection against 2 host failures.
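Would the create command look something like this? This is only my
rough sketch, assuming the "disperse N redundancy M" create syntax; the
volume name, host names and brick paths are placeholders:
gluster volume create bkpvol disperse 18 redundancy 2 \
    server{1..18}:/bricks/disk1 \
    server{19..36}:/bricks/disk1 \
    server{37..54}:/bricks/disk1 \
    server{1..18}:/bricks/disk2 \
    ... (and so on for all 26 disks)
If I understand correctly, 54*26 = 1404 bricks would form 1404/18 = 78
disperse sets, each set spread over 18 different servers.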
Also, in this case disk failures will be handled by Gluster; I hope
that doesn't bring more problems. But I will also test this
configuration when I get the servers.
Serkan
On Wed, Oct 14, 2015 at 2:03 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
> Hi Serkan,
>
> On 13/10/15 15:53, Serkan Çoban wrote:
>>
>> Hi Xavier and thanks for your answers.
>>
>> Servers will have 26*8TB disks. I don't want to lose more than 2
>> disks for RAID, so my options are HW RAID6 24+2 or 2 * HW RAID5 12+1,
>
>
> A RAID5 of more than 8-10 disks is normally considered unsafe because the
> probability of a second drive failure while reconstructing another failed
> drive is considerably high. The same happens with a RAID6 of more than 16-20
> disks.
>
>> in both cases I can create 2 bricks per server using LVM and use one brick
>> per server to create two distributed-disperse volumes. I will test those
>> configurations when servers arrive.
>
>
> I'm not sure if I understand you. Are you saying you will create two
> separate gluster volumes or you will add both bricks to the same
> distributed-dispersed volume ?
>
>>
>> I can go with 8+1 or 16+2, and will run tests when the servers
>> arrive. But 8+2 will be too much; I would lose nearly 25% of the
>> space in that case.
>>
>> For the client count, this cluster will get backups from Hadoop
>> nodes, so there will be at least 750-1000 clients sending data at the
>> same time. Can 16+2 * 3 = 54 gluster nodes handle this, or should I
>> increase the node count?
>
>
> In this case I think it would be better to increase the number of
> bricks, otherwise you may see a performance hit when serving all these
> clients.
>
> One possibility is to get rid of the server RAID and use each disk as a
> single brick. This way you can create 26 bricks per server and assign each
> one to a different disperse set. A big distributed-dispersed volume balances
> I/O load between bricks better. Note that RAID configurations have a
> reduction in the available number of IOPS. For sequential writes, this is
> not so bad, but if you have many clients accessing the same bricks, you will
> see many random accesses even if clients are doing sequential writes.
> Caching can alleviate this, but if you want to sustain a throughput of 2-3
> GB/s, caching effects are not so evident.
>
> Without RAID you could use a 16+2 or even a 16+3 dispersed volume. This
> gives you good protection and more usable storage.
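>
> For example, after creating the volume you can check the layout with
> "gluster volume info <volname>": for a distributed-dispersed volume it
> reports the brick count as something like "N x (16 + 2)" and lists the
> bricks in order, where each consecutive group of 18 bricks is one
> disperse set. Just make sure that every group of 18 bricks you pass to
> "gluster volume create" comes from 18 different servers, so that no
> single server holds more than one brick of the same set.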
>
> Xavi
>
>>
>> I will check the parameters you mentioned.
>>
>> Serkan
>>
>> On Tue, Oct 13, 2015 at 1:43 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> +gluster-users
>>
>>
>> On 13/10/15 12:34, Xavier Hernandez wrote:
>>
>> Hi Serkan,
>>
>> On 12/10/15 16:52, Serkan Çoban wrote:
>>
>> Hi,
>>
>> I am planning to use GlusterFS for backup purposes. I write big files
>> (>100MB) with a throughput of 2-3GB/s. In order to gain space we plan
>> to use erasure coding. I have some questions for EC and brick
>> planning:
>> - I am planning to use a 200TB XFS/ZFS RAID6 volume to hold one brick
>> per server. Should I increase the brick count? Does increasing the
>> brick count also increase performance?
>>
>>
>> Using a distributed-dispersed volume increases performance. You can
>> split each RAID6 volume into multiple bricks to create such a volume.
>> This is because a single brick process cannot achieve the maximum
>> throughput of the disk, so creating multiple bricks improves this.
>> However, having too many bricks could be worse because, in your case,
>> all requests will go to the same filesystem and will compete with
>> each other.
>>
>> Another thing to consider is the size of the RAID volume. A 200TB
>> RAID will require *a lot* of time to reconstruct in case of failure
>> of any disk. Also, a 200TB RAID means you need almost 30 8TB disks. A
>> RAID6 of 30 disks is quite fragile. Maybe it would be better to
>> create multiple RAID6 volumes, each with 18 disks at most (16+2 is a
>> good and efficient configuration, especially for XFS on non-hardware
>> RAIDs). Even in this configuration, you can create multiple bricks in
>> each RAID6 volume.
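>>
>> As a rough sketch (host names, paths and the volume name are just
>> placeholders): if each RAID6 array is mounted at /data/raid6, you can
>> create a few directories on it and use each one as a brick of a
>> different disperse set, e.g.
>>
>> mkdir -p /data/raid6/brick{1..4}
>> gluster volume create bkpvol disperse 9 redundancy 1 \
>> server{1..9}:/data/raid6/brick1 \
>> server{1..9}:/data/raid6/brick2 ...
>>
>> Here the first 9 bricks (one 8+1 set) take /data/raid6/brick1 from 9
>> different servers, the next 9 take /data/raid6/brick2, and so on, so
>> each directory on a given server belongs to a different set.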
>>
>> - I plan to use 16+2 for EC. Is this a problem? Should I decrease
>> this to 12+2 or 10+2? Or is it completely safe to use whatever we
>> want?
>>
>>
>> 16+2 is a very big configuration. It requires a lot of computation
>> power and forces you to grow (if you need to grow the gluster volume
>> at some point) in multiples of 18 bricks.
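>>
>> For example, with a 16+2 layout the volume is expanded 18 bricks at a
>> time, something like the following (volume, host and brick names are
>> only placeholders):
>>
>> gluster volume add-brick bkpvol newserver{1..18}:/bricks/disk1
>>
>> Each such group of 18 bricks becomes one more disperse set in the
>> distributed volume.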
>>
>> Considering that you are already using a RAID6 in your servers, what
>> you are really protecting with the disperse redundancy is the failure
>> of the servers themselves. Maybe an 8+1 configuration could be enough
>> for your needs and requires less computation. If you really need
>> redundancy 2, 8+2 should be ok.
>>
>> Using a number of data bricks that is not a power of 2 has a
>> theoretical impact on the performance of the disperse volume when
>> applications write blocks whose size is a multiple of a power of 2
>> (which is the most normal case). This means that it's possible that a
>> 10+2 performs worse than an 8+2. However, this depends on many other
>> factors, some even internal to gluster, like caching, meaning that
>> the real impact could be almost negligible in some cases. You should
>> test it with your workload.
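>>
>> As a rough illustration (example numbers only): with 8 data bricks a
>> 128KiB application write splits evenly, 128KiB / 8 = 16KiB per brick;
>> with 10 data bricks the same write gives 128KiB / 10 = 12.8KiB per
>> brick, so the last stripe is only partially filled and may need an
>> extra read-modify-write cycle.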
>>
>> - I understand that EC calculation is performed on the client side. I
>> want to know if there are any benchmarks on how EC affects CPU usage.
>> For example, might each 100MB/s of traffic use 1 CPU core?
>>
>>
>> I don't have a detailed measurement of CPU usage related to
>> bandwidth; however, we have made some tests that seem to indicate
>> that the CPU overhead caused by disperse is quite small for a 4+2
>> configuration. I don't have access to this data right now. When I
>> have it, I'll send it to you.
>>
>> I will also try to do some tests with 8+2 and 16+2 configurations to
>> see the difference.
>>
>> - Does the client count affect cluster performance? Is there any
>> difference between connecting 100 clients each writing at 20-30MB/s
>> and 1000 clients each writing at 2-3MB/s?
>>
>>
>> Increasing the number of clients improves performance; however, I
>> wouldn't go over 100 clients, as this could have a negative impact on
>> performance caused by the overhead of managing all of them. In our
>> tests, the maximum performance is obtained with ~8 parallel clients
>> (if my memory doesn't fail me).
>>
>> You will also probably want to tweak some volume parameters, like
>> server.event-threads, client.event-threads,
>> performance.client-io-threads and server.outstanding-rpc-limit to
>> increase performance.
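>>
>> For example (the volume name is a placeholder and the values are only
>> starting points to experiment with, not tuned recommendations):
>>
>> gluster volume set bkpvol server.event-threads 4
>> gluster volume set bkpvol client.event-threads 4
>> gluster volume set bkpvol performance.client-io-threads on
>> gluster volume set bkpvol server.outstanding-rpc-limit 128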
>>
>> Xavi
>>
>>
>> Thank you for your time,
>> Serkan
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>