[Gluster-users] EC planning
Serkan Çoban
cobanserkan at gmail.com
Wed Oct 14 13:13:38 UTC 2015
Hi Xavier,
>I'm not sure if I understand you. Are you saying you will create two separate gluster volumes or you will add both bricks to the same distributed-dispersed volume ?
Is adding more than one brick from the same host to a disperse gluster
volume recommended? I meant two different gluster volumes.
If I add two bricks from the same server to the same dispersed volume
and, let's say, it is an 8+1 configuration, then losing one host will
bring down the volume, right?
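For example (just how I picture it; server and brick names are
placeholders): if one 8+1 disperse set ended up as
server1:/brick1 server1:/brick2 server2:/brick1 ... server8:/brick1
then server1 would hold 2 of the 9 fragments, so losing that single
host would already exceed the redundancy of 1 for that set.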
>One possibility is to get rid of the server RAID and use each disk as a single brick. This way you can create 26 bricks per server and assign each one to a different disperse set. A big distributed-dispersed volume balances I/O load between bricks better. Note that RAID configurations have a reduction in the available number of IOPS. For sequential writes, this is not so bad, but if you have many clients accessing the same bricks, you will see many random accesses even if clients are doing sequential writes. Caching can alleviate this, but if you want to sustain a throughput of 2-3 GB/s, caching effects are not so evident.
I can create 26 JBOD disks and use them as bricks, but is this
recommended? With 50 servers the brick count will be 1300; is this not
a problem?
Can you explain the configuration a bit more? For example, using 16+2
with 26 bricks per server and 54 servers in total. In the end I only
want one gluster volume and protection against 2 host failures.
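Would the create command look something like this? This is only my
rough sketch, assuming the "disperse N redundancy M" create syntax; the
volume name, host names and brick paths are placeholders:
gluster volume create bkpvol disperse 18 redundancy 2 \
    server{1..18}:/bricks/disk1 \
    server{19..36}:/bricks/disk1 \
    server{37..54}:/bricks/disk1 \
    server{1..18}:/bricks/disk2 \
    ... (and so on for all 26 disks)
If I understand correctly, 54*26 = 1404 bricks would form 1404/18 = 78
disperse sets, each set spread over 18 different servers.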
Also, in this case disk failures will be handled by Gluster; I hope
that doesn't bring more problems. But I will also test this
configuration when I get the servers.
Serkan
On Wed, Oct 14, 2015 at 2:03 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
> Hi Serkan,
>
> On 13/10/15 15:53, Serkan Çoban wrote:
>>
>> Hi Xavier and thanks for your answers.
>>
>> Servers will have 26*8TB disks. I don't want to lose more than 2
>> disks for RAID, so my options are HW RAID6 24+2 or 2 * HW RAID5 12+1,
>
>
> A RAID5 of more than 8-10 disks is normally considered unsafe because the
> probability of a second drive failure while reconstructing another failed
> drive is considerably high. The same happens with a RAID6 of more than 16-20
> disks.
>
>> in both cases I can create 2 bricks per server using LVM and use one brick
>> per server to create two distributed-disperse volumes. I will test those
>> configurations when servers arrive.
>
>
> I'm not sure if I understand you. Are you saying you will create two
> separate gluster volumes or you will add both bricks to the same
> distributed-dispersed volume ?
>
>>
>> I can go with 8+1 or 16+2, and will run tests when the servers
>> arrive. But 8+2 will be too much; I would lose nearly 25% of the
>> space in that case.
>>
>> For the client count, this cluster will get backups from Hadoop
>> nodes, so there will be at least 750-1000 clients sending data at the
>> same time. Can 16+2 * 3 = 54 gluster nodes handle this, or should I
>> increase the node count?
>
>
> In this case I think it would be better to increase the number of
> bricks, otherwise you may see a performance hit when serving all these
> clients.
>
> One possibility is to get rid of the server RAID and use each disk as a
> single brick. This way you can create 26 bricks per server and assign each
> one to a different disperse set. A big distributed-dispersed volume balances
> I/O load between bricks better. Note that RAID configurations have a
> reduction in the available number of IOPS. For sequential writes, this is
> not so bad, but if you have many clients accessing the same bricks, you will
> see many random accesses even if clients are doing sequential writes.
> Caching can alleviate this, but if you want to sustain a throughput of 2-3
> GB/s, caching effects are not so evident.
>
> Without RAID you could use a 16+2 or even a 16+3 dispersed volume. This
> gives you good protection and more usable storage.
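>
> For example, after creating the volume you can check the layout with
> "gluster volume info <volname>": for a distributed-dispersed volume it
> reports the brick count as something like "N x (16 + 2)" and lists the
> bricks in order, where each consecutive group of 18 bricks is one
> disperse set. Just make sure that every group of 18 bricks you pass to
> "gluster volume create" comes from 18 different servers, so that no
> single server holds more than one brick of the same set.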
>
> Xavi
>
>>
>> I will check the parameters you mentioned.
>>
>> Serkan
>>
>> On Tue, Oct 13, 2015 at 1:43 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> +gluster-users
>>
>>
>> On 13/10/15 12:34, Xavier Hernandez wrote:
>>
>> Hi Serkan,
>>
>> On 12/10/15 16:52, Serkan Çoban wrote:
>>
>> Hi,
>>
>> I am planning to use GlusterFS for backup purposes. I write big files
>> (>100MB) with a throughput of 2-3GB/s. In order to gain space we plan
>> to use erasure coding. I have some questions for EC and brick
>> planning:
>> - I am planning to use a 200TB XFS/ZFS RAID6 volume to hold one brick
>> per server. Should I increase the brick count? Does increasing the
>> brick count also increase performance?
>>
>>
>> Using a distributed-dispersed volume increases performance. You can
>> split each RAID6 volume into multiple bricks to create such a volume.
>> This is because a single brick process cannot achieve the maximum
>> throughput of the disk, so creating multiple bricks improves this.
>> However, having too many bricks could be worse because, in your case,
>> all requests will go to the same filesystem and will compete with
>> each other.
>>
>> Another thing to consider is the size of the RAID volume. A 200TB
>> RAID will require *a lot* of time to reconstruct in case of failure
>> of any disk. Also, a 200TB RAID means you need almost 30 8TB disks. A
>> RAID6 of 30 disks is quite fragile. Maybe it would be better to
>> create multiple RAID6 volumes, each with 18 disks at most (16+2 is a
>> good and efficient configuration, especially for XFS on non-hardware
>> RAIDs). Even in this configuration, you can create multiple bricks in
>> each RAID6 volume.
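>>
>> As a rough sketch (host names, paths and the volume name are just
>> placeholders): if each RAID6 array is mounted at /data/raid6, you can
>> create a few directories on it and use each one as a brick of a
>> different disperse set, e.g.
>>
>> mkdir -p /data/raid6/brick{1..4}
>> gluster volume create bkpvol disperse 9 redundancy 1 \
>> server{1..9}:/data/raid6/brick1 \
>> server{1..9}:/data/raid6/brick2 ...
>>
>> Here the first 9 bricks (one 8+1 set) take /data/raid6/brick1 from 9
>> different servers, the next 9 take /data/raid6/brick2, and so on, so
>> each directory on a given server belongs to a different set.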
>>
>> - I plan to use 16+2 for EC. Is this a problem? Should I decrease
>> this to 12+2 or 10+2? Or is it completely safe to use whatever we
>> want?
>>
>>
>> 16+2 is a very big configuration. It requires a lot of computation
>> power and forces you to grow (if you need to grow the gluster volume
>> at some point) in multiples of 18 bricks.
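>>
>> For example, with a 16+2 layout the volume is expanded 18 bricks at a
>> time, something like the following (volume, host and brick names are
>> only placeholders):
>>
>> gluster volume add-brick bkpvol newserver{1..18}:/bricks/disk1
>>
>> Each such group of 18 bricks becomes one more disperse set in the
>> distributed volume.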
>>
>> Considering that you are already using a RAID6 in your servers, what
>> you are really protecting with the disperse redundancy is the failure
>> of the servers themselves. Maybe an 8+1 configuration could be enough
>> for your needs and requires less computation. If you really need
>> redundancy 2, 8+2 should be ok.
>>
>> Using a number of data bricks that is not a power of 2 has a
>> theoretical impact on the performance of the disperse volume when
>> applications write blocks whose size is a multiple of a power of 2
>> (which is the most normal case). This means that it's possible that a
>> 10+2 performs worse than an 8+2. However, this depends on many other
>> factors, some even internal to gluster, like caching, meaning that
>> the real impact could be almost negligible in some cases. You should
>> test it with your workload.
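>>
>> As a rough illustration (example numbers only): with 8 data bricks a
>> 128KiB application write splits evenly, 128KiB / 8 = 16KiB per brick;
>> with 10 data bricks the same write gives 128KiB / 10 = 12.8KiB per
>> brick, so the last stripe is only partially filled and may need an
>> extra read-modify-write cycle.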
>>
>> - I understand that EC calculation is performed on the client side. I
>> want to know if there are any benchmarks on how EC affects CPU usage.
>> For example, might each 100MB/s of traffic use 1 CPU core?
>>
>>
>> I don't have a detailed measurement of CPU usage related to
>> bandwidth; however, we have made some tests that seem to indicate
>> that the CPU overhead caused by disperse is quite small for a 4+2
>> configuration. I don't have access to this data right now. When I
>> have it, I'll send it to you.
>>
>> I will also try to do some tests with 8+2 and 16+2 configurations to
>> see the difference.
>>
>> - Does the client count affect cluster performance? Is there any
>> difference between connecting 100 clients each writing at 20-30MB/s
>> and 1000 clients each writing at 2-3MB/s?
>>
>>
>> Increasing the number of clients improves performance; however, I
>> wouldn't go over 100 clients, as this could have a negative impact on
>> performance caused by the overhead of managing all of them. In our
>> tests, the maximum performance is obtained with ~8 parallel clients
>> (if my memory doesn't fail me).
>>
>> You will also probably want to tweak some volume parameters, like
>> server.event-threads, client.event-threads,
>> performance.client-io-threads and server.outstanding-rpc-limit to
>> increase performance.
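>>
>> For example (the volume name is a placeholder and the values are only
>> starting points to experiment with, not tuned recommendations):
>>
>> gluster volume set bkpvol server.event-threads 4
>> gluster volume set bkpvol client.event-threads 4
>> gluster volume set bkpvol performance.client-io-threads on
>> gluster volume set bkpvol server.outstanding-rpc-limit 128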
>>
>> Xavi
>>
>>
>> Thank you for your time,
>> Serkan
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>