[Gluster-users] RAID 0 with Cache v/s NO-RAID

Indivar Nair indivar.nair at techterra.in
Fri Sep 28 03:28:55 UTC 2012


Thanks, Brian, for your input.
Our requirement is both high throughput and high-availability.

Let me give a little background for a better understanding of our
requirement -
The storage will be used by animation artists and a render farm with
around 300 render nodes.
--------------------------------------------------------------------------------------------------------------------------------------------------------
1. When a rendering job is fired, we can expect at least 50 render
nodes to simultaneously hit the storage to read a single scene
(information) file. This file could be anywhere from 100MB to 2GB in
size.

2. Once the render is complete, each of these render nodes would write
the generated image file back to the storage. The image files would be
10 - 50MB in size. Here again, we can expect most of the renders to
finish almost simultaneously, usually within a few seconds of each
other.

3. The 100MB - 2GB scene file will almost always be written to by a
single artist, i.e. no two artists would be working on the same scene
file simultaneously.

4. The 10 - 50MB image files from different rendering activities would
then be read by another set of nodes for something called
'compositing'. Compositing gives you the final 'shot' output.
--------------------------------------------------------------------------------------------------------------------------------------------------------

We were trying to cater to both large-file (100MB - 2GB) read speed
and small-file (10 - 50MB) read+write speed.
With Gluster, we were thinking of setting the stripe size to 50MB so
that a small file would fit entirely within a single stripe chunk,
while larger files would be striped across bricks in 50MB chunks.
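
For illustration, this is roughly the setup we had in mind - a sketch
only, with made-up server and brick names, using GlusterFS 3.3 CLI
syntax and assuming cluster.stripe-block-size accepts a 50MB value:

    # 2x2 striped-replicated volume across four servers
    gluster volume create renderfs stripe 2 replica 2 transport tcp \
        server1:/export/brick1 server2:/export/brick1 \
        server3:/export/brick1 server4:/export/brick1

    # 50MB stripe chunks, so a small file fits in a single chunk
    gluster volume set renderfs cluster.stripe-block-size 50MB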

The RAID controllers that come with branded hardware do not allow
individual disk access (no passthrough mode), and plain SAS controllers
don't come with cache. So we were thinking of using a RAID controller
with cache and creating RAID 0 arrays of just 2 disks each.

One more thought: would it be possible to have a mix of RAID6 volumes
and individual disks, and force Gluster to write large files (*.ma) to
the RAID6 volumes and small files (*.iff) to the individual disks?
That would solve our problem completely.
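
(On the stripe-size side at least, the stripe translator's block-size
option reportedly accepts per-pattern values in the volfile. A sketch,
with made-up subvolume names, assuming the pattern syntax is still
supported:

    volume stripe-0
      type cluster/stripe
      # 50MB chunks for *.ma scene files only
      option block-size *.ma:50MB
      subvolumes brick-a brick-b
    end-volume

That only varies the chunk size by file type, though, not which bricks
the files land on.)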

Regards,


Indivar Nair

On Fri, Sep 28, 2012 at 1:01 AM, Brian Candler <B.Candler at pobox.com> wrote:

> On Thu, Sep 27, 2012 at 10:08:12PM +0530, Indivar Nair wrote:
> >    We were trying to define our storage spec for Gluster and were
> >    wondering which would be better purely from a performance
> >    perspective.
> >    1. Use a simple 24 Disk JBOD with SAS Controller and export each hard
> >    disk as an individual volume
> >    OR
> >    2. Use the 24 Disk JBOD with Flash Based Cache enabled RAID
> >       Controller, create 12 RAID 0 Arrays of 2 Disks each, and take
> >       advantage of the caching, especially for writing.
>
> I'm not sure why your controller would do caching for pairs of disks in
> RAID0, but not for single disks??
>
> >    Just FYI, we will be creating a 'Striped Replicated' Volume for H/A.
>
> Where each server has a bunch of RAID0 disk sets? IMO this is a really,
> really bad idea.
>
> Consider the following:
>
> A. One disk in your RAID0 fails entirely. The whole volume is toast. You
> insert a new disk, do mkfs, and then you have to sync the whole
> filesystem's worth of data from the other server.  You hope that a disk
> doesn't fail in the corresponding volume on the other server during this
> period.
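>
> A rough sketch of that recovery path, for reference - GlusterFS 3.3
> syntax, with made-up volume and brick names:
>
>     # swap the dead RAID0 brick for the freshly-mkfs'd one
>     gluster volume replace-brick renderfs \
>         server1:/export/dead server1:/export/new commit force
>
>     # then resync the full brick's worth of data from the replica
>     gluster volume heal renderfs full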
>
> But it's worse than this. Consider:
>
> B. You have a single unrecoverable read error on a single sector.
>
> In a RAID1 or RAID5 or RAID6, the controller will be able to recover the
> data from a different disk, write the data back to the failed disk, which
> will remap the bad sector to another part of the disk, and everything will
> continue fine just as if nothing happened. (Side note: you need to have
> drives which support ERC/TLER for this to work)
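>
> (You can check and set ERC from Linux with smartmontools, e.g. - the
> device name is made up, and this assumes the drive supports SCT ERC:
>
>     smartctl -l scterc /dev/sda          # query current ERC timers
>     smartctl -l scterc,70,70 /dev/sda    # set read/write ERC to 7s
>
> Desktop drives often refuse the second command.)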
>
> With a RAID0, your entire brick will go down; Gluster cannot do this
> sort of sector-level repair.  You are then back in situation (A) above,
> except that you will end up needlessly replacing a drive.
>
> Or you can dd the affected drive with zeros to force any bad sectors to be
> remapped; this will take hours, meanwhile you cross your fingers that you
> don't have any read error from the RAID0 on your other server.
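>
> Something like the following, say - destructive, so only run it on the
> pulled drive, and the device name is made up:
>
>     # overwrite every sector; forces the drive to remap bad ones
>     dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct
>
>     # then see how many sectors got reallocated
>     smartctl -A /dev/sdX | grep -i reallocated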
>
> This is not a good recipe for data safety. If you care about capacity over
> speed, then use RAID6 in your bricks.  If you care about speed over
> capacity, then use RAID10.
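>
> (With Linux md software RAID that would be something like the
> following - device names made up:
>
>     # capacity over speed: RAID6
>     mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sd[b-m]
>
>     # speed over capacity: RAID10
>     mdadm --create /dev/md0 --level=10 --raid-devices=12 /dev/sd[b-m]
>
> though a hardware controller with cache would do the equivalent.)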
>
> Of course, if you are just using this for scratch space (lots of temporary
> files) then RAID0 is probably fine - but your talk of HA suggests that your
> data is more important than that.
>
> Regards,
>
> Brian.
>