[Gluster-users] RAID 0 with Cache v/s NO-RAID

Thu Sep 27 20:28:33 UTC 2012

On 09/27/2012 12:31 PM, Brian Candler wrote:
> On Thu, Sep 27, 2012 at 10:08:12PM +0530, Indivar Nair wrote:
>>     We were trying to define our storage spec for Gluster and were wondering
>>     which would be better purely from a performance perspective.
>>     1. Use a simple 24 Disk JBOD with SAS Controller and export each hard
>>     disk as an individual volume
>>     OR
>>     2. Use the 24 Disk JBOD with Flash Based Cache enabled RAID Controller,
>>         create 12 RAID 0 Arrays of 2 Disks each,
>>         and take advantage of the caching, especially for writing.
> I'm not sure why your controller would do caching for pairs of disks in
> RAID0, but not for single disks?
>
>>     Just FYI, we will be creating a 'Striped Replicated' Volume for H/A.
> Where each server has a bunch of RAID0 disk sets? IMO this is a really,
> really bad idea.
>
> Consider the following:
>
> A. One disk in your RAID0 fails entirely. The whole volume is toast. You
> insert a new disk, do mkfs, and then you have to sync the whole filesystem's
> worth of data from the other server.  You hope that a disk doesn't fail in
> the corresponding volume on the other server during this period.
>
> But it's worse than this. Consider:
>
> B. You have a single unrecoverable read error on a single sector.
>
> In a RAID1 or RAID5 or RAID6, the controller will be able to recover the
> data from a different disk, write the data back to the failed disk, which
> will remap the bad sector to another part of the disk, and everything will
> continue fine just as if nothing happened. (Side note: you need to have
> drives which support ERC/TLER for this to work)
>
> With a RAID0, your entire brick will go down; Gluster cannot do this sort of
> sector-level repair.  You are then back in the situation (A) above, except
> that you will end up needlessly replacing a drive.
>
> Or you can dd the affected drive with zeros to force any bad sectors to be
> remapped; this will take hours, meanwhile you cross your fingers that you
> don't have any read error from the RAID0 on your other server.
>
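
To put rough numbers on scenarios (A) and (B) above, here is a minimal
sketch. It assumes disk failures are independent, and every figure in it
(per-disk failure probability, disk size, sustained throughput) is an
illustrative assumption, not a measurement from this thread. The function
names are made up for the example.

```python
def stripe_loss_probability(disk_failure_prob: float, disks_in_stripe: int) -> float:
    """Chance that at least one disk in a RAID0 set fails.

    RAID0 has no redundancy, so losing any one member loses the whole brick.
    Assumes failures are independent, which real disks only approximate.
    """
    if not 0.0 <= disk_failure_prob <= 1.0:
        raise ValueError("disk_failure_prob must be between 0 and 1")
    if disks_in_stripe < 1:
        raise ValueError("disks_in_stripe must be at least 1")
    return 1.0 - (1.0 - disk_failure_prob) ** disks_in_stripe


def full_pass_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read or write size_tb once at a sustained rate.

    Applies both to resyncing a brick from the other server and to
    zero-filling a drive with dd. Uses decimal units (1 TB = 1,000,000 MB).
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    return size_tb * 1_000_000 / throughput_mb_s / 3600


if __name__ == "__main__":
    # Assumed values, for illustration only.
    p_disk = 0.05          # hypothetical yearly failure probability per disk
    disk_tb = 2.0          # hypothetical disk size
    rate_mb_s = 100.0      # hypothetical sustained throughput

    print("single disk loss:  %.4f" % stripe_loss_probability(p_disk, 1))
    print("2-disk RAID0 loss: %.4f" % stripe_loss_probability(p_disk, 2))
    print("zero-fill one disk: %.1f h" % full_pass_hours(disk_tb, rate_mb_s))
    print("resync 2-disk brick: %.1f h" % full_pass_hours(2 * disk_tb, rate_mb_s))
```

The point is only the shape of the result: striping two disks roughly
doubles the chance of losing the brick, and the recovery window during
which the other server has no safety net is measured in hours.
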
> This is not a good recipe for data safety. If you care about capacity over
> speed, then use RAID6 in your bricks.  If you care about speed over
> capacity, then use RAID10.
>
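
The capacity side of that trade-off can be sketched the same way. This
uses the 24-disk shelf from the original question; the 2 TB disk size and
the split into two 12-disk RAID6 arrays are assumptions chosen for the
example, and the helper names are hypothetical.

```python
def raid0_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID0 keeps all raw capacity but has no redundancy."""
    if disks < 1:
        raise ValueError("RAID0 needs at least 1 disk")
    return disks * disk_tb


def raid6_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID6 spends two disks' worth of space per array on parity."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks - 2) * disk_tb


def raid10_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID10 mirrors every disk, so half the raw capacity is usable."""
    if disks < 4 or disks % 2 != 0:
        raise ValueError("RAID10 needs an even number of disks, at least 4")
    return (disks // 2) * disk_tb


if __name__ == "__main__":
    disk_tb = 2.0  # assumed disk size, not from the thread

    print("24 x RAID0 total:     %.0f TB" % raid0_usable_tb(24, disk_tb))
    # Assumed layout: two 12-disk RAID6 arrays per shelf.
    print("2 x 12-disk RAID6:    %.0f TB" % (2 * raid6_usable_tb(12, disk_tb)))
    print("24-disk RAID10:       %.0f TB" % raid10_usable_tb(24, disk_tb))
```

These are per-server figures before Gluster replication, which divides
the usable space again by the replica count.
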
> Of course, if you are just using this for scratch space (lots of temporary
> files) then RAID0 is probably fine - but your talk of HA suggests that your
> data is more important than that.
If you're interested, here's some more about using stripe: 
http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/