[Gluster-users] RAID options for Gluster
landman at scalableinformatics.com
Thu Jun 14 14:09:18 UTC 2012
On 06/14/2012 07:06 AM, Fernando Frediani (Qube) wrote:
> I think this discussion probably came up here already but I couldn't
> find much in the archives. Would you be able to comment on or correct
> whatever might look wrong?
> What options do people think are more suitable to use with Gluster in
> terms of RAID underneath, with a good balance between cost, usable
> space and performance? I have thought about two main options, each
> with its pros and cons:
> *No RAID (individual hot swappable disks):*
> Each disk is an individual brick (server:/disk1, server:/disk2, etc.),
> so no RAID controller is required. As the data is replicated, if one
> disk fails the data still exists on another disk on another node.
For this to work well, you need the ability to mark a disk as failed and
as ready for removal, or to migrate all data on a disk over to a new
disk. Gluster only has the last capability, and doesn't have the rest.
You still need additional support in the OS and tool sets.
The tools we've developed for DeltaV and siFlash help in this regard,
though I wouldn't suggest using Gluster in this mode.
> Cheaper to build as there is no cost for an expensive RAID controller.
If a $500USD RAID adapter saves you $1000USD of time/expense over its
lifetime due to failed disk alerts, hot swap autoconfiguration, etc., is
it "really" expensive? Of course, if you are at a university where you
have infinite amounts of cheap labor, sure, it's expensive. It's cheaper
to manage by throwing grad/undergrad students at it than it is to manage
with an HBA.
That is, the word "expensive" has different meanings in different
contexts ... and in storage, the $500USD adapter may easily help reduce
costs elsewhere in the system (usually in disk lifecycle management,
as RAID's major purpose in life is to give you, the administrator, a
fighting chance to replace a failed device before you lose your data).
> Improved performance as writes have to be done only on a single disk not
> in the entire RAID5/6 Array.
Good for tiny writes; bad for larger writes (>64 kB).
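To see why small writes favor the single-disk-brick layout, here is a rough back-of-envelope model (my own illustration, not anything from Gluster or a real controller) counting physical disk I/Os per write. The function name and the 64 KiB chunk size in the example are assumptions for illustration:

```python
def ios_per_write(write_bytes, chunk_bytes, n_disks, n_parity):
    """Rough count of physical disk I/Os for one write to a parity RAID.

    A sub-stripe write pays read-modify-write: read old data and old
    parity, write new data and new parity (4 I/Os on RAID5, 6 on RAID6).
    A full-stripe write needs no reads: one write per disk.
    """
    stripe = (n_disks - n_parity) * chunk_bytes
    if write_bytes >= stripe and write_bytes % stripe == 0:
        return (write_bytes // stripe) * n_disks
    # sub-stripe write: 2 I/Os for the data chunk + 2 per parity disk
    return 2 + 2 * n_parity

# A tiny 4 KiB write on a single-disk brick costs 1 I/O; the same write
# on an 8-disk RAID5 with 64 KiB chunks costs 4 I/Os.
small_raid5 = ios_per_write(4096, 64 * 1024, 8, 1)   # 4
small_raid6 = ios_per_write(4096, 64 * 1024, 8, 2)   # 6
```

For large sequential writes the comparison flips, since a full-stripe write spreads the work across all spindles at once.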
> Makes better use of the raw space as there is no disk for parity as
> on a RAID 5/6.
> If a failed disk gets replaced, the data needs to be replicated over
> the network (not a big deal if using InfiniBand or a 1 Gbps+ network).
For a 100 MB/s pipe (a streaming disk read, which you don't normally
get when copying random files to/from disk), 1 GB = 10 seconds and
1 TB = 10,000 seconds. That is the best-case scenario; in reality you
will get some fraction of that streaming read/write speed. So treat
10,000 seconds per TB as the most optimistic (and unrealistic)
estimate ... a lower bound on re-replication time.
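The arithmetic above can be packaged as a small sketch. The function name and the 40% efficiency figure in the example are my own illustrative assumptions, not numbers from the thread:

```python
def transfer_time_seconds(bytes_to_copy, throughput_bytes_per_s, efficiency=1.0):
    """Lower-bound time to re-replicate data over a pipe.

    efficiency < 1.0 models the fact that real (random-file) workloads
    rarely sustain the streaming rate.
    """
    return bytes_to_copy / (throughput_bytes_per_s * efficiency)

# 1 TB over a 100 MB/s pipe, best case: 10,000 s (~2.8 hours).
best_case = transfer_time_seconds(1e12, 100e6)
# At an assumed 40% of streaming speed: 25,000 s (~7 hours).
realistic = transfer_time_seconds(1e12, 100e6, efficiency=0.4)
```

Scaling the disk size or dropping the efficiency shows quickly why over-the-network re-replication of whole-disk bricks is painful on slower links.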
> The biggest file size is the size of one disk if using a volume type
For some users this is not a problem, though several years ago we had
users wanting to read and write *single* TB-sized files.
> In this case, does anyone know whether a replacement disk needs to be
> manually formatted and mounted?
In this model, yes. This is why the RAID adapter saves time unless you
have written/purchased "expensive" tools to do similar things.
> *RAID Controller:*
> Using a RAID controller with battery backup can improve performance,
> especially by caching the writes in the controller's memory, but in
> the end one single array means the equivalent performance of one disk
> for each brick. Also, RAID requires either 1 or 2 disks for parity. If using
For large reads/writes, you typically get N* (N disks reduced by number
of parity disks and hot spares) disk performance. For small
reads/writes you get 1 disk (or less) performance. Basically optimal
read/write will be in multiples of the stripe width. Optimizing stripe
width and chunk sizes for various applications is something of a black
art, in that overoptimization for one size/app will negatively impact
others.
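The stripe-alignment rule of thumb above can be made concrete. This is a minimal sketch using my own function names; the 12-disk RAID6 with 64 KiB chunks is just an example configuration:

```python
def stripe_width_bytes(n_disks, n_parity, chunk_bytes):
    """Full-stripe size of a parity RAID: data disks x chunk size."""
    return (n_disks - n_parity) * chunk_bytes

def is_stripe_aligned(io_bytes, n_disks, n_parity, chunk_bytes):
    """I/O in whole multiples of the full stripe avoids the
    read-modify-write penalty, so it approaches N-spindle speed."""
    width = stripe_width_bytes(n_disks, n_parity, chunk_bytes)
    return io_bytes % width == 0

# 12-disk RAID6 (10 data disks) with 64 KiB chunks:
# full stripe = 10 * 64 KiB = 640 KiB.
width = stripe_width_bytes(12, 2, 64 * 1024)
```

This is also why tuning chunk size for one application's I/O size can pessimize another: a write that is stripe-aligned under one geometry is sub-stripe under another.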
> very cheap disks, it is probably better to use RAID 6; if using better
> quality ones, RAID 5 should be fine since, again, the data is
> replicated to another RAID 5 on another node.
If you have more than 6TB of data, use RAID6 or RAID10. RAID5 shouldn't
be used for TB class storage for units with UCE rates more than 10^-17
(you would hit a UCE on rebuild for a failed drive, which would take out
all your data ... not nice).
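The UCE-on-rebuild risk can be estimated with a standard independent-errors model. This is a sketch of that textbook calculation, not a vendor formula; the 12 TB array size and 1e-15 error rate in the example are assumptions I chose for illustration:

```python
import math

def p_uce_during_rebuild(bits_read, bit_error_rate):
    """Probability of hitting at least one unrecoverable read error
    while reading `bits_read` bits during a rebuild, assuming
    independent errors at the given rate (errors per bit read).

    Uses 1 - (1 - ber)^n = -expm1(n * log1p(-ber)) ~= 1 - exp(-ber * n)
    in a numerically stable form.
    """
    return -math.expm1(-bit_error_rate * bits_read)

# Rebuilding a RAID5 set means reading all ~12 TB of surviving data.
# At an assumed UCE rate of 1e-15 per bit, the failure chance is ~9%.
p = p_uce_during_rebuild(12e12 * 8, 1e-15)
```

Push the error rate toward consumer-class drives or the capacity higher and the probability climbs quickly, which is the argument for RAID6 (a second parity survives a UCE during rebuild) on TB-class arrays.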
> Can create larger array as a single brick in order to fit bigger files
> for when using Distributed volume type.
> Disk rebuild should be quicker (and more automated?)
More generally, management is nearly automatic, modulo physically
replacing a drive.
> Extra cost of the RAID controller.
It's a cost-benefit analysis, and for lower-end storage units, the
analysis almost always comes out in favor of a reasonable RAID design.
> Performance of the array is equivalent a single disk + RAID controller
> caching features.
No ... see above.
> RAID doesn’t scale well beyond ~16 disks
16 disks is the absolute maximum we would ever tie to a single RAID (or
HBA). Most RAID processor chips can't handle the calculations for 16
disks: compare the performance of RAID6 at 16 drives to that at 12
drives for similar-sized chunks and "optimal" IO ... in most cases, the
performance delta isn't 16/12, 14/10, 13/9 or similar; it's typically
far smaller.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615