[Gluster-users] RAID options for Gluster

Fernando Frediani (Qube) fernando.frediani at qubenet.net
Thu Jun 14 11:06:32 UTC 2012


I think this discussion has probably come up here before, but I couldn't find much in the archives. Would you be able to comment on, or correct, anything that looks wrong?

Which option do people think is most appropriate for the storage underneath Gluster (RAID or not), in terms of a good balance between cost, usable space and performance? I have thought about two main options, with their pros and cons:

No RAID (individual hot-swappable disks):
Each disk is an individual brick (server:/disk1, server:/disk2, etc.), so no RAID controller is required. As the data is replicated, if one disk fails the data still exists on another disk on another node.
Pros:
Cheaper to build, as there is no cost for an expensive RAID controller.
Improved performance, as each write goes to a single disk rather than to an entire RAID 5/6 array.
Makes better use of the raw space, as no disks are lost to RAID 5/6 parity.

Cons:
When a failed disk is replaced, its data needs to be replicated over the network (not a big deal if using InfiniBand or a 1 Gbps+ network).
The largest possible file is the size of one disk when using the Distributed volume type.
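To put "not a big deal" into numbers, here is a rough back-of-the-envelope sketch of how long re-replicating one replaced disk takes at a given link speed. The 2 TB disk size and the link speeds are hypothetical examples, and the efficiency factor is an assumption: real self-heal throughput will be below line rate because of disk speed and protocol overhead.

```python
def resync_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to copy data_tb terabytes (10**12 bytes) over a link of link_gbps
    gigabits/s. efficiency (0 < efficiency <= 1) is an assumed fraction of line
    rate actually achieved; 1.0 means the theoretical best case."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Hypothetical example: a full 2 TB disk over 1 Gbps and 10 Gbps links.
for link in (1, 10):
    print(f"2 TB over {link} Gbps: {resync_hours(2, link):.1f} h at line rate")
```

At line rate a full 2 TB disk needs roughly 4.4 hours over 1 Gbps, so a 10 Gbps or InfiniBand link makes a noticeable difference for large disks.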

In this case, does anyone know whether a replacement disk needs to be formatted and mounted manually?

RAID Controller:
Using a RAID controller with a battery backup can improve performance, especially by caching writes in the controller's memory, but in the end one single array gives the equivalent performance of one disk for each brick. RAID also requires either 1 or 2 disks for parity. With very cheap disks it is probably better to use RAID 6; with better-quality disks RAID 5 should be fine as, again, the data is replicated to another RAID 5 array on another node.
Pros:
Can create a larger array as a single brick, to fit bigger files when using the Distributed volume type.
Disk rebuild should be quicker (and more automated?).
Cons:
Extra cost of the RAID controller.
Performance of the array is equivalent to a single disk, plus the RAID controller's caching features.
RAID doesn't scale well beyond ~16 disks.
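The usable-space trade-off between the two options can be sketched as simple arithmetic. This assumes every node has the same number of identical disks, one RAID array per node used as a single brick in the RAID case, and Gluster replica 2 in both cases; the 4 nodes, 12 disks per node and 2 TB disks are hypothetical figures.

```python
def usable_tb_no_raid(nodes: int, disks_per_node: int, disk_tb: float,
                      replica: int = 2) -> float:
    # Every disk is its own brick; Gluster replication is the only redundancy.
    return nodes * disks_per_node * disk_tb / replica


def usable_tb_raid(nodes: int, disks_per_node: int, disk_tb: float,
                   parity_disks: int, replica: int = 2) -> float:
    # One RAID array per node: parity_disks=1 for RAID 5, 2 for RAID 6.
    # Gluster then replicates on top of the arrays.
    return nodes * (disks_per_node - parity_disks) * disk_tb / replica


# Hypothetical cluster: 4 nodes x 12 disks x 2 TB, replica 2.
print("No RAID:", usable_tb_no_raid(4, 12, 2), "TB")
print("RAID 5: ", usable_tb_raid(4, 12, 2, parity_disks=1), "TB")
print("RAID 6: ", usable_tb_raid(4, 12, 2, parity_disks=2), "TB")
```

With these figures the result is 48 TB without RAID, 44 TB with RAID 5 and 40 TB with RAID 6, so the parity overhead is one or two disks per node on top of the 50% already spent on replication.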

Attaching a JBOD to a node and creating multiple RAID arrays (or using a single server with more disk slots) instead of adding a new node can save power (no need for an extra CPU, memory or motherboard). However, with multiple bricks on the same node, the data might end up replicated within that node, making the downtime of a node critical. Or is Gluster smart enough to replicate data to a brick on a different node?
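One way to reason about the placement question: the Gluster administration documentation describes replica sets as being formed from consecutive bricks, in the order they are listed to `gluster volume create ... replica N`. Assuming that behaviour, the sketch below groups a brick list the same way and flags any replica set whose copies share a server. The host and path names are hypothetical, and the check only models brick ordering, not anything Gluster itself reports.

```python
def replica_sets(bricks: list[str], replica: int) -> list[list[str]]:
    """Group bricks into replica sets, assuming each run of `replica`
    consecutive bricks (in the order listed) forms one set."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_with_shared_host(bricks: list[str], replica: int) -> list[list[str]]:
    """Return replica sets in which two or more copies live on the same server."""
    flagged = []
    for group in replica_sets(bricks, replica):
        hosts = {brick.split(":", 1)[0] for brick in group}
        if len(hosts) < len(group):
            flagged.append(group)
    return flagged


# Hypothetical brick lists for two servers with two disks each.
bad_order = ["server1:/disk1", "server1:/disk2", "server2:/disk1", "server2:/disk2"]
good_order = ["server1:/disk1", "server2:/disk1", "server1:/disk2", "server2:/disk2"]

print(sets_with_shared_host(bad_order, 2))   # both sets sit on a single server
print(sets_with_shared_host(good_order, 2))  # → []
```

Under this assumption, interleaving the servers when listing bricks keeps each pair of copies on different nodes, even when a node carries many bricks from a JBOD.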

Regards,

Fernando

