[Gluster-users] GlusterFS 3.3 not yet quite ready for Virtual Machines storage

Brian Candler B.Candler at pobox.com
Tue Jun 5 20:52:47 UTC 2012


On Tue, Jun 05, 2012 at 04:41:20PM +0000, Fernando Frediani (Qube) wrote:
> Well Brian, first of all I think it's a bit of a waste to make a RAID10 on
> a Gluster environment given that the data is already replicated across
> other nodes.  It would limit the usable space from the beginning to 1/4 of
> the raw capacity, which is quite a lot.  Consider not only the price of the
> disks, but also the cost of maintaining them and the extra power each
> consumes, besides extra CPU and memory.
> I would much rather have multiple nodes made of either RAID 5 or 6 and
> spread the IOPS across them as if each brick was a disk in a large RAID10
> environment (that was my point about spreading IOPS across the whole
> cluster).  Yes, some VMs could mount things directly from the Gluster
> volume (using the Gluster client if it is a Linux machine, which is even
> better), but that is not always an option, especially if you are a Service
> Provider and cannot give that access to customers, so they must store
> their data on the local vdisks of their VMs.

Well, of course you should build what suits your needs best, and you should
measure it to ensure you get the performance you need for your particular
application and usage profile.

The starting point of this thread was your claim that a glusterfs
striped+distributed+replicated configuration was "essential" for a
successful VM platform.
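
For concreteness, that sort of volume would be created roughly like this
(a sketch against the 3.3 CLI; the server names and brick paths here are
placeholders):

    # Hypothetical 8-brick distributed striped replicated volume,
    # using the volume-create syntax documented for GlusterFS 3.3.
    gluster volume create vmvol stripe 2 replica 2 transport tcp \
        server1:/export/brick server2:/export/brick \
        server3:/export/brick server4:/export/brick \
        server5:/export/brick server6:/export/brick \
        server7:/export/brick server8:/export/brick
    gluster volume start vmvol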

However, if you are talking about RAID5 or RAID6, then clearly performance
is not as important to you as you make out.

A single block write to a RAID6 array requires six disk transactions: three
reads (the old data block plus the old P and Q parity blocks) followed by
three writes (the new data plus both recomputed parity blocks).  A single
block write to a RAID1 or RAID10 requires only two transactions (two writes,
which can be done concurrently).  Furthermore, if the RAID6 array is operating in degraded
mode - as it certainly will from time to time - then you will get
substantially worse performance than this, as even reading a single block
may require reads to all the other disks in the array.  Performance is also
hammered by RAID rebuild activity.
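
To put rough numbers on that, here is a back-of-the-envelope calculation
(the 12 disks and ~150 IOPS per disk are made-up figures; substitute your
own):

    # Effective random-write IOPS per node for different RAID levels.
    # DISKS and IOPS_PER_DISK are illustrative assumptions, not measurements.
    DISKS = 12
    IOPS_PER_DISK = 150

    # Disk transactions needed per logical block write.
    WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

    for level, penalty in sorted(WRITE_PENALTY.items()):
        print("%s: ~%d random-write IOPS" % (level, DISKS * IOPS_PER_DISK // penalty))

With those figures RAID10 comes out around 900 random-write IOPS per node
and RAID6 around 300, before you even consider degraded mode or rebuilds.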

This is all fundamental, and sticking glusterfs striping on top of it will
not improve it.  The performance you get with RAID5/RAID6 may be good enough
for your purposes, but I don't think you can argue successfully that
glusterfs striping on top is necessary.

If you need better performance than RAID5/6, but cannot take the data
duplication hit of RAID10 on every storage brick, then there are some other
non-gluster solutions you can look at:

* drbd is very solid, and you could consider it as a replacement for RAID1
  (a minimal resource sketch follows this list).  Ganeti will automate the
  hard work of building a VM cluster with separate drbd instances per VM.
* sheepdog will distribute your data across any of N nodes transparently,
  although it is specific to KVM and, I would argue, is less battle-tested
  than drbd.
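
For what it's worth, a per-VM drbd resource is only a few lines of
configuration.  A minimal sketch (the hostnames, devices and addresses
here are invented):

    # /etc/drbd.d/vm01.res -- one resource per VM disk (names are illustrative)
    resource vm01 {
        protocol C;               # synchronous replication: RAID1 over the network
        device    /dev/drbd0;
        disk      /dev/vg0/vm01;  # backing LV, present on each node
        meta-disk internal;
        on nodeA {
            address 10.0.0.1:7789;
        }
        on nodeB {
            address 10.0.0.2:7789;
        }
    }

This per-VM replication is exactly the hard work Ganeti automates for you.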

Finally, what some people have tried is to make each drive a separate brick,
with no RAID, and rely entirely on glusterfs replication.  While this seems
attractive, I think care is required; standard RAID arrays have
battle-tested procedures for handling disk failures, and you will need to be
100% confident that you can both monitor your array and implement the disk
swap/rebuild procedures entirely using glusterfs and its tools.  Also, it
can be cumbersome if you have a glusterfs volume made out of tens or
hundreds of bricks.
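
For reference, the failed-disk procedure in that model would look roughly
like this under 3.3 (the volume and server names are placeholders; check
the documentation for your release before relying on the exact syntax):

    # Identify the dead brick, swap in a new one, then force a full self-heal.
    gluster volume status vmvol
    gluster volume replace-brick vmvol \
        server3:/export/brick server9:/export/brick commit force
    gluster volume heal vmvol full
    gluster volume heal vmvol info    # monitor self-heal progress

Multiply that by tens or hundreds of bricks and the housekeeping adds up.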

Regards,

Brian.


