[Gluster-users] Shared VM disk/image on gluster for redundancy?

Tue Jun 29 15:29:54 UTC 2010

On 06/29/2010 10:04 AM, Emmanuel Noobadmin wrote:
> So most likely I would run two or more physical machines with VM to
> failover to each other to catch situations of a single machine
> failure. Along with that a pair of storage server. In the case of a
> total failure where both the primary & secondary VM dies physically,
> roll in a new machine to load up the VM images still safe on the
> gluster data servers.
> 
> So in this case would I be correct that my configuration, assuming a
> basic 2 physical VM host server and 2 storage server would probably
> look something like
> 
> volume rep0
> 	type cluster/replicate
> 	option read-subvolume vmsrv0vol0
> 	subvolumes vmsrv0vol0 datasrv0vol0 datasrv1vol0
> end-volume
> 
> 
> volume rep1
> 	type cluster/replicate
> 	option read-subvolume vmsrv1vol0
> 	subvolumes vmsrv1vol0 datasrv0vol0 datasrv1vol0
> end-volume
> 
> volume my_nufa
> 	type cluster/nufa
> 	option local-volume-name rep0
> 	subvolumes rep0 rep1
> end-volume
> 
> Or did I lose my way somewhere? :)

That looks reasonable to me, except that the last stanza would only
apply on vmsrv0.  For vmsrv1, you'd want this instead:

	volume my_nufa
		type cluster/nufa
		option local-volume-name rep1	# this is the only difference
		subvolumes rep0 rep1
	end-volume

It's a little unfortunate that you can't do this with a single volfile,
perhaps with $-variable substitutions or some such, but that's the way
it is AFAIK.

> Does it make any sense to replicate across all 3 or should I simply
> spec the VM servers with tiny drives and put everything on the gluster
> storage which I suppose would impact performance severely?

That's a pretty murky area.  With a fast interconnect it's tempting to
say the "storage of record" should be only on the data nodes and the app
nodes should only do caching.  It would certainly be simpler, though
with more than two data nodes you'd have to do essentially the same
layering of distribute on top of replicate (nufa wouldn't be
particularly useful in that configuration).  If you wanted to stick with
something more like the above, you'd just need to pair each app node
with a data node, so e.g. rep0=vmsrv0vol0+datasrv0vol0 and
rep1=vmsrv1vol0+datasrv1vol0.  You would probably also want to "cross"
the read-subvolume assignments, so for example vmsrv0 would go first to
datasrv1vol0 instead of vmsrv1vol0 for rep1.  This avoids having the app
nodes talk to each other when they could be talking to data nodes instead.
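
As a rough sketch of that paired layout, the client volfile on vmsrv0
might look something like the following.  This is untested and rests on
a few assumptions: the four brick volumes (vmsrv0vol0, vmsrv1vol0,
datasrv0vol0, datasrv1vol0) are defined earlier in the volfile as in the
original config, the data-node brick in rep1 keeps the datasrv1vol0
name used there, and "crossing" is read as pointing read-subvolume at
the data node for whichever replica pair is not local.

	# vmsrv0 only; brick volumes are assumed to be defined above
	volume rep0
		type cluster/replicate
		option read-subvolume vmsrv0vol0	# local brick, as in the original
		subvolumes vmsrv0vol0 datasrv0vol0
	end-volume

	volume rep1
		type cluster/replicate
		option read-subvolume datasrv1vol0	# "crossed": data node, not vmsrv1
		subvolumes vmsrv1vol0 datasrv1vol0
	end-volume

	volume my_nufa
		type cluster/nufa
		option local-volume-name rep0
		subvolumes rep0 rep1
	end-volume

On vmsrv1 the mirror image would presumably apply: rep0 would read from
datasrv0vol0, rep1 from vmsrv1vol0, and local-volume-name would be rep1.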