[Gluster-users] GlusterFS as virtual machine storage
WK
wkmail at bneit.com
Wed Aug 23 18:50:40 UTC 2017
On 8/21/2017 1:09 PM, Gionatan Danti wrote:
>
> Hi all,
> I would like to ask if, and with how much success, you are using
> GlusterFS for virtual machine storage.
>
> My plan: I want to setup a 2-node cluster, where VM runs on the nodes
> themselves and can be live-migrated on demand.
>
> I have some questions:
> - do you use GlusterFS for similar setup?
2 node plus Arbiter. You NEED the arbiter or a third node. Do NOT try 2
node with a VM.
We also use sharding to speed up heals.
3 node would be even better but 2 node + Arbiter is faster.
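To make the "you NEED the arbiter" point concrete, here is a minimal sketch of a plain majority-vote rule. It is a simplified model, not Gluster's actual quorum code, and the function names are made up for illustration. It shows what happens when the link between the data nodes fails.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Simplified majority rule: a side may write only if it can
    reach strictly more than half of all voters (itself included)."""
    return 2 * reachable > total


def writable_sides(sides, total):
    """For each side of a network partition, report whether it keeps quorum.

    `sides` lists how many voters each side can see, e.g. [2, 1].
    """
    return [has_quorum(n, total) for n in sides]


# Two data nodes, link between them fails: each node sees only itself.
print(writable_sides([1, 1], total=2))

# Two data nodes plus an arbiter: one node can still reach the arbiter.
print(writable_sides([2, 1], total=3))
```

With only two voters, neither side of a partition holds a majority, so you either block writes on both nodes or drop the rule and risk both sides writing independently (split-brain). A third vote lets exactly one side carry on.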
Note the arbiter doesn't have to be great kit. It's mostly memory and a
small amount of hard disk space, and even on older systems you can throw
in a cheap low-capacity SSD. You probably have a bunch of those
lying around.
You could probably get away with using containers for the arbiter and
'share' an arbiter "host" among clusters. We haven't had a chance to try
that yet, but with net=host and a unique IP per container, I don't see
why it would be an issue.
> - if so, how do you feel about it?
Very happy. Reasonable and reliable performance (compared to other
distributed storage). Gluster does not have the performance of a
direct-attached SSD, but none of the distributed storage options can do
that unless they cheat with heavy buffering and async writes, which is
problematic for VM files if something bad happens.
> - if a node crashes/reboots, how the system re-syncs? Will the VM
> files be fully resynchronized, or the live node keeps some sort of
> write bitmap to resynchronize changed/written chunks only? (note: I
> know about sharding, but I would like to avoid it);
Without sharding, any reheal after an outage (planned or otherwise) will
take a LOT longer, because you have to sync the entire VM file, which in
our case is 20GB to 150GB per affected VM. That can take quite a while
even with a fast network.
With sharding, in many cases the reheal after maintenance amounts to a
'pause' and is almost a non-event, because it only has to heal the few
shards that are out of sync.
The cool thing about gluster in old-school replication mode is that the
VM files are all there on each node. There is no master index that can
get corrupted with your bits spread out among the various nodes.
Of course with sharding, you would have to re-assemble the file, but
that has been discussed on this list and we have tested that several
times, even on large VMs, by removing a brick and having a tech
re-assemble and check the md5sum to make sure we have a working VM file.
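The re-assembly step can be sketched roughly as follows. This assumes the usual on-brick layout as I understand it: the first block lives at the file's normal path on the brick and the remaining blocks sit in the brick's .shard directory named <GFID>.<N>. The function names and arguments are illustrative, not a Gluster tool. It also assumes all shards of the file are on the one brick you pulled, and it does not adjust the result to the file's recorded size, so treat it as a starting point only.

```python
import hashlib
import os
import re


def reassemble(base_path, shard_dir, gfid, shard_size, out_path):
    """Copy the base file to offset 0 and each <gfid>.<N> shard to
    offset N * shard_size. Missing shards are left as holes (zeros)."""
    pattern = re.compile(re.escape(gfid) + r"\.(\d+)$")
    pieces = [(0, base_path)]
    for name in os.listdir(shard_dir):
        m = pattern.match(name)
        if m:
            pieces.append((int(m.group(1)), os.path.join(shard_dir, name)))
    with open(out_path, "wb") as out:
        for index, path in sorted(pieces):
            out.seek(index * shard_size)
            with open(path, "rb") as src:
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)


def md5sum(path):
    """MD5 of a file, read in 1 MiB chunks so large images fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the md5sum of the rebuilt image against a known-good copy of the same VM file before trusting it.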
>
> - finally, how much stable is the system?
We were on 3.4 for years on some old clusters and never had a serious
problem, but we had to be really careful during upgrades/reboots because
they were 2 node systems and if you didn't do things precisely you ended
up with a split-brain. On the rare crash event, we would often pick a
good image from one of the nodes and designate it as the source.
We are on 3.10 now and the arbiter+sharding go a long way in solving
that issue.
-wk