[Gluster-users] GlusterFS as virtual machine storage

Fri Aug 25 19:48:31 UTC 2017

On 8/25/2017 12:56 AM, Gionatan Danti wrote:
>
>
>> WK wrote:
>> 2 node plus Arbiter. You NEED the arbiter or a third node. Do NOT try 2
>> node with a VM
>
> This is true even if I manage locking at application level (via 
> virlock or sanlock)?

We ran Rep2 for years on 3.4. It does work if you are really, really 
careful. But in a crash on one side, you might lose some bits that 
were in flight. The VM image would then try to heal.
Without sharding, big VMs take a while because the WHOLE VM file has to 
be copied over. Then you might get split-brain and have to stop the VM, 
pick the good copy, make sure that it is healed on both sides and then 
restart the VM.

Arbiter/Replica 3 prevents that. Sharding helps a lot as well by making 
the heals really quick, though in a Replica 2 with sharding you no 
longer have a nice big .img file sitting on each brick in plain view, 
and picking a split-brain winner is now WAY more complicated. You would 
have to re-assemble things.
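
To put rough numbers on why sharding makes heals quick, here is a minimal 
sketch. It takes the description above at face value (an unsharded heal 
moves the whole file, a sharded heal moves only the shards that were 
written to) and is not a model of Gluster's actual heal code. The 
function names, the 64 MiB shard size and the dirty ranges are 
assumptions made up for illustration.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def shards_touched(dirty_ranges, shard_size):
    """Return the set of shard indexes overlapped by (offset, length) ranges."""
    touched = set()
    for offset, length in dirty_ranges:
        if length <= 0:
            continue
        first = offset // shard_size
        last = (offset + length - 1) // shard_size
        touched.update(range(first, last + 1))
    return touched


def heal_bytes(file_size, dirty_ranges, shard_size=None):
    """Bytes to copy during a heal under the simplified model above.

    shard_size=None means no sharding: the whole file is copied.
    Ranges are assumed to lie inside the file.
    """
    if shard_size is None:
        return file_size
    total = 0
    for idx in shards_touched(dirty_ranges, shard_size):
        start = idx * shard_size
        # The last shard of a file may be shorter than shard_size.
        total += max(0, min(shard_size, file_size - start))
    return total


if __name__ == "__main__":
    image_size = 100 * GiB
    # Hypothetical writes that were missed by the crashed side.
    dirty = [(0, 4096), (10 * GiB, 1 * MiB)]
    print("unsharded heal:", heal_bytes(image_size, dirty) // MiB, "MiB")
    print("sharded heal:  ", heal_bytes(image_size, dirty, 64 * MiB) // MiB, "MiB")
```

The trade-off mentioned above still applies: the same shards that make 
the heal small are what you would have to re-assemble by hand in a 
Replica 2 split-brain.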

We were quite good at fixing broken Gluster 3.4 nodes, but we are 
*much* happier with the Arbiter node and sharding. It is a huge difference.
We could go to Rep3 but we like the extra speed and we are comfortable 
with the Arb limitations (we also have excellent off cluster backups 
<grin>).

> Also, on a two-node setup, is it *guaranteed* that updating one node 
> takes the whole volume offline?

If you still have quorum turned on, then yes. One side goes and you are 
down.

> On the other hand, a 3-way setup (or 2+arbiter) is free from all these 
> problems?
>

Yes, you can lose one of the three nodes and after the pause, everything 
just continues. If you have a second failure before you can recover, 
then you have lost quorum.
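
The quorum arithmetic behind those answers can be sketched as a strict 
majority vote. This is a simplified model, not Gluster's implementation: 
the real quorum options have more nuance than "more than half". The node 
names and layout labels below are hypothetical.

```python
def has_quorum(nodes_up, voters):
    """Strict majority: more than half of the voting nodes must be up."""
    up = len(set(nodes_up) & set(voters))
    return 2 * up > len(voters)


# Hypothetical node names. "arb" votes but holds no VM data.
LAYOUTS = {
    "replica 2": {"voters": ["a", "b"], "data": ["a", "b"]},
    "replica 2 + arbiter": {"voters": ["a", "b", "arb"], "data": ["a", "b"]},
    "replica 3": {"voters": ["a", "b", "c"], "data": ["a", "b", "c"]},
}


def status(layout, nodes_up):
    """Report quorum and how many full data copies are still online."""
    cfg = LAYOUTS[layout]
    return {
        "quorum": has_quorum(nodes_up, cfg["voters"]),
        "data_copies_up": len(set(nodes_up) & set(cfg["data"])),
    }


if __name__ == "__main__":
    cases = [
        ("replica 2", ["a"]),                   # one side gone
        ("replica 2 + arbiter", ["a", "arb"]),  # one data node gone
        ("replica 2 + arbiter", ["a"]),         # second failure
        ("replica 3", ["a", "b"]),              # one node gone
    ]
    for layout, up in cases:
        print(layout, "up:", up, status(layout, up))
```

With two voters, one survivor is exactly half and never a majority, 
which is why one side going down stops a 2-node volume while quorum is 
on. With three voters, any single loss still leaves two out of three.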

If that second failure is the other actual replica, then you could get 
into a situation where the arbiter isn't happy with either copy when you 
come back up and of course the arbiter doesn't have a good copy itself. 
Pavel alluded to something like that when describing his problem.

That is where replica 3 helps. In theory, with replica 3, you could lose 
2 nodes and still have a reasonable copy of your VM, though you've lost 
quorum and are still down. At that point, *I* would kill the two bad 
nodes (STOMITH) to prevent them from coming back AND turn off quorum. 
You could then run on the single node until you can save/copy those VM 
images, preferably by migrating off that volume completely. Create a 
remote pool using SSHFS if you have nothing else available. THEN I would 
go back and fix the gluster cluster and migrate back into it.
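
For the "turn off quorum" step, a minimal sketch of what that might look 
like follows. It only builds and prints the commands; it does not run 
them. It assumes that "quorum" here means both the client-side 
`cluster.quorum-type` and the server-side `cluster.server-quorum-type` 
volume options, and the volume name `vmstore` is hypothetical. I have 
not tested this against a degraded cluster, and whether the CLI accepts 
the change with peers down may depend on your version.

```python
import shlex


def quorum_off_commands(volume):
    """Build (but do not run) gluster commands that disable quorum.

    Assumption: both client-side and server-side quorum are in use.
    """
    return [
        ["gluster", "volume", "set", volume, "cluster.quorum-type", "none"],
        ["gluster", "volume", "set", volume, "cluster.server-quorum-type", "none"],
    ]


if __name__ == "__main__":
    # "vmstore" is a made-up volume name; substitute your own.
    for argv in quorum_off_commands("vmstore"):
        print(shlex.join(argv))
```

Printing rather than executing is deliberate: this is a last-resort 
step, and you want to read each command before running it on the one 
node that still has your data.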

Replica2/Replica3 does not matter if you lose your Gluster network 
switch, but again the Arb or Rep3 setup makes it easier to recover. I 
suppose the only advantage of Replica2 is that you can use a crossover 
cable and not worry about losing the switch, but bonding/teaming works 
well and there are bonding modes that don't require the same switch for 
the bond slaves. So you can build in some redundancy there as well.