[Gluster-users] GlusterFS as virtual machine storage
Gionatan Danti
g.danti at assyoma.it
Fri Aug 25 21:08:14 UTC 2017
On 25-08-2017 21:48, WK wrote:
> On 8/25/2017 12:56 AM, Gionatan Danti wrote:
>
>
> We ran Rep2 for years on 3.4. It does work if you are really, really
> careful, but in a crash on one side you might lose some bits that
> were in flight. The VM image would then have to heal.
> Without sharding, big VMs take a while to heal because the WHOLE VM
> file has to be copied over. Then you might get split-brain and have
> to stop the VM, pick the good copy, make sure it is healed on both
> sides and then restart the VM.
OK, so sharding needs to be enabled for VM disk storage; otherwise,
heal time skyrockets.
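
For the record, turning it on looks like a one-liner (the volume name
below is a placeholder, and the option only affects files created
after it is set):

  # enable sharding on the volume (existing files stay unsharded)
  gluster volume set myvol features.shard on
  # optional: shard size; 64MB is the default, oVirt guides often use 512MB
  gluster volume set myvol features.shard-block-size 64MB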
> Arbiter/Replica 3 prevents that. Sharding helps a lot as well by
> making the heals really quick, though in a Replica 2 with sharding
> you no longer have a nice big .img file sitting on each brick in
> plain view, and picking a split-brain winner is now WAY more
> complicated. You would have to re-assemble things.
This concerns me, and it is the reason I would like to avoid sharding.
How can I recover from such a situation? How can I decide which
(reconstructed) file to keep and which to delete?
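
From what I understand of the shard layout, re-assembly would go
something like this (an untested sketch; volume, brick path and image
name are made up):

  # find the GFID of the base file via the virtual xattr on a FUSE mount
  getfattr -n glusterfs.gfid.string /mnt/myvol/vm1.img
  # on a brick, the base file holds the first shard-block-size bytes;
  # the rest live under .shard/ as <gfid>.1, <gfid>.2, ...
  ls /bricks/brick1/.shard | grep <gfid>
  # concatenate the base file and the shards, in order, into one image
  cat /bricks/brick1/images/vm1.img \
      /bricks/brick1/.shard/<gfid>.1 \
      /bricks/brick1/.shard/<gfid>.2 > /tmp/vm1-reassembled.img

And sparse regions may have no shard file at all, which I suppose is
part of why this is painful.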
>
> We were quite good at fixing broken Gluster 3.4 nodes, but we are
> *much* happier with the Arbiter node and sharding. It is a huge
> difference.
> We could go to Rep3, but we like the extra speed and we are
> comfortable with the Arb limitations (we also have excellent
> off-cluster backups <grin>).
>
>
>> Also, on a two-node setup, is it *guaranteed* that updating one
>> node takes the whole volume offline?
>
> If you still have quorum turned on, then yes. One side goes and you are
> down.
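
For reference, I guess the relevant knobs can be checked with
something like this (volume name is a placeholder):

  gluster volume get myvol cluster.quorum-type
  gluster volume get myvol cluster.server-quorum-type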
>
>> On the other hand, a 3-way setup (or 2+arbiter) is free from all
>> these problems?
>>
>
> Yes, you can lose one of the three nodes and after the pause,
> everything just continues. If you have a second failure before you can
> recover, then you have lost quorum.
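
Good to know. I assume that during the pause and the following heal
one can watch progress with the usual commands (volume name is an
example):

  gluster volume heal myvol info
  gluster volume heal myvol info split-brain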
>
> If that second failure is the other actual replica, then you could get
> into a situation where the arbiter isn't happy with either copy when
> you come back up and of course the arbiter doesn't have a good copy
> itself. Pavel alluded to something like that when describing his
> problem.
>
> That is where replica 3 helps. In theory, with replica 3, you could
> lose 2 nodes and still have a reasonable copy of your VM, though
> you've lost quorum and are still down. At that point, *I* would kill
> the two bad nodes (STONITH) to prevent them from coming back AND turn
> off quorum. You could then run on the single node until you can
> save/copy those VM images, preferably by migrating off that volume
> completely. Create a remote pool using SSHFS if you have nothing else
> available. THEN I would go back and fix the Gluster cluster and
> migrate back into it.
>
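If I read that right, the last-resort procedure would translate to
something like this (volume, host and paths are placeholders, and
dropping quorum is obviously a deliberate risk):

  # on the surviving node: drop quorum so the lone brick keeps serving
  gluster volume set myvol cluster.quorum-type none
  gluster volume set myvol cluster.server-quorum-type none
  # mount a remote rescue area over SSH and copy the images off
  sshfs backup@rescuehost:/srv/rescue /mnt/rescue
  cp --sparse=always /mnt/myvol/vm1.img /mnt/rescue/
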
> Replica2/Replica3 does not matter if you lose your Gluster network
> switch, but again the Arb or Rep3 setup makes it easier to recover. I
> suppose the only advantage of Replica2 is that you can use a
> crossover cable and not worry about losing the switch, but
> bonding/teaming works well and there are bonding modes that don't
> require the same switch for the bond slaves. So you can build in some
> redundancy there as well.
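Right, an active-backup bond does not need both legs on the same
switch. With NetworkManager it would be something like this (interface
names are examples):

  nmcli con add type bond ifname bond0 con-name bond0 \
        bond.options "mode=active-backup,miimon=100"
  nmcli con add type bond-slave ifname eth1 master bond0
  nmcli con add type bond-slave ifname eth2 master bond0
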
Thank you for the very valuable information.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8