[Gluster-users] Quorum and reboots

Thu Mar 10 22:52:16 UTC 2016

On 11/03/2016 2:24 AM, David Gossage wrote:
> It is file-based, not block-based, healing, so it saw multi-GB files
> that it had to recopy. It had to halt all writes to those files
> while that occurred, or it would be a never-ending cycle of re-copying
> the large images. So the fact that most VMs went haywire isn't that odd.
> Based on the timing of the alerts, it does look like the 2 bricks that
> were up kept serving images until the 3rd brick came back. It did heal
> all images just fine.
>

What version are you running? 3.7.x has sharding (it breaks large files
into chunks), which allows much finer-grained healing and speeds up heals
a *lot*. However, it can't be applied retroactively: you have to enable
sharding and then copy the VM images onto the volume :(

http://blog.gluster.org/2015/12/introducing-shard-translator/
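To put rough numbers on why sharding helps, here is a deliberately simplified model of the behaviour described in this thread, not of Gluster's actual self-heal code. It assumes that without sharding any write during the outage means the whole image is recopied, and that with sharding only the shards touched by writes are recopied. The shard size, image size and write offsets below are hypothetical values chosen for illustration.

```python
"""Simplified model: bytes recopied to a returning brick, with and without sharding."""

MIB = 1024 * 1024
GIB = 1024 * MIB


def dirty_shards(writes, shard_size):
    """Return the set of shard indices touched by (offset, length) writes."""
    touched = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // shard_size
        last = (offset + length - 1) // shard_size
        touched.update(range(first, last + 1))
    return touched


def bytes_to_heal(file_size, writes, shard_size=None):
    """Bytes to recopy after an outage.

    shard_size=None models whole-file healing (any write => recopy the
    entire file). Otherwise only shards touched by writes are recopied;
    the last shard may be shorter than shard_size.
    """
    if not any(length > 0 for _, length in writes):
        return 0
    if shard_size is None:
        return file_size
    total = 0
    for index in dirty_shards(writes, shard_size):
        start = index * shard_size
        total += min(shard_size, file_size - start)
    return total


if __name__ == "__main__":
    image_size = 40 * GIB                       # hypothetical 40 GiB VM image
    writes = [(0, 4096), (10 * GIB, 1 * MIB)]   # hypothetical writes while a brick was down
    print("whole file:", bytes_to_heal(image_size, writes) // MIB, "MiB")
    print("64 MiB shards:", bytes_to_heal(image_size, writes, 64 * MIB) // MIB, "MiB")
```

With those made-up numbers the whole-file case recopies 40960 MiB while the sharded case recopies only two shards (128 MiB). Real heal cost also depends on the volume's self-heal settings and actual write pattern.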

As for rolling reboots: they can be done with replicated storage, and
gluster will transparently hand over client reads/writes, but for each
VM image only one copy at a time can be healing; otherwise access will
be blocked, as you saw.

So the recommended procedure is:
- Enable sharding
- Copy the VMs over
- When rebooting, wait for heals to complete before rebooting the next node
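The "wait for heals to complete" step can be scripted. A minimal sketch, assuming it runs on a gluster server node and that `gluster volume heal VOLNAME info` prints a `Number of entries: N` line per brick (the exact output differs between versions, so check yours first). A non-numeric count, such as the `-` shown for an unreachable brick in some versions, is treated as "not healed yet". The volume name, poll interval and timeout are hypothetical.

```python
import re
import subprocess
import time

# Matches the per-brick count line in 'gluster volume heal VOLNAME info' output.
ENTRIES_RE = re.compile(r"^Number of entries:\s*(\S+)", re.MULTILINE)


def pending_heal_entries(heal_info_output):
    """Sum the per-brick counts.

    Returns None if no count was found or any count is not a plain
    integer (e.g. a brick that is still down), so callers keep waiting.
    """
    counts = ENTRIES_RE.findall(heal_info_output)
    if not counts:
        return None
    total = 0
    for value in counts:
        if not value.isdigit():
            return None
        total += int(value)
    return total


def wait_for_heal(volume, poll_seconds=30, timeout_seconds=6 * 3600):
    """Poll heal info until every brick reports 0 entries; False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info"],
            capture_output=True, text=True, check=False,
        )
        if result.returncode == 0 and pending_heal_entries(result.stdout) == 0:
            return True
        time.sleep(poll_seconds)
    return False
```

Usage would be something like `wait_for_heal("datastore1")` (hypothetical volume name) after each node comes back, and only rebooting the next node once it returns True.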

NB: I thoroughly recommend 3-way replication, as you have done; it saves
a lot of headaches with quorum and split-brain.
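The quorum arithmetic behind that advice, as a sketch using a plain strict-majority rule. This is a simplification, not Gluster's exact quorum logic (its quorum options have extra cases), but it shows why replica 2 is awkward: one brick down leaves 1 of 2, which is not a majority, while replica 3 still has 2 of 3.

```python
def has_majority(bricks_up, replica_count):
    """True if strictly more than half of the replica bricks are up."""
    return 2 * bricks_up > replica_count


def tolerable_failures(replica_count):
    """Most bricks that can be down while a strict majority remains."""
    return (replica_count - 1) // 2


if __name__ == "__main__":
    for replicas in (2, 3):
        print(replicas, "replicas: can lose", tolerable_failures(replicas), "brick(s)")
```

This prints that replica 2 can lose 0 bricks and replica 3 can lose 1 under this rule, which is also why a rolling reboot should only ever have one node down (or still healing) at a time.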

-- 
Lindsay Mathieson