[Gluster-users] GlusterFS VS DRBD + power failure ?
Joe Julian
joe at julianfamily.org
Tue Oct 28 20:04:29 UTC 2014
On 10/27/2014 01:34 PM, Tytus Rogalewski wrote:
> Hi guys,
> I wanted to ask what happens in case of a power failure.
> I have a 2-node Proxmox cluster with GlusterFS on sdb1 (XFS), mounted
> on each node as localhost:/glusterstorage.
> I am storing VMs on it as qcow2 (with ext4 filesystems inside).
> Live migration works great. Everything works fine.
> But tell me, will something bad happen when the power fails in the
> whole datacenter?
> Will data be corrupted, and would the same thing happen if I were
> using DRBD?
> DRBD doesn't give me as much flexibility (because I can't use qcow2 or
> store files like ISOs or backups on DRBD), but GlusterFS does!
> Anyway, yesterday I created a GlusterFS volume on ext4, with a qcow2
> VM (ext4 inside) on it, and when I did "reboot -f" (I assume this is
> the same as pulling the power cord?) - after the node came back
> online, the VM data was corrupted and I had many ext failures inside
> the VM.
> Tell me, was that because I used ext4 on top of the sdb1 GlusterFS
> storage, or would the same thing happen with XFS?
> Is DRBD better protection in case of power failure?
My experience with DRBD is really old, but I became a Gluster user
because of my experience with DRBD. After it destroyed my filesystem for
the 3rd time, it was "replace that or find somewhere else to work" time.
I chose Gluster because you can create a fully redundant system from
the client to each replica server, all the way through the hardware,
by creating parallel network paths.
What you experienced is a result of the ping timeout. Ping timeouts
happen when the TCP connection is not closed cleanly, such as when you
pull the plug. The timeout exists to allow the filesystem to recover
gracefully in the event of a temporary network problem. Without it,
there's an increased load on the server while all the file descriptors
are re-established. That load can be heavy enough that the pings
themselves are delayed. If they're delayed longer than ping-timeout, you
have a race condition from which you'll never recover. For that reason,
the default ping-timeout is relatively long. You *can* adjust that
timeout, as long as you test sufficiently around the actual loads
you're expecting.
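For example (assuming your volume is named "glusterstorage"; the
network.ping-timeout option defaults to 42 seconds):

    # see which options have been changed from their defaults
    gluster volume info glusterstorage

    # lower the ping timeout -- only after testing under realistic load
    gluster volume set glusterstorage network.ping-timeout 10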
Keep in mind your SLA/OLA expectations and engineer for them using
actual mathematical calculations, not just gut expectations. Your DC
power should be more reliable than most industries' requirements.
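As a rough illustration (made-up numbers, not a recommendation): if each
replica server is independently available 99.5% of the time, two
replicas in parallel give about 1 - (1 - 0.995)^2 = 0.999975, i.e.
99.9975% availability, or roughly 13 minutes of expected downtime per
year -- a figure you can then compare directly against your SLA.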
>
> Anyway, second question: if I have 2 nodes with GlusterFS,
> node1 is changing file1.txt,
> node2 is changing file2.txt,
> then I disconnect the GlusterFS network connection (and data keeps
> changing on both nodes).
> After I reconnect GlusterFS, how will this go?
> Will the newer file1 changed on node1 overwrite file1 on node2,
> and the newer file2 changed on node2 overwrite file2 on node1?
> Am I correct?
>
> Thanks for the answer :)
>
Each client intends to write to both (all) replicas. The intent count is
incremented in extended attributes, the write is executed on a replica,
and the intent count is decremented for that replica. With the
disconnect, each of those files will show pending changes destined for
the other replica. When they are reconnected, the self-heal daemon (or a
client attempting to access those files) will note the changes destined
for the other brick and repair it.
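If you're curious, you can see those intent counters directly on a
brick (brick path and volume name below are just examples from your
setup; the trusted.afr.* attributes hold the per-replica pending
counts):

    # run against the brick path on a server, not the mounted volume
    getfattr -d -m . -e hex /bricks/glusterstorage/file1.txt

    # list files the self-heal daemon still needs to repair
    gluster volume heal glusterstorage info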
Split-brain occurs when each side of that netsplit writes to the same
file. Each copy then indicates pending changes for the other brick. When
the connection returns, they compare those pending flags and each sees
changes that are unwritten on the other. They refuse to heal and leave
both copies intact, forcing manual intervention to clear the
split-brain.
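Something like the following helps find and clear those files (volume
name assumed; with the versions current today, the usual manual fix is
to pick the copy you want to discard and remove it, along with its
.glusterfs gfid hard link, from that brick so the good copy heals back
over it):

    # list files currently flagged as split-brain
    gluster volume heal glusterstorage info split-brain

    # after removing the bad copy from its brick, trigger a heal
    gluster volume heal glusterstorage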
You can avoid split-brain by using replica 3 with volume-level quorum,
or replica 2 plus some 3rd observer with server quorum. It is also
possible to have quorum with only 2 servers or replicas, but I wouldn't
recommend it. With volume-based quorum, the volume will go read-only if
the client loses connection with either server. With server quorum and
only two servers, the server will shut down if it loses quorum,
completely removing access to the volume.
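For reference, the relevant options look something like this (volume
name assumed; quorum-type "auto" requires more than half the replicas to
be reachable for writes, and the server-quorum ratio is set cluster-wide
on "all"):

    # client-side (volume) quorum: volume goes read-only when a
    # majority of the replicas is not reachable from the client
    gluster volume set glusterstorage cluster.quorum-type auto

    # server-side quorum: glusterd stops the bricks on a server that
    # falls out of quorum with its peers
    gluster volume set glusterstorage cluster.server-quorum-type server
    gluster volume set all cluster.server-quorum-ratio 51%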