[Gluster-devel] Splitbrain Resolution

Thu Apr 17 23:53:25 UTC 2008

Hi,

I'm experimenting with GlusterFS and I have a few questions that the 
documentation seems to leave unanswered.

1) Replication (AFR)
How does this work? I can see from my test setup that the mounted FS has 
the same content (files) as the backing directory on the server. The 
server configuration only lists the local node and not the other peer 
servers. This implies that all replication/syncing is done by the 
clients. This in turn implies that the read load can be shared between 
the servers (read throughput scales up to n times, where n is the 
number of servers), but every write gets sent to each server. This 
implies that write performance scales inversely with n when using 
mirroring (1/n). This seems quite poor. Am I misunderstanding how this 
works? Do the servers 
replicate between themselves? Or does all replication really happen on 
the client nodes? How would this handle the condition of writes 
happening on the server directly to the backing directory while the 
client is trying to write to the same directory/files? Would this work 
the same way as it would with NFS, or is there a hard requirement to 
always access the data via the glusterfs mount point? (I understand 
that direct access to the backing directory is only possible when using 
AFR, and not with striping.)
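
To make the scaling assumption in question 1 concrete, here is a minimal 
sketch of the model I have in mind. It assumes purely client-side 
replication, where each write is sent over a single client link to all n 
servers and each read can be served by any one server. The function 
names and the 100 MB/s figure are illustrative only; they are not 
measurements and not part of any GlusterFS API.

```python
def aggregate_read_throughput(per_server_mb_s: float, n_servers: int) -> float:
    """Best-case total read throughput when reads are spread over all servers."""
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    return per_server_mb_s * n_servers


def client_write_throughput(client_link_mb_s: float, n_servers: int) -> float:
    """Effective write throughput of one client that must send every write
    to all n servers over the same link (assumed model, not measured)."""
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    return client_link_mb_s / n_servers


if __name__ == "__main__":
    # Hypothetical figure: 100 MB/s per server and per client link.
    for n in (1, 2, 4):
        reads = aggregate_read_throughput(100.0, n)
        writes = client_write_throughput(100.0, n)
        print(f"n={n}: reads up to {reads} MB/s, writes about {writes} MB/s")
```

If replication were instead done between the servers, the client would 
send each write only once and this 1/n penalty would not fall on the 
client link, which is why I am asking where the replication happens.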

2) Splitbrain
How does the recovery from this situation get handled? Which file wins, 
and which file gets clobbered? Is there any scope for conflict 
resolution (e.g. as in Coda)?

3) Metadata Storage
When using striping, how does the file data get split, and how/where is 
the metadata kept?

4) Fencing and Quorum
Is there error/desync detection, and are there such concepts as fencing 
of dead server nodes and quorum to prevent splitbrain operation?
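
By quorum I mean the usual strict-majority rule, sketched below. This is 
only an illustration of the concept I am asking about, not a description 
of anything GlusterFS does; the function name and parameters are 
hypothetical.

```python
def has_quorum(reachable_servers: int, total_servers: int) -> bool:
    """Return True if the reachable servers form a strict majority.

    With a strict majority, two disjoint partitions can never both
    accept writes, which is what prevents splitbrain operation.
    """
    if total_servers < 1:
        raise ValueError("total_servers must be at least 1")
    if not 0 <= reachable_servers <= total_servers:
        raise ValueError("reachable_servers must be between 0 and total_servers")
    return reachable_servers > total_servers // 2


if __name__ == "__main__":
    # A two-server mirror split down the middle: neither side has a majority.
    print(has_quorum(1, 2))
    # Three servers, one unreachable: the remaining two may continue.
    print(has_quorum(2, 3))
```

Note that under this rule a two-node mirror cannot keep accepting writes 
once the nodes lose sight of each other, so a third node or tie-breaker 
would be needed.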

5) Metadata Change Detection
I understand from the documentation that replication/sync (where 
required) happens when opening a file for reading. What about metadata? 
Are metadata changes (e.g. file size, modification time) visible to the 
clients when the file changes on the server and the client issues an ls? 
Or is it necessary to read the file before issuing ls to get the 
size/timestamp metadata? What about touching a file? Does this cause the 
file to be synced? Would it cause the file to get corrupted if another 
node updated the file's content in the meantime?

Thanks.

Gordan