[Gluster-devel] solutions for split brain situation

Gordan Bobic gordan at bobich.net
Thu Sep 17 00:33:52 UTC 2009

Mark Mielke wrote:

> In case it is of any use to other, here is the list I had worked out 
> before when doing my analysis:


Since the issue of alternatives has been raised - you missed two file 
systems in your summary:

1) SeznamFS. No POSIX locking, and concurrent writes are fraught with 
race conditions (by its very design), but it is quite useful for some 
use cases, and it is stable. FUSE-based. The file system is replicated 
using the same paradigm as MySQL's replication (serialized write stream 

2) PeerFS. POSIX, commercial, relatively expensive, replicated, 
block-level, native kernel driver. The thing that killed it for me is 
that there is no way to resize the file system without doing a full 
dump/restore of the data - which is prohibitive with multi-TB data 
stores replicated over a WAN.

As for Coda - you say there is no further development being done on it, 
but that is because it is completed and stable. I _almost_ ended up 
using it, but there were a few things that finally pushed me over toward 
a hybrid SeznamFS/GlusterFS solution.
1) No POSIX locking and no user/group-based permissions.
2) Cannot sensibly be used as a home directory because it has to be 
mounted by the user after logging in, due to its externally bolted-on 
security system (in its defence, it is a _global_ file system by 
design, so POSIX doesn't really fit with its security paradigm).
3) The metadata is limited to something like 1MB/directory. This 
includes all file names the directory contains, so is unsuitable for 
Maildirs or large source code directories.
4) The files are kept as files with the same content, but not with the 
same names (file names in Coda's backing store are just numbers), 
whereas SeznamFS and GlusterFS conveniently keep the original names. 
This makes data recovery more difficult when things go wrong.
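
To put point 3 in rough numbers, here is a back-of-envelope sketch of 
how many entries fit under a per-directory metadata cap. The 1MB figure 
is the approximate limit mentioned above; the average name lengths and 
the per-entry overhead are purely my assumptions for illustration, not 
measured values or anything taken from Coda's on-disk format.

```python
# Rough estimate of how many file names fit in a directory whose
# metadata is capped at a fixed size. All inputs are illustrative
# assumptions, not Coda internals.

def max_directory_entries(limit_bytes, avg_name_len, per_entry_overhead=0):
    """Return how many entries fit under the metadata limit."""
    return limit_bytes // (avg_name_len + per_entry_overhead)

LIMIT = 1024 * 1024  # "something like 1MB/directory"

# Assumed average name lengths: long unique names for a Maildir,
# shorter names for a source tree.
maildir_entries = max_directory_entries(LIMIT, avg_name_len=64)
source_entries = max_directory_entries(LIMIT, avg_name_len=16)

print("Maildir-style names:", maildir_entries)
print("Short source-file names:", source_entries)
```

Under these assumptions a Maildir tops out in the low tens of thousands 
of messages per directory, which a busy mailbox can easily exceed.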


