[Gluster-devel] Gluster Recovery

Anand Avati avati at zresearch.com
Thu Apr 26 10:33:35 UTC 2007

> Let's take the same configuration (we assume that self heal translator
> is on) with this scenario :
> Client1 is writing a huge file on the afr volume (like 200Mb file size)
> Let's say that during this writing (that will take some time), Server 2
> goes down, and then comes back a little after but *before* the writing is
> completed.
> As you said, Server2 will not be "up" until a successful fsck is performed.
> As I understood it, fsck could be performed only if no clients access
> the afr volume in some ways.
> So then, what would happen if Client2 is a service that continuously
> write something on the afr volume ? (like database insertions, or mail
> generation, or simply a real-time log of something..)
> Will Server2 never come up ?

This is a good point. Using a divide-and-conquer approach, I would
rephrase it as: "each file" is not made accessible until it has been checked
to be the latest version. While a file is being FSCK'ed, others' access
to that file is blocked, and once the FSCK is complete, access to the
file continues. The application sees nothing of this; it just thinks
that the particular read() or write() took a little longer, that's all.
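
The per-file check described above can be sketched in C roughly as follows. This is a minimal, hypothetical model, not GlusterFS code: it assumes exactly two replicas, a version counter on each copy of a file, and a per-file mutex as the thing that blocks other callers during the check. `afr_file`, `afr_access_check()` and `self_heal_file()` are illustrative names only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model, NOT GlusterFS code: every file has one copy per
   replica, and each copy carries a version counter bumped on every write. */
#define NREPLICAS 2

struct replica_copy {
    unsigned long version; /* assumed versioning scheme */
    long payload;          /* stand-in for the file's contents */
};

struct afr_file {
    pthread_mutex_t lock;  /* held for the duration of the check */
    bool checked;          /* set once every copy is known to be current */
    struct replica_copy copy[NREPLICAS];
};

/* Copy the newest version over every stale one. Caller holds f->lock. */
static void self_heal_file(struct afr_file *f)
{
    int newest = 0;
    for (int i = 1; i < NREPLICAS; i++)
        if (f->copy[i].version > f->copy[newest].version)
            newest = i;

    for (int i = 0; i < NREPLICAS; i++)
        if (i != newest)
            f->copy[i] = f->copy[newest];

    /* Post-condition: all replicas now agree. */
    for (int i = 0; i < NREPLICAS; i++)
        assert(f->copy[i].version == f->copy[newest].version);

    f->checked = true;
}

/* Gate run before every read()/write() on the file. The first caller does
   the check; concurrent callers block on the mutex, so to the application
   the call merely takes longer. (Resetting `checked` when a replica goes
   away is not modelled here.) */
void afr_access_check(struct afr_file *f)
{
    pthread_mutex_lock(&f->lock);
    if (!f->checked)
        self_heal_file(f);
    pthread_mutex_unlock(&f->lock);
}
```

Only the first access to a stale file pays for the copy; later calls find `checked` set and go straight through, so no volume-wide quiet period is needed.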

Re-using the underlying filesystem and working at the file/directory level
gives us the advantage of breaking the problem up into individual
files rather than dealing with a 'big array of data blocks'. This makes
more intelligent approaches possible, such as checking and repairing each
file when it is accessed instead of the whole volume at once.

> Here is another scenario :
> Client1 and Client2 are accessing the AFR in different ways (for
> instance, one is a mail service that writes everything that comes through
> and the other is a database).
> Is there anything like a load-balancing process, or do both clients
> write to Server1, which instantly replicates ?
> What happens if the link between the 2 servers is severed ?
> Let's say that Client1 can still access the volume through Server1,
> Client2 only through Server2, AND Server1 can't communicate with Server2
> (what a worst-case scenario !)

If you look at the spec files a bit more carefully, you will observe that
the servers *never* communicate with each other. There is NO
connection between server1 and server2. The file replication feature is on
the client side (the afr translator is loaded in the client spec file):
the client itself writes to both server1 and server2 simultaneously.
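
To make the client-side design concrete, here is a hypothetical C sketch of the fan-out, with each server modelled as a plain file descriptor. `afr_client_write()` and the `failed[]` bookkeeping are illustrative assumptions, not the afr translator's actual interface, and the writes are issued one after another for simplicity where a real client would send them to both servers in parallel. Note that there is no server-to-server path anywhere in it.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sketch, NOT the afr translator's interface: the client fans
   each write out to every server itself; the servers never talk to each
   other. Each "server" is modelled as a plain file descriptor. */
#define NSERVERS 2

/* Send buf to every server. Returns how many servers accepted the whole
   write; failed[i] is set for each server that missed it and will need
   self-heal later. Writes go out one after another here for simplicity. */
int afr_client_write(const int fds[NSERVERS], const void *buf, size_t len,
                     int failed[NSERVERS])
{
    assert(fds != NULL && failed != NULL);

    int ok = 0;
    for (int i = 0; i < NSERVERS; i++) {
        ssize_t n = write(fds[i], buf, len);
        failed[i] = (n < 0 || (size_t)n != len);
        if (failed[i])
            fprintf(stderr, "server %d missed the write\n", i + 1);
        else
            ok++;
    }
    return ok;
}
```

A server that misses a write (in a test, an invalid descriptor can stand in for a down server) is only flagged here; bringing its copy back in sync is left to the per-file self-heal.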

> And last but not least : let's now say that Client1 and Client2 run the
> same service (= access the same data). What would happen ? (Isn't that
> what you've called "split brain" ?)

Two clients accessing the same data at the same time is perfectly
safe. I do not see any problem here, or perhaps I did not understand
your question correctly.

> I have another scenario, but I think it's enough for now, don't you ?

More feedback is always appreciated, so please shoot.


