[Gluster-devel] Architecture advice

Mon Jan 12 19:49:20 UTC 2009

--- On Mon, 1/12/09, Gordan Bobic <gordan at bobich.net> wrote:
> 
> > ...
> > No need for fencing simply because you now use HA
> > translator. The assumption in this case is that the 
> > servers can still talk to each other but that one 
> > server's connection to the clients may have died.  
> 
> That means that 50% of the scope for failure will still
> wipe you out because you'll start splitbraining. Not the
> way forward at all. A fencing setup will at least preserve
> the data integrity. 

Fencing won't help either without cooperation, see below...

> The correct way to handle comms channel
> failure between client and server is to have bonded
> interfaces going via different physical paths. _ONLY_
> dealing with the situation where both servers are alive and
> connected to each other but we can only reach one due to an
> obscure failure somewhere in the network (e.g. a failed
> switch port or a failed NIC in the server) is a pretty
> half-arsed edge case.

Why is that the correct way?  There's nothing wrong with 
having "bonding" at the glusterfs protocol level, is 
there?  That is somewhat what the HA translator is, except 
that it is supposed to take care of some additional 
failures.  It is supposed to retransmit "in progress" 
operations that have not succeeded because of comm 
failures (I have yet to figure out where in the code 
this happens though).
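
A rough Python sketch of that retransmit idea follows. It is
not the HA translator's code, and the names PathDown and
send_with_failover are made up for illustration. It only shows
an "in progress" operation being re-sent on the next path when
the current path fails.

```python
class PathDown(Exception):
    """Raised by a path when its connection to the server is lost."""


def send_with_failover(op, paths):
    """Try op on each path in order.

    Returns (path_index, result) for the first path that succeeds.
    Raises PathDown if every path fails.
    """
    last_error = None
    for index, path in enumerate(paths):
        try:
            return index, path(op)
        except PathDown as err:
            # Comm failure: remember it and retransmit on the next path.
            last_error = err
    raise PathDown("all paths failed") from last_error


# Hypothetical paths: the first has lost its link, the second works.
def dead_path(op):
    raise PathDown("link to server1 lost")


def live_path(op):
    return "done: " + op


index, result = send_with_failover("write block 7", [dead_path, live_path])
```

Re-sending like this is only safe if the operation can be
repeated without harm, or if the server can recognise a
duplicate, since the first attempt may have been applied before
the link died.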

> Why re-invent the wheel when the tools to deal with these
> failure modes already exist?

Are you referring to bonding here? If so, see above for 
why HA may be better (or an additional benefit).

> > Any failures on the server side may still warrant a
> > fencing setup, but AFR is not yet set up to work 
> > cooperatively with a fencing setup.
> 
> It doesn't have to be. If one server in AFR dies
> nothing spectacular happens. Things time out and carry on. I
> don't see what cooperation there would need to be. RHCS
> does its own heart-beating and fencing. Mix and match
> as required.

Yes, if a server goes down you are fine (aside from the
scenario where the other server then goes down followed
by the first one coming back up).  But, if you are using
the HA translator above and the communication goes down
between the two servers you may still get split brain 
(thus the need for heartbeat/fencing).

But, even with the current write logging in AFR, there 
are possible split brain scenarios which cannot be 
avoided even with heartbeat/fencing (yet).  Anytime two 
different clients try to write to the same area of the 
filesystem and the network is partitioned, there is a 
chance that they each succeed and fail on opposite 
servers causing split brain.  There is nothing heartbeat 
can do about this except attempt to mitigate the problem 
by intervening.  But heartbeat has no hooks to know when 
this happens so by the time heartbeat intervenes, 
"half writes" to each server may have occurred that 
cannot be undone.  That is the reason you really need 
cooperation between AFR and some other tool (such as 
heartbeat).  AFR needs to be able to write all or nothing 
to all servers until some external policy machine 
(such as heartbeat) decides that it is safe (because 
of fencing or other mechanism) to proceed writing to 
only a portion of the subvolumes (servers).  Without 
this I don't see how you can prevent split brain.
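
The scenario and the proposed rule can be sketched in a few
lines of Python. This is a toy model, not AFR code: Server,
naive_write, guarded_write and the `fenced` set are all
hypothetical names, and the external policy machine is reduced
to a set of server names it has declared safely out of service.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.data = {}


def naive_write(reachable, key, value):
    """Best effort: write to whichever servers this client can reach."""
    for server in reachable:
        server.data[key] = value
    return bool(reachable)


def guarded_write(all_servers, reachable, fenced, key, value):
    """Write only if every server that is not fenced is reachable.

    Returns False, writing nothing anywhere, when that is not the case.
    """
    required = [s for s in all_servers if s.name not in fenced]
    if not required or any(s not in reachable for s in required):
        return False
    for server in required:
        server.data[key] = value
    return True


# Network split: client A reaches only s1, client B reaches only s2.
s1, s2 = Server("s1"), Server("s2")
naive_write([s1], "file", "from A")
naive_write([s2], "file", "from B")
split_brain = s1.data != s2.data  # both writes "succeeded", copies differ

# Same split with the guarded rule: both clients are refused.
t1, t2 = Server("t1"), Server("t2")
ok_a = guarded_write([t1, t2], [t1], set(), "file", "from A")
ok_b = guarded_write([t1, t2], [t2], set(), "file", "from B")

# Once the policy machine has fenced t2, client A may proceed on t1 alone,
# while client B (which can only reach the fenced t2) is still refused.
ok_a_after = guarded_write([t1, t2], [t1], {"t2"}, "file", "from A")
ok_b_after = guarded_write([t1, t2], [t2], {"t2"}, "file", "from B")
```

The sketch checks reachability before writing and ignores a
failure that happens partway through the writes. A real
implementation would need some prepare/commit or rollback step
to make the write truly all or nothing.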

Cheers,

-Martin