[Gluster-devel] request for comments

Tue May 1 16:54:42 UTC 2007

It makes all the sense in the world to me to have replication on the server side. I especially like the idea of network failover, and of not having to depend on client mounts to maintain consistency on the server side.

Majied

On Tue, 1 May 2007 09:05:28 -0700
Anand Avati <avati at zresearch.com> wrote:

> 
> here is a design proposal for some changes to AFR and related
> translators. currently AFR is handled entirely on the client side,
> where the client does the replication as well as the failover. the AFR
> translator is essentially providing _two_ features: 1. replication,
> 2. failover.
> 
> in view of the race condition in AFR discussed recently on the mailing
> list (two clients writing to the same region running into a race while
> writing to the second mirror) and for the other benefits mentioned
> below, the proposal is to split replication and failover into two
> separate translators. replication is meant to be loaded on the server
> side, while failover alone is meant to be loaded on the client side.
> 
> imagine grouping your storage cluster into pairs or triplets or
> quadruplets. the AFR translator will be loaded to form these groups,
> but on the server side. each member of the (say) triplet will load
> AFR with one child as the storage/posix and the other two children as
> protocol/client translators for the auxiliary exports of the remaining
> two servers. thus the effect is,
> 
> * when you write to one server, it goes to all the three (redundancy)
> * and, you can write via any server (used for failover)
> 
> under normal conditions, the failover at the client uses the 'primary
> child' (the non-auxiliary export server) and operations are performed
> only on that child. the server side takes care of replication. when
> that server goes down, failover detects the broken link and uses an
> aux export.
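
The layout described above (server-side replication within a triplet, with the client only failing over) can be modelled roughly as below. This is only a toy sketch in Python, not GlusterFS code: the names Server, FailoverClient and make_group are hypothetical, dictionaries stand in for storage/posix, and direct object references stand in for the protocol/client links. It ignores ordering between servers and self-heal entirely.

```python
class Server:
    """One member of a replica group (hypothetical model, not GlusterFS code)."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # stands in for the local storage/posix child
        self.peers = []   # stands in for protocol/client links to the other members
        self.up = True

    def write(self, path, data):
        if not self.up:
            raise ConnectionError(self.name + " is down")
        # Server-side replication: write locally, then to every reachable peer.
        self.store[path] = data
        for peer in self.peers:
            if peer.up:
                peer.store[path] = data


class FailoverClient:
    """Client side: sends each write to one server, the first that is reachable."""

    def __init__(self, primary, auxiliaries):
        self.children = [primary] + list(auxiliaries)

    def write(self, path, data):
        for child in self.children:
            try:
                child.write(path, data)
                return child.name   # name of the server that accepted the write
            except ConnectionError:
                continue            # broken link: try the next (aux) export
        raise ConnectionError("no server reachable")


def make_group(names):
    """Build a pair/triplet/... in which every server has all others as peers."""
    servers = [Server(n) for n in names]
    for s in servers:
        s.peers = [p for p in servers if p is not s]
    return servers
```

With a, b, c = make_group(["a", "b", "c"]) and FailoverClient(a, [b, c]), a write goes through a and lands on all three stores; after setting a.up = False the same client write goes through b instead, and a keeps its stale copy until something heals it.
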
> 
> advantages:
> 
> 1. since a file is replicated by a single agent, there are no
> potential race conditions (most important)
> 
> 2. the failover abstraction works for non-AFR scenarios also. you can
> use the failover translator to fail over between two network links to
> the same server. (generally use infiniband, but fail over to gigabit
> totally seamlessly, even preserving open FDs)
> 
> 3. the client writes to only one server (one copy of the data instead
> of one per replica), a tremendous saving of bandwidth on the link
> between client and server.
> 
> 4. self-heal checks can be performed in a more deterministic manner
> since they are done by the 'primary child' server. there are no
> questions like 'what if two children try to heal together' or 'what if
> no client is mounted at all'
> 
> 5. extensions to AFR (like very-lazy replication, on close()) will be
> a lot easier. the client submits a write to any server and forgets.
> 
> 6. 'transaction replay' kind of features become easier to implement
> by preserving unwritten write() data with offset etc. on the server
> itself (doing such things with AFR on the client is unreliable since
> the client can always umount)
> 
> 7. on the client side failover is not the only way, even a
> 'loadbalance' translator will be a good choice (which takes care of
> not scheduling calls to a link which is down). thus AFR will work
> hand-in-hand with failover and/or loadbalancing, however the user
> prefers. (of course the loadbalance will work with its own abstraction
> where you can use it just to loadbalance network links (remember
> somebody asking this on the mailing list))
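
The 'loadbalance' behaviour mentioned in point 7 (do not schedule calls on a link that is down) could look something like the sketch below. The round-robin policy and the names Link and LoadBalancer are assumptions made for illustration; the proposal does not say how the translator would pick a link.

```python
class Link:
    """A network link to a server (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.up = True


class LoadBalancer:
    """Round-robin over links, skipping any link that is currently down."""

    def __init__(self, links):
        self.links = list(links)
        self._next = 0   # index of the link to try first on the next call

    def pick(self):
        # Try each link at most once, starting from the round-robin position.
        for _ in range(len(self.links)):
            link = self.links[self._next]
            self._next = (self._next + 1) % len(self.links)
            if link.up:
                return link
        raise ConnectionError("all links are down")
```

With two links, say infiniband and gigabit, pick() alternates between them while both are up and returns only the surviving link once one of them is marked down.
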
> 
> my instinct tells me there are more advantages i could list if i
> thought it over more.
> 
> i feel failover and loadbalancer as generic layers will add a lot of
> power and possibility for creative use, and AFR leveraging them fits
> in nicely overall.
> 
> suggestions/comments ?
> 
> 
> avati
> 
> -- 
> ultimate_answer_t
> deep_thought (void)
> { 
>   sleep (years2secs (7500000)); 
>   return 42;
> }