[Gluster-devel] Re: Load balancing ...

Wed Apr 30 01:20:21 UTC 2008

--- Krishna Srinivas <krishna at zresearch.com> wrote:

> * replay of the journaling will cause race
> conditions. (If we consider 2 or more
> clients, each client writes to same offset)

Access could be serialized if the journal were
abstracted out of AFR into a separate journal
translator placed directly above the subvolume it
journals, so that all access to that subvolume has to
go through the journaling layer.  Would this
serialization prevent the race conditions you are
describing?

The stack would then look like this:

   Client AFR    Client AFR
            |\  /|
            | \/ |
            |/  \|
     Journal A  Journal B
            |    |
        Sub A    Sub B
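A minimal sketch of the serialization idea, assuming a hypothetical in-memory `Subvolume` and a hypothetical `JournalLayer` class (neither is a GlusterFS API).  The lock makes "apply the write, then journal it" a single step, so concurrent writes through one journal layer get one order and one version number each.  Note that it only orders writes within that one layer, not across Journal A and Journal B.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Subvolume:
    """Hypothetical stand-in for a storage subvolume: path -> file bytes."""
    files: dict = field(default_factory=dict)

    def write(self, path, offset, data):
        buf = self.files.setdefault(path, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            # Zero-fill any gap, like a write past the end of the file.
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data


class JournalLayer:
    """Hypothetical journal translator sitting directly above one subvolume.

    Every write must go through write(); the lock serializes them, so two
    clients writing the same offset are applied and journaled in one order.
    """

    def __init__(self, subvol):
        self.subvol = subvol
        self.lock = threading.Lock()
        self.version = 0
        self.journal = []  # entries: (version, path, offset, length)

    def write(self, path, offset, data):
        with self.lock:
            self.version += 1
            self.subvol.write(path, offset, data)
            self.journal.append((self.version, path, offset, len(data)))
            return self.version
```

Because the subvolume write and the journal append happen under the same lock, the journal order always matches the order in which the subvolume saw the writes.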

> A better solution would be to maintain a list of
> dirty blocks and use it during selfheal.

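Agreed, but why not make it byte-granular and keep a
list of dirty file spans (offset, length) instead of
blocks?  This should also be very space efficient,
since one contiguous write yields a single span entry
no matter how many blocks it covers.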
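A minimal sketch of a byte-granular dirty-span list, assuming half-open `[start, end)` intervals and a hypothetical `DirtySpans` class.  Overlapping or touching writes are merged into one entry, which is what keeps the list small.

```python
class DirtySpans:
    """Hypothetical per-file dirty-span list with byte granularity.

    Spans are half-open [start, end) intervals, kept sorted and merged so
    that overlapping or adjacent writes collapse into a single entry.
    """

    def __init__(self):
        self.spans = []

    def mark(self, offset, length):
        if length <= 0:
            return
        start, end = offset, offset + length
        merged = []
        placed = False
        for s, e in self.spans:
            if e < start:
                # Entirely before the new span and not touching it.
                merged.append((s, e))
            elif s > end:
                # Entirely after: emit the new span first, exactly once.
                if not placed:
                    merged.append((start, end))
                    placed = True
                merged.append((s, e))
            else:
                # Overlaps or touches: absorb it into the new span.
                start, end = min(s, start), max(e, end)
        if not placed:
            merged.append((start, end))
        self.spans = merged
```

For example, writes of 10 bytes at offset 0, 5 bytes at offset 20 and 10 bytes at offset 10 end up as the single span `(0, 25)`.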

> >  In terms of work, I'm guessing each write
> operation would need to put an additional
> (serial,path,offset,bytes,data) to the journal
> volume .. 

Actually just (version, path, offset, bytes) is
needed.  The 'data' does not need to go into the
journal, since it is already in the subvolume and can
be read back at any time.
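A sketch of a metadata-only journal record under this scheme; `JournalRecord` and the `read_subvolume` callback are hypothetical names, not GlusterFS interfaces.  Each record has a small fixed size however large the write was, and the bytes are fetched from a healthy subvolume only when needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalRecord:
    """Hypothetical journal entry: metadata only, no file data."""
    version: int
    path: str
    offset: int
    length: int


def recall(record, read_subvolume):
    """Fetch the bytes a record refers to from a healthy subvolume.

    read_subvolume(path, offset, length) is a hypothetical read callback;
    the journal itself never stores the data.
    """
    return read_subvolume(record.path, record.offset, record.length)
```

Recalling a span returns whatever the span holds now, which may include later writes; for healing, the current contents are what you want anyway.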

> each data volume would need to keep track
> of its most recent serial, then mount would need to
> check the journal and run playbacks for each
> sub-volume whose serial isn't up to the most recent
> in the journal serial ...

I was envisioning that it would work much as it does
now: when AFR reads a file, it would ask the lower
levels (which in this case are the journal layers)
for the latest version of the file on each subvolume
and sync on a mismatch.
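The version check on read can be sketched as below, assuming each journal layer reports a single integer version per file; `plan_sync` is a hypothetical helper, not AFR's actual code.

```python
def plan_sync(versions):
    """Decide what to heal from per-subvolume versions of one file.

    versions maps a subvolume name to the version its journal layer
    reports. Returns (source, stale): the subvolume holding the latest
    version and the sorted names of subvolumes that are behind it.
    """
    if not versions:
        raise ValueError("no subvolumes reported a version")
    source = max(versions, key=versions.get)
    latest = versions[source]
    stale = sorted(name for name, v in versions.items() if v < latest)
    return source, stale
```

Equal versions yield nothing to sync.  This sketch does not try to detect split-brain, where two copies diverge without their versions showing it.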

> >  If all this is done in a journal translator .. it
> doesn't "sound" too onerous or that it would involve
> changing any other code ... ??

Well, changes would be needed in both the journal
translator AND the AFR translator so that AFR can
recall the dirty spans from the journal.  Currently,
when the AFR translator heals node B from node A, it
simply asks the subvolume for the whole file.  With a
journal translator, AFR would need to be smarter and
ask only for the changed sections of the file.
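A sketch of the "ask only for the changed sections" heal, assuming the two copies of one file are modelled as `bytearray`s and the dirty spans come from the journal as half-open `[start, end)` pairs lying within the source; `heal_spans` is a hypothetical helper.

```python
def heal_spans(source, target, spans):
    """Copy only the dirty [start, end) spans from source to target.

    source and target are bytearrays standing in for the two copies of
    one file (hypothetical). Assumes every span lies within source.
    Returns the number of bytes copied.
    """
    copied = 0
    for start, end in spans:
        chunk = source[start:end]
        if len(target) < end:
            # Grow the target so the span can be written in place.
            target.extend(b"\0" * (end - len(target)))
        target[start:end] = chunk
        copied += len(chunk)
    return copied
```

Only the bytes inside the spans are read from the source, instead of the whole file.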

But these specific AFR changes would be needed
whether we use a journal OR an rsync layer.  A
separate rsync layer would also have to be created,
laid out like my first diagram:

   Client AFR    Client AFR
            |\  /|
            | \/ |
            |/  \|
       Rsync A  Rsync B
            |    |
        Sub A    Sub B

The rsync layer would need to reside on the subvolume
hosts, not in the client AFR; otherwise both copies of
the file would have to cross the network to be
compared, and you would be thrashing the network
anyway.
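To illustrate why the comparison belongs on the subvolume hosts, here is a simplified fixed-offset block comparison.  This is a stand-in, not rsync's rolling-checksum algorithm (which also copes with inserted data that shifts block boundaries); the function names and the tiny block size are assumptions for illustration.  Each host hashes its own copy locally, and only the digest lists, plus later the differing blocks, need to travel.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration; real layers would use far larger blocks


def block_digests(data, block=BLOCK):
    """Run locally on a subvolume host: one SHA-1 digest per fixed block."""
    return [hashlib.sha1(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def differing_blocks(src_digests, dst_digests):
    """Indices of blocks that differ or exist in only one of the copies.

    Only these digest lists need to cross the network, followed by the
    data of the blocks returned here.
    """
    n = max(len(src_digests), len(dst_digests))
    return [i for i in range(n)
            if i >= len(src_digests) or i >= len(dst_digests)
            or src_digests[i] != dst_digests[i]]
```

If the same comparison ran in the client AFR, it would first have to read both full copies over the network, which defeats the point.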

> >  Splitbrain handling of this would be impossible,
> > and one version would always have to win. But other
> > than that, I can see that would work.

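Split-brain handling with the journal should be
exactly the same as without it: the journal only
narrows down which spans need to be copied, not how
AFR decides which version wins.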

-Martin
