[Gluster-devel] Improving real world performance by moving files closer to their target workloads
Derek Price
derek at ximbiot.com
Tue May 20 01:35:19 UTC 2008
Gordan Bobic wrote:
> Derek Price wrote:
>
>> If all nodes do attempt to stay up to date with this information, then
>> if the node accepting a write goes incommunicado, the quorum can
>> simply and effectively roll back the transaction by revoking the
>> node's lock and rolling back its idea of the current version number of
>> the affected file or directory.
>
> And with that we're back to the journalling idea. If we have a per file
> journal (write-ahead log, if you will), then if the writing node fails
> during the write, when its lock is expired, the other nodes can just
> roll back the transaction.

Just to further define the ideas of "timeout" and "expire", I think it
would be ideal if timeouts were only handled lazily, i.e. no rollback
or automatic lock release happens unless a new node requests a lock and
the node that holds the current lock is determined to be incommunicado
(a state determined, I think, by an expiry time on the lock the node
holds, reset each time new data is received from that node).
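
To make the lazy handling concrete, here is a rough Python sketch. The
names (LazyLock, heartbeat, request) and the single-process model are
only assumptions for illustration, not anything that exists in
GlusterFS. The point is that an expired lock stays in place until some
other node actually asks for it.

```python
import time


class LazyLock:
    """Per-file lock whose timeout is only checked when a node requests it.

    Hypothetical sketch: no background timer ever releases the lock.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the sketch can be tested
        self.holder = None
        self.expires_at = 0.0

    def heartbeat(self, node):
        """Call whenever data arrives from the holder; resets the expiry."""
        if node == self.holder:
            self.expires_at = self.clock() + self.ttl

    def request(self, node):
        """Try to take the lock.

        Returns (granted, revoked_node). The lock is granted if it is free,
        already held by `node`, or the holder's expiry time has passed.
        """
        now = self.clock()
        if self.holder is None or self.holder == node or now >= self.expires_at:
            revoked = self.holder if self.holder not in (None, node) else None
            self.holder = node
            self.expires_at = now + self.ttl
            return True, revoked
        return False, None
```

Note that a holder which stops talking keeps the lock indefinitely if
nobody else wants the file, so no rollback work is done needlessly.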

I'm not even sure how complex the "journal" needs to be here. Except in
the case of O_APPEND, a "rollback" could just mean granting a new node a
higher transaction number than the timed-out node held. When the downed
node comes back up, the copy of the file on the node with the highest
transaction number wins.
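
As a minimal sketch of that rule (again in Python, with made-up names
such as TxnAllocator, Copy and reconcile, and assuming transaction
numbers are handed out centrally per file):

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Copy:
    node: str       # node holding this copy (hypothetical identifier)
    txn: int        # transaction number the copy was written under
    content: bytes


class TxnAllocator:
    """Hands out strictly increasing transaction numbers for one file."""

    def __init__(self, start=1):
        self._counter = count(start)

    def grant(self):
        return next(self._counter)


def reconcile(copies):
    """Pick the copy written under the highest transaction number."""
    return max(copies, key=lambda c: c.txn)
```

Here the "rollback" of a timed-out writer is nothing more than calling
grant() again for the new writer; the stale copy loses in reconcile()
when its node reappears.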

O_APPEND isn't much different. Until a file gets mirrored beyond the
minimum threshold, the "latest good" transaction number (the newest file
version that exists on the minimum number of mirrors) will need to be
remembered. If an O_APPEND lock is needed and a write lock expires,
then that "latest good" transaction number is used to find a version of
the file to roll back to.

So, I think this means our "journal" is simply the "latest good"
transaction number plus a sort of atomicity property: nodes will not
dispose of the "latest good" content of a given file beyond the minimum
threshold until a newer version has been confirmed to be mirrored across
the minimum number of nodes and becomes the new "latest good".
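
A rough Python sketch of that bookkeeping follows. FileVersions,
confirm, may_discard and rollback_target are hypothetical names, and
the sketch simplifies things by tracking confirmations in one place
rather than across nodes:

```python
class FileVersions:
    """Tracks the "latest good" transaction number for one file."""

    def __init__(self, min_mirrors):
        self.min_mirrors = min_mirrors
        self.latest_good = None
        self.holders = {}   # txn -> set of nodes confirmed to hold it

    def confirm(self, txn, node):
        """Record that `node` holds version `txn`.

        The version becomes "latest good" once enough mirrors confirm it
        and it is newer than the current "latest good".
        """
        nodes = self.holders.setdefault(txn, set())
        nodes.add(node)
        enough = len(nodes) >= self.min_mirrors
        newer = self.latest_good is None or txn > self.latest_good
        if enough and newer:
            self.latest_good = txn

    def may_discard(self, txn):
        """A version is disposable only if a newer one is "latest good"."""
        return self.latest_good is not None and txn < self.latest_good

    def rollback_target(self):
        """Version to fall back to when a write lock expires (O_APPEND)."""
        return self.latest_good
```

With min_mirrors=2, version 1 stays protected while version 2 exists on
only one node, and becomes disposable once a second node confirms
version 2.
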
Derek
--
Derek R. Price
Solutions Architect
Ximbiot, LLC <http://ximbiot.com>
Get CVS and Subversion Support from Ximbiot!
v: +1 248.835.1260
f: +1 248.246.1176