[Gluster-devel] Improving real world performance by moving files closer to their target workloads

Derek Price derek at ximbiot.com
Tue May 20 01:35:19 UTC 2008


Gordan Bobic wrote:
> Derek Price wrote:
> 
>> If all nodes do attempt to stay up to date with this information, then 
>> if the node accepting a write goes incommunicado, the quorum can 
>> simply and effectively roll back the transaction by revoking the 
>> node's lock and rolling back its idea of the current version number of 
>> the affected file or directory.
> 
> And with that we're back to the journalling idea. If we have a per file 
> journal (write-ahead log, if you will), then if the writing node fails 
> during the write, when its lock is expired, the other nodes can just 
> roll back the transaction.

Just to further define the ideas of "timeout" and "expire", I think it 
would be ideal if timeouts are only handled lazily, i.e. no rollback 
or automatic lock release happens unless a new node requests a lock and 
the node that holds the current lock is determined to be incommunicado 
(a state determined, I think, by an expire time on the lock the 
node holds, reset each time new data is received from that node).
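
To make the lazy handling concrete, here is a rough Python sketch. It is
not GlusterFS code; the LazyLock class, its method names, and the
injectable clock are all made up for illustration. The point is only
that nothing happens when the expire time passes: the stale holder is
replaced only when some other node actually asks for the lock.

```python
import time


class LazyLock:
    """Hypothetical lock whose expiry is checked only on a competing request."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence before the holder counts as incommunicado
        self.clock = clock          # injectable so the behaviour can be tested
        self.holder = None
        self.expires_at = 0.0

    def heartbeat(self, node):
        """New data arrived from `node`; reset the expire time if it holds the lock."""
        if node == self.holder:
            self.expires_at = self.clock() + self.timeout

    def request(self, node):
        """Return True if `node` holds the lock after this call."""
        now = self.clock()
        if self.holder is None or self.holder == node:
            granted = True
        elif now >= self.expires_at:
            # The holder has been silent too long. This is the only place
            # the lock is ever revoked: lazily, because someone else wants it.
            granted = True
        else:
            granted = False
        if granted:
            self.holder = node
            self.expires_at = now + self.timeout
        return granted
```

Note that there is no background timer here. If no other node ever
requests the lock, a silent holder keeps it indefinitely, which is the
behaviour I am arguing for.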

I'm not even sure how complex the "journal" needs to be here.  Except in 
the case of O_APPEND, a "rollback" could just mean granting a new node a 
higher transaction number than the timed-out node held.  When the downed 
node comes back up, the file on the node with the highest transaction 
number wins.
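
As a sketch of how little machinery that needs (Python, with
hypothetical function names and a plain dict standing in for whatever
the nodes really track), the "rollback" is one increment and the
reconciliation is one max():

```python
def grant_after_timeout(txn_numbers, timed_out_node, new_node):
    """Return a copy of the table in which new_node outranks the timed-out node.

    txn_numbers maps node name -> transaction number held for one file.
    """
    updated = dict(txn_numbers)
    updated[new_node] = txn_numbers[timed_out_node] + 1
    return updated


def winner(txn_numbers):
    """The node whose copy of the file wins when everyone is back up."""
    return max(txn_numbers, key=txn_numbers.get)
```

For example, if node A held transaction 7 and went silent, granting B
the lock gives B transaction 8, so whatever A wrote under 7 loses when
A returns.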

O_APPEND isn't much different.  Until a file gets mirrored beyond the 
minimum threshold, the "latest good" transaction number (the newest file 
version that exists on the minimum number of mirrors) will need to be 
remembered.  If an O_APPEND lock is needed and a write lock expires, 
then the "latest good" transaction number is used to find a version of 
the file to roll back to.

So, I think this means our "journal" is simply the "latest good" 
transaction number and a sort of atomic property, where nodes will not 
dispose of the "latest good" content of a given file beyond the minimum 
threshold until a newer version has been confirmed to be mirrored across 
the minimum number of nodes and becomes the new "latest good".
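
Roughly, in Python (a sketch only; FileVersions, confirm, and
rollback_target are hypothetical names, and real nodes would track
this per file and persist it somehow):

```python
class FileVersions:
    """Tracks which nodes hold which version of one file."""

    def __init__(self, min_mirrors):
        self.min_mirrors = min_mirrors   # the minimum mirroring threshold
        self.mirrors = {}                # transaction number -> set of nodes holding it
        self.latest_good = None          # newest version on at least min_mirrors nodes

    def confirm(self, txn, node):
        """Record that `node` holds a complete copy of version `txn`."""
        if self.latest_good is not None and txn < self.latest_good:
            return  # older than the latest good version; nothing to keep
        self.mirrors.setdefault(txn, set()).add(node)
        enough = len(self.mirrors[txn]) >= self.min_mirrors
        newer = self.latest_good is None or txn > self.latest_good
        if enough and newer:
            self.latest_good = txn
            # Only now may copies of older versions be disposed of.
            for old in [t for t in self.mirrors if t < txn]:
                del self.mirrors[old]

    def rollback_target(self):
        """Version to fall back to if a write lock expires mid-write."""
        return self.latest_good
```

With min_mirrors=2, version 1 stays the rollback target while version 2
exists on only one node, and it is dropped only after a second node
confirms version 2.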

Derek
-- 
Derek R. Price
Solutions Architect
Ximbiot, LLC <http://ximbiot.com>
Get CVS and Subversion Support from Ximbiot!

v: +1 248.835.1260
f: +1 248.246.1176




