[Gluster-devel] A healing translator
anand.avati at gmail.com
Tue May 22 00:11:47 UTC 2012
On Tue, May 8, 2012 at 2:34 AM, Xavier Hernandez <xhernandez at datalab.es> wrote:
> Hello developers,
> I would like to present some ideas we are working on to create a new kind
> of translator that should be able to unify and simplify to some extent the
> healing procedures of complex translators.
> Currently, the only translator with complex healing capabilities that we
> are aware of is AFR. We are developing another translator that will also
> need healing capabilities, so we thought that it would be interesting to
> create a new translator able to handle the common part of the healing
> process and hence to simplify and avoid duplicated code in other
> translators.
> The basic idea of the new translator is to handle healing tasks nearer the
> storage translator on the server nodes instead of controlling everything from a
> translator on the client nodes. Of course, the heal translator is not able
> to handle healing entirely by itself; it needs a client translator to
> coordinate all tasks. The heal translator is intended to be used by
> translators that work with multiple subvolumes.
> I will try to explain how it works without going into too much detail.
> There is an important requirement for all client translators that use
> healing: they must have exactly the same list of subvolumes and in the same
> order. Currently, I think this is not a problem.
> The heal translator treats each file as an independent entity, and each
> one can be in one of three modes:
> 1. Normal mode
> This is the normal mode for a copy or fragment of a file when it is
> synchronized and consistent with the same file on other nodes (for example,
> with other replicas). It is the client translator that decides whether it is
> synchronized or not.
> 2. Healing mode
> This is the mode used when a client detects an inconsistency in the copy
> or fragment of the file stored on this node and initiates healing.
> 3. Provider mode (I don't much like this name, though)
> This is the mode used by client translators when an inconsistency is
> detected in this file, but the copy or fragment stored in this node is
> considered good and it will be used as a source to repair the contents of
> this file on other nodes.
> Initially, when a file is created, it is set in normal mode. Client
> translators that make changes must guarantee that they send the
> modification requests in the same order to all the servers. This should be
> done using inodelk/entrylk.
> When a change is sent to a server, the client must include a bitmap mask
> of the clients to which the request is being sent. Normally this is a
> bitmap containing all the clients; however, when a server fails for some
> reason, some bits will be cleared. The heal translator uses this bitmap to
> detect failures on other nodes early, from the point of view of each client.
> When this condition is detected, the request is aborted with an error and
> the client is notified with the remaining list of valid nodes. If the
> client considers the request can be successfully served with the remaining
> list of nodes, it can resend the request with the updated bitmap.
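A minimal sketch of how the bitmap check described above might behave on one server. The class and method names are hypothetical, and the rule used here (a request is rejected if it still targets a node that an earlier request had already excluded) is only one possible reading of the description, not the proposed implementation.

```python
class ServerView:
    """Hypothetical per-file state kept by the heal translator on one server."""

    def __init__(self, node_count):
        # Assumption: one bit per node, and every node starts out valid.
        self.valid_mask = (1 << node_count) - 1

    def check_request(self, request_mask):
        """Return (accepted, valid_mask) for a change request."""
        if request_mask & ~self.valid_mask:
            # The sender still targets a node that another client has
            # already dropped: abort and report the nodes that remain valid.
            return False, self.valid_mask
        # The sender sees the same or fewer nodes: remember the narrower view.
        self.valid_mask = request_mask
        return True, self.valid_mask


view = ServerView(3)
first = view.check_request(0b111)   # all nodes reachable: accepted
second = view.check_request(0b011)  # another client lost node 2: accepted
stale = view.check_request(0b111)   # first client has a stale view: rejected
retry = view.check_request(0b011)   # resend with the updated bitmap: accepted
```

In this reading, the rejected client learns the remaining valid nodes from the second element of the returned pair and decides on its own whether a resend makes sense.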
> The heal translator also updates two file attributes for each change
> request to maintain the "version" of the data and metadata contents of the
> file. A similar task is currently performed by AFR using xattrop. This would not
> be needed anymore, speeding up write requests.
> The version of data and metadata is returned to the client for each read
> request, allowing it to detect inconsistent data.
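A rough sketch of how a client could use the returned versions to spot an inconsistency. The `Version` tuple and the `classify` helper are hypothetical, and the policy shown (a node is good only if it holds the highest data and metadata versions seen) is an assumption; the proposal leaves that decision to the client translator.

```python
from collections import namedtuple

# Hypothetical pair of counters returned by each node with a read reply.
Version = namedtuple("Version", ["data", "metadata"])


def classify(versions):
    """Split node ids into (good, bad) lists from a dict node_id -> Version.

    Assumed policy: a node is good only if both of its counters equal the
    highest values seen. If no node qualifies, good is empty and the client
    has to resolve the conflict some other way.
    """
    top = Version(
        max(v.data for v in versions.values()),
        max(v.metadata for v in versions.values()),
    )
    good = sorted(n for n, v in versions.items() if v == top)
    bad = sorted(n for n in versions if n not in good)
    return good, bad


replies = {0: Version(5, 2), 1: Version(5, 2), 2: Version(4, 2)}
good, bad = classify(replies)  # node 2 missed a data change
```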
> When a client detects an inconsistency, it initiates healing. First of
> all, it must lock the entry and inode (when necessary). Then, from the data
> collected from each node, it must decide which nodes have good data and
> which ones have bad data and hence need to be healed. There are two
> possible cases:
> 1. File is not a regular file
> In this case the reconstruction is very fast and requires few requests, so
> it is done while the file is locked. In this case, the heal translator does
> nothing relevant.
> 2. File is a regular file
> For regular files, the first step is to synchronize the metadata to the
> bad nodes, including the version information. Once this is done, the file
> is set in healing mode on bad nodes, and provider mode on good nodes. Then
> the entry and inode are unlocked.
> When a file is in provider mode, it works as in normal mode, but refuses
> to start another healing. Only one client can be healing a file.
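The mode handling described so far could be sketched as a small state holder per file. All names are hypothetical; refusing a second heal while the file is in healing mode (not only in provider mode) is an assumption drawn from "only one client can be healing a file".

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    HEALING = "healing"
    PROVIDER = "provider"


class FileState:
    """Hypothetical per-file state held by the heal translator on one node."""

    def __init__(self):
        self.mode = Mode.NORMAL  # files are created in normal mode
        self.healer = None       # client currently healing this file, if any

    def start_heal(self, client_id, node_is_good):
        """Try to begin healing; return False if a heal is already running."""
        if self.mode is not Mode.NORMAL:
            return False
        # Good copies become providers, bad copies enter healing mode.
        self.mode = Mode.PROVIDER if node_is_good else Mode.HEALING
        self.healer = client_id
        return True


state = FileState()
started = state.start_heal("client-a", node_is_good=True)
refused = state.start_heal("client-b", node_is_good=True)
```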
> When a file is in healing mode, each normal write request from any client
> is handled as if the file were in normal mode, updating the version
> information and detecting possible inconsistencies with the bitmap.
> Additionally, the healing translator marks the written region of the file
> as "good".
> Each write request from the healing client intended to repair the file
> must be marked with a special flag. In this case, the area to be
> written is filtered by the list of "good" ranges (if there is any
> intersection with a good range, it is removed from the request). The
> resulting set of ranges is propagated to the lower translator and added to
> the list of "good" ranges, but the version information is not updated.
> Read requests are only served if the requested range is entirely contained
> within the "good" regions list.
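The "good" range bookkeeping for a file in healing mode can be sketched as an interval list. This is only an illustration under assumptions: the class and function names are hypothetical, ranges are half-open byte intervals, and version updates and the actual I/O to the lower translator are left out.

```python
class GoodRanges:
    """Sorted, non-overlapping half-open [start, end) byte ranges."""

    def __init__(self):
        self.ranges = []

    def add(self, start, end):
        """Mark [start, end) as good, merging overlapping or adjacent ranges."""
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))  # disjoint: keep unchanged
            else:
                start, end = min(s, start), max(e, end)  # absorb into new range
        merged.append((start, end))
        self.ranges = sorted(merged)

    def subtract_from(self, start, end):
        """Return the parts of [start, end) that are not yet good."""
        pieces, pos = [], start
        for s, e in self.ranges:
            if e <= pos:
                continue
            if s >= end:
                break
            if s > pos:
                pieces.append((pos, s))
            pos = max(pos, e)
        if pos < end:
            pieces.append((pos, end))
        return pieces

    def contains(self, start, end):
        """True if [start, end) lies entirely inside one good range."""
        return any(s <= start and end <= e for s, e in self.ranges)


def normal_write(good, offset, length):
    """A regular client write: the whole written region becomes good."""
    good.add(offset, offset + length)
    return [(offset, offset + length)]


def heal_write(good, offset, length):
    """A flagged repair write: only regions not already good are written."""
    todo = good.subtract_from(offset, offset + length)
    for s, e in todo:
        good.add(s, e)
    return todo  # ranges that would be passed to the lower translator


good = GoodRanges()
normal_write(good, 10, 10)          # a client overwrites bytes 10..19
written = heal_write(good, 0, 30)   # the healer must not clobber them
```

A read for a given range would then be served only when `good.contains(start, end)` is true.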
> There are some additional details, but I think this is enough to have a
> general idea of its purpose and how it works.
> The main advantages of this translator are:
> 1. Avoid duplicated code in client translators
> 2. Simplify and unify healing methods in client translators
> 3. xattrop is not needed anymore in client translators to keep track of
> 4. Full file contents are repaired without locking the file
> 5. Better detection and prevention of some split brain situations as soon
> as possible
> I think it would be very useful. It seems to me that it works correctly in
> all situations; however, I don't have all the experience that other
> developers have with the healing functions of AFR, so I will be happy to
> answer any questions and to hear suggestions for solving problems it may
> have or for improving it.
> What do you think about it?
The goals you state above are all valid. What would really help (adoption)
is if you can implement this as a modification of AFR by utilizing all the
work already done, and you get brownie points if it is backward compatible
with existing AFR. If you already have any code in a publishable state,
please share it with us (github link?).