[Gluster-devel] A healing translator

Tue May 22 08:51:22 UTC 2012

On 05/22/2012 09:48 AM, Anand Avati wrote:
>
>>
>     I've tried to understand how AFR works, and some of the ideas in
>     my translator are taken from it. However, it is very complex and
>     a lot of changes have been made in the master branch over the
>     last few months. It's hard for me to follow them while actively
>     working on my translator. Nevertheless, the main reason to take a
>     separate path was that AFR is strongly bound to replication, at
>     least from what I saw when I analyzed it more deeply. Maybe things
>     have changed since then, but I haven't had time to review them.
>
>
> Have you reviewed the proactive self-heal daemon (+ changelog indexing 
> translator) which is a potential functional replacement for what you 
> might be attempting?
>
> Avati

I must admit that I've read a bit about it, but I haven't had time to
explore it in detail.

If I understand it correctly, the self-heal daemon works as a client
process but can be executed on server nodes. I suppose that multiple
self-heal daemons can be running on different nodes. Each daemon then
detects invalid files (I'm not sure exactly how) and replicates the
changes from one good node to the bad nodes.
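
As a rough illustration of that replication-style repair, here is a toy
model in Python. It is only a sketch of the idea as described above, not
actual AFR or self-heal daemon code: the per-node version counter and
the function name heal_replicas are assumptions made for the example.

```python
# Toy model of replication-style healing (hypothetical, not GlusterFS code).
# Every replica holds a full copy of the file plus an assumed version counter.

def heal_replicas(replicas):
    """Overwrite every stale replica with the newest full copy.

    replicas: dict mapping node name -> (version, data).
    Returns the list of node names that were repaired.
    """
    # One good node is enough as a source, because it holds the whole file.
    source = max(replicas, key=lambda node: replicas[node][0])
    good_version, good_data = replicas[source]
    healed = []
    for node, (version, _) in list(replicas.items()):
        if version < good_version:
            replicas[node] = (good_version, good_data)
            healed.append(node)
    return healed


replicas = {
    "node-a": (7, b"hello world"),
    "node-b": (5, b"hello"),
    "node-c": (7, b"hello world"),
}
print(heal_replicas(replicas))  # prints ['node-b']
```

The point to notice is that a single good copy is a sufficient source.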

The problem is that in the translator I'm working on, the information is
dispersed among multiple nodes, so there isn't a single server node that
contains the whole data. To repair a node, data must be read from at
least two other nodes (the exact number depends on the configuration).
From what I've read about AFR and the self-heal daemon, it's not
straightforward to adapt them to this mechanism, because they would need
to know a subset of nodes with consistent data, not only one. Each
daemon would have to contact all the other nodes, read data from each
one, determine which ones are valid, rebuild the data and send it to the
bad nodes. This means that the daemon would have to be as complex as the
clients.
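
A minimal sketch of why more than one source is needed, assuming a toy
2+1 layout (two data fragments plus one XOR parity fragment). The
layout, the version counter and the names K, xor and heal_dispersed are
all assumptions for illustration; the real translator's encoding is not
described here and is likely different.

```python
# Toy 2+1 dispersal (hypothetical layout): two data fragments plus one XOR
# parity fragment, so each fragment is the XOR of the other two.

K = 2  # minimum number of consistent fragments needed to rebuild


def xor(a, b):
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def heal_dispersed(nodes):
    """Repair stale nodes in place; nodes maps name -> (version, fragment).

    Expects exactly three nodes. Returns the repaired node names, or
    raises if fewer than K nodes agree on the newest version.
    """
    assert len(nodes) == 3, "this toy layout has exactly three fragments"
    newest = max(version for version, _ in nodes.values())
    # No single node is enough: we need a consistent subset of size K.
    good = [n for n, (v, _) in nodes.items() if v == newest]
    if len(good) < K:
        raise RuntimeError("not enough consistent fragments to rebuild")
    healed = []
    for node in nodes:
        if node in good:
            continue
        # Rebuild the missing fragment from the two consistent ones.
        a, b = (nodes[n][1] for n in good[:K])
        nodes[node] = (newest, xor(a, b))
        healed.append(node)
    return healed


d0, d1 = b"GLUS", b"TER!"
nodes = {"n0": (4, d0), "n1": (3, b"????"), "n2": (4, xor(d0, d1))}
print(heal_dispersed(nodes), nodes["n1"])  # prints ['n1'] (4, b'TER!')
```

Unlike the replication case, the healer here has to find a set of nodes
that agree with each other and then decode, which is the same work a
client does on a normal read.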

My impression (but I may be wrong) is that AFR and the self-heal daemon
are closely bound to the replication scheme, so it is very hard to use
them for other purposes. The healing translator I'm writing tries to
offer generic server-side helpers for the healing process, but it is the
client side that really manages the healing operation (though heavily
simplified), and it could use them to replicate data, to disperse data,
or to implement some other scheme.

Xavi