[Gluster-devel] Proposal to change locking in data-self-heal
Xavier Hernandez
xhernandez at datalab.es
Wed May 22 10:36:48 UTC 2013
Maybe a different approach could solve some of these problems and
improve responsiveness. It's an architectural change so I'm not sure if
it's the right moment to discuss it, but at least it could be considered
for the future. There are a lot of details to consider, so do not take
this as a full explanation, only a high-level overview.
The basic change is to implement a server-side healing helper (HH)
xlator living just under the lock xlator. Its purpose is not to heal
the file itself but to offer functionality that helps client-side
xlators heal a file.
When a client wants to heal a file, it will first send a request to the
HH xlator to request healing access. If the file is not being healed by
another client, the access will be granted. Once a client has
exclusive access to heal the file, a full inode lock will still be
needed to heal the metadata at the beginning and at the end of the heal
process (just as it's currently done). Then all locks are released and
the data recovery can be done without any lock.
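
To make the exclusivity part concrete, the server only needs to
remember, per file, whether some client already owns heal access. Below
is a minimal sketch of that idea in plain C; all the names (hh_ctx_t,
hh_try_acquire, hh_release) are invented for illustration, and this is
not the real xlator fop interface, which would keep this state in the
inode context instead.

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    bool            healing;   /* is some client already healing this file? */
    uint64_t        owner_id;  /* which client owns the grant, if any */
} hh_ctx_t;

/* Grant heal access only if nobody else holds it. */
static bool
hh_try_acquire(hh_ctx_t *ctx, uint64_t client_id)
{
    bool granted = false;

    pthread_mutex_lock(&ctx->lock);
    if (!ctx->healing) {
        ctx->healing  = true;
        ctx->owner_id = client_id;
        granted       = true;
    }
    pthread_mutex_unlock(&ctx->lock);

    return granted;
}

/* Release heal access when the owning client finishes. */
static void
hh_release(hh_ctx_t *ctx, uint64_t client_id)
{
    pthread_mutex_lock(&ctx->lock);
    if (ctx->healing && ctx->owner_id == client_id)
        ctx->healing = false;
    pthread_mutex_unlock(&ctx->lock);
}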
To be able to heal data without locks, the HH xlator needs to keep a
list of pending segments to heal. Initially the segment will go from
offset 0 to the file size (or something else defined by the client).
Since the HH xlator is below the lock xlator, it can only receive one
normal write and, possibly, one heal write at any moment. Normal writes
will always take precedence and the written segment will be removed from
the healing segments. Any heal write will be filtered by the pending
segments: if a heal write tries to modify an area not covered by the
pending segments, that area is not updated.
This strategy allows concurrent write operations with healing.
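
To make that bookkeeping concrete, the pending segments can be pictured
as a list of byte ranges still to be healed: a normal write punches its
range out of the list, and a heal write is clipped so it only touches
bytes that are still pending. The following is only a sketch of that
logic in plain C; the names (hh_seg_t, segs_remove, segs_clip) are
invented and it is deliberately not written against the real xlator
interfaces.

#include <stdlib.h>
#include <sys/types.h>

/* A pending segment [start, end) that still has to be healed. */
typedef struct hh_seg {
    off_t          start;   /* inclusive */
    off_t          end;     /* exclusive */
    struct hh_seg *next;
} hh_seg_t;

/* A normal write makes [start, end) authoritative, so remove that
 * range from the pending list. (Error handling omitted.) */
static void
segs_remove(hh_seg_t **head, off_t start, off_t end)
{
    hh_seg_t **pp = head;

    while (*pp) {
        hh_seg_t *s = *pp;

        if (end <= s->start || start >= s->end) {
            pp = &s->next;                      /* no overlap */
        } else if (start <= s->start && end >= s->end) {
            *pp = s->next;                      /* fully covered: drop it */
            free(s);
        } else if (start > s->start && end < s->end) {
            hh_seg_t *tail = malloc(sizeof(*tail));
            tail->start = end;                  /* split: keep both sides */
            tail->end   = s->end;
            tail->next  = s->next;
            s->end      = start;
            s->next     = tail;
            pp = &tail->next;
        } else if (start <= s->start) {
            s->start = end;                     /* overlap on the left */
            pp = &s->next;
        } else {
            s->end = start;                     /* overlap on the right */
            pp = &s->next;
        }
    }
}

/* Clip a heal write to what is still pending. For simplicity only the
 * first overlapping pending piece is returned; a real implementation
 * would iterate over all of them. A zero return means the area was
 * already overwritten by normal I/O and must be skipped. */
static off_t
segs_clip(hh_seg_t *head, off_t start, off_t end, off_t *clipped_start)
{
    for (hh_seg_t *s = head; s != NULL; s = s->next) {
        off_t lo = start > s->start ? start : s->start;
        off_t hi = end   < s->end   ? end   : s->end;

        if (lo < hi) {
            *clipped_start = lo;
            return hi - lo;
        }
    }
    return 0;
}

For example, a normal write covering bytes 4096-8192 would call
segs_remove(&head, 4096, 8192); a heal write arriving later for the
same range would get a zero length back from segs_clip() and would
simply be dropped.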
In this situation it's easy to handle a truncate request: the HH xlator
intercepts it and updates the pending segments, removing everything at
or beyond the truncate offset. If this leaves no pending segments, the
HH xlator will tell the healing client that the healing is complete.
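
Building on the same sketch (the hypothetical hh_seg_t node is repeated
so the fragment stands on its own), truncate just trims that list:
everything at or beyond the truncate offset stops being pending, and an
empty list means the heal is finished.

#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

typedef struct hh_seg {         /* same pending-segment node as above */
    off_t          start;
    off_t          end;
    struct hh_seg *next;
} hh_seg_t;

/* Drop everything at or beyond the truncate offset from the pending
 * list. Returns true when nothing is left to heal, so the HH xlator
 * can tell the healing client that it is finished. */
static bool
segs_truncate(hh_seg_t **head, off_t offset)
{
    hh_seg_t **pp = head;

    while (*pp) {
        hh_seg_t *s = *pp;

        if (s->start >= offset) {   /* segment entirely beyond the new size */
            *pp = s->next;
            free(s);
        } else {
            if (s->end > offset)
                s->end = offset;    /* clip a segment spanning the offset */
            pp = &s->next;
        }
    }
    return *head == NULL;
}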
On 21/05/13 15:58, Jeff Darcy wrote:
> On 05/21/2013 09:30 AM, Stephan von Krawczynski wrote:
>> I am not quite sure if I understood the issue in full detail. But are
>> you saying that you "split up" the current self-healing file in 128K
>> chunks with locking/unlocking (over the network)? It sounds a bit like
>> the locking takes more (cpu) time than the self-healing of the data
>> itself. I mean this can be a 10 G link where a complete file could be
>> healed in almost no time, even if the file is quite big. Sure WAN is
>> different, but I really would like to have at least an option to drop
>> the partial locking completely and lock the full file instead.
>
> That's actually how it used to work, which led to many complaints from
> users who would see stalls accessing large files (most often VM
> images) over GigE while self-heal was in progress. Many considered it
> a show-stopper, and the current "granular self-heal" approach was
> implemented to address it. I'm not sure whether the old behavior is
> still available as an option. If not (which is what I suspect) then
> you're correct that it might be worth considering as an enhancement.
>
>