[Gluster-devel] Proposal to change locking in data-self-heal
Xavier Hernandez
xhernandez at datalab.es
Wed May 22 10:36:48 UTC 2013
Maybe a different approach could solve some of these problems and
improve responsiveness. It's an architectural change so I'm not sure if
it's the right moment to discuss it, but at least it could be considered
for the future. There are a lot of details to consider, so do not take
this as a full explanation, only a high-level overview.
The basic change is to implement a server-side healing helper (HH)
xlator living just under the lock xlator. Its purpose is not to heal
the file itself but to offer functionality that helps client-side
xlators heal a file.
When a client wants to heal a file, it will first send a request to the
HH xlator to request healing access. If the file is not being healed by
another client, the access will be granted. Once a client has
exclusive access to heal the file, a full inode lock will still be
needed to heal the metadata at the beginning and at the end of the heal
process (just as it's currently done). Then all locks are released and
the data recovery can be done without any lock.
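
To make the exclusivity part concrete, the server only needs to
remember, per file, whether some client already owns heal access. Below
is a minimal sketch of that idea in plain C; all the names (hh_ctx_t,
hh_try_acquire, hh_release) are invented for illustration, and this is
not the real xlator fop interface, which would keep this state in the
inode context instead.

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    bool            healing;   /* is some client already healing this file? */
    uint64_t        owner_id;  /* which client owns the grant, if any */
} hh_ctx_t;

/* Grant heal access only if nobody else holds it. */
static bool
hh_try_acquire(hh_ctx_t *ctx, uint64_t client_id)
{
    bool granted = false;

    pthread_mutex_lock(&ctx->lock);
    if (!ctx->healing) {
        ctx->healing  = true;
        ctx->owner_id = client_id;
        granted       = true;
    }
    pthread_mutex_unlock(&ctx->lock);

    return granted;
}

/* Release heal access when the owning client finishes. */
static void
hh_release(hh_ctx_t *ctx, uint64_t client_id)
{
    pthread_mutex_lock(&ctx->lock);
    if (ctx->healing && ctx->owner_id == client_id)
        ctx->healing = false;
    pthread_mutex_unlock(&ctx->lock);
}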
To be able to heal data without locks, the HH xlator needs to keep a
list of pending segments to heal. Initially the segment will go from
offset 0 to the file size (or something else defined by the client).
Since the HH xlator is below the lock xlator, it can only receive one
normal write and, possibly, one heal write at any moment. Normal writes
will always take precedence and the written segment will be removed from
the healing segments. Any heal write will be filtered by the pending
segments: if a heal write tries to modify an area not covered by the
pending segments, that area is not updated.
This strategy allows concurrent write operations with healing.
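
To make that bookkeeping concrete, the pending segments can be pictured
as a list of byte ranges still to be healed: a normal write punches its
range out of the list, and a heal write is clipped so it only touches
bytes that are still pending. The following is only a sketch of that
logic in plain C; the names (hh_seg_t, segs_remove, segs_clip) are
invented and it is deliberately not written against the real xlator
interfaces.

#include <stdlib.h>
#include <sys/types.h>

/* A pending segment [start, end) that still has to be healed. */
typedef struct hh_seg {
    off_t          start;   /* inclusive */
    off_t          end;     /* exclusive */
    struct hh_seg *next;
} hh_seg_t;

/* A normal write makes [start, end) authoritative, so remove that
 * range from the pending list. (Error handling omitted.) */
static void
segs_remove(hh_seg_t **head, off_t start, off_t end)
{
    hh_seg_t **pp = head;

    while (*pp) {
        hh_seg_t *s = *pp;

        if (end <= s->start || start >= s->end) {
            pp = &s->next;                      /* no overlap */
        } else if (start <= s->start && end >= s->end) {
            *pp = s->next;                      /* fully covered: drop it */
            free(s);
        } else if (start > s->start && end < s->end) {
            hh_seg_t *tail = malloc(sizeof(*tail));
            tail->start = end;                  /* split: keep both sides */
            tail->end   = s->end;
            tail->next  = s->next;
            s->end      = start;
            s->next     = tail;
            pp = &tail->next;
        } else if (start <= s->start) {
            s->start = end;                     /* overlap on the left */
            pp = &s->next;
        } else {
            s->end = start;                     /* overlap on the right */
            pp = &s->next;
        }
    }
}

/* Clip a heal write to what is still pending. For simplicity only the
 * first overlapping pending piece is returned; a real implementation
 * would iterate over all of them. A zero return means the area was
 * already overwritten by normal I/O and must be skipped. */
static off_t
segs_clip(hh_seg_t *head, off_t start, off_t end, off_t *clipped_start)
{
    for (hh_seg_t *s = head; s != NULL; s = s->next) {
        off_t lo = start > s->start ? start : s->start;
        off_t hi = end   < s->end   ? end   : s->end;

        if (lo < hi) {
            *clipped_start = lo;
            return hi - lo;
        }
    }
    return 0;
}

For example, a normal write covering bytes 4096-8192 would call
segs_remove(&head, 4096, 8192); a heal write arriving later for the
same range would get a zero length back from segs_clip() and would
simply be dropped.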
In this situation it's easy to handle a truncate request: the HH xlator
intercepts it and updates the pending segments, removing everything at
or beyond the truncate offset. If this leaves no pending segments, the
HH xlator will tell the healing client that the healing is complete.
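
Building on the same sketch (the hypothetical hh_seg_t node is repeated
so the fragment stands on its own), truncate just trims that list:
everything at or beyond the truncate offset stops being pending, and an
empty list means the heal is finished.

#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

typedef struct hh_seg {         /* same pending-segment node as above */
    off_t          start;
    off_t          end;
    struct hh_seg *next;
} hh_seg_t;

/* Drop everything at or beyond the truncate offset from the pending
 * list. Returns true when nothing is left to heal, so the HH xlator
 * can tell the healing client that it is finished. */
static bool
segs_truncate(hh_seg_t **head, off_t offset)
{
    hh_seg_t **pp = head;

    while (*pp) {
        hh_seg_t *s = *pp;

        if (s->start >= offset) {   /* segment entirely beyond the new size */
            *pp = s->next;
            free(s);
        } else {
            if (s->end > offset)
                s->end = offset;    /* clip a segment spanning the offset */
            pp = &s->next;
        }
    }
    return *head == NULL;
}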
On 21/05/13 15:58, Jeff Darcy wrote:
> On 05/21/2013 09:30 AM, Stephan von Krawczynski wrote:
>> I am not quite sure if I understood the issue in full detail. But are
>> you saying that you "split up" the current self-healing file in 128K
>> chunks with locking/unlocking (over the network)? It sounds a bit like
>> the locking takes more (cpu) time than the self-healing of the data
>> itself. I mean this can be a 10 G link where a complete file could be
>> healed in almost no time, even if the file is quite big. Sure WAN is
>> different, but I really would like to have at least an option to drop
>> the partial locking completely and lock the full file instead.
>
> That's actually how it used to work, which led to many complaints from
> users who would see stalls accessing large files (most often VM
> images) over GigE while self-heal was in progress. Many considered it
> a show-stopper, and the current "granular self-heal" approach was
> implemented to address it. I'm not sure whether the old behavior is
> still available as an option. If not (which is what I suspect) then
> you're correct that it might be worth considering as an enhancement.
>
>