[Gluster-devel] Proposal to change locking in data-self-heal

Thu May 23 04:50:22 UTC 2013

On 5/22/13 9:45 PM, Pranith Kumar Karampuri wrote:
>
>
> ----- Original Message -----
>> From: "Anand Avati" <anand.avati at gmail.com>
>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> Cc: "Jeff Darcy" <jdarcy at redhat.com>, "Anand Avati" <aavati at redhat.com>, "Raghavan Pichai" <rpichai at redhat.com>,
>> "Ravishankar Narayanankutty" <ranaraya at redhat.com>, "devel" <gluster-devel at nongnu.org>
>> Sent: Thursday, May 23, 2013 7:02:09 AM
>> Subject: Re: [Gluster-devel] Proposal to change locking in data-self-heal
>>
>> On Wed, May 22, 2013 at 5:57 AM, Pranith Kumar Karampuri <
>> pkarampu at redhat.com> wrote:
>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Anand Avati" <anand.avati at gmail.com>
>>>> To: "Jeff Darcy" <jdarcy at redhat.com>
>>>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Anand Avati" <
>>> aavati at redhat.com>, "Raghavan Pichai"
>>>> <rpichai at redhat.com>, "Ravishankar Narayanankutty" <ranaraya at redhat.com>,
>>> "devel" <gluster-devel at nongnu.org>
>>>> Sent: Wednesday, May 22, 2013 1:19:19 AM
>>>> Subject: Re: [Gluster-devel] Proposal to change locking in data-self-heal
>>>>
>>>> On Tue, May 21, 2013 at 7:05 AM, Jeff Darcy <jdarcy at redhat.com> wrote:
>>>>
>>>>> On 05/21/2013 09:10 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>> scenario-1 won't happen because there exists a chance for it to
>>> acquire
>>>>>> truncate's full file lock after any 128k range sync happens.
>>>>>>
>>>>>> Scenario-2 won't happen because extra self-heals that are launched on
>>> the
>>>>>> same file will be blocked in self-heal-domain so the data-path's
>>> locks are
>>>>>> not affected by this.
>>>>>>
>>>>>
>>>>> This is two changes for two scenarios:
>>>>>
>>>>> * Change granular-lock order to avoid scenario 1.
>>>>>
>>>>> * Add a new lock to avoid scenario 2.
>>>>>
>>>>> I'm totally fine with the second part, but the first part makes me a
>>> bit
>>>>> uneasy.  As I recall, the "chained" locking (lock second range before
>>>>> unlocking the first) was a deliberate choice to avoid windows where
>>>>> somebody could jump in with a truncate during self-heal.  It certainly
>>>>> wasn't the obvious thing to do, suggesting there was a specific reason.
>>>>
>>>>
>>>> Chained locking really was to avoid the start of a second self-heal while
>>>> one is in progress. By giving up the big lock (after holding the small
>>>> lock), regional operations can progress but the second self-heal waiting
>>> to
>>>> hold the big lock is still held off. Truncating isn't really an issue as
>>>> long as the truncate transaction locks from new EOF till infinity (which
>>> it
>>>> is) and self-heal will not race in those regions. Of course, with chained
>>>> locking, full-file truncates can starve.
>>>>
>>>> It's not obvious what use case Pranith is facing where self-heal and
>>>> ftruncate() are competing. Is it just an artificial/hand-crafted test
>>> case,
>>>> or a real-life access pattern?
>>>
>>> artificial.
>>>
>>>>
>>>>
>>>>> Until we recall what that reason was I don't think we can determine
>>>>> whether this is a bug or a feature.  If it's a feature, or if we don't
>>>>> know, this change is likely to break something instead of fixing it.
>>>>
>>>>
>>>> The sole reason was to prevent the start of a second self-heal. Having
>>>> self-heals serialize in a separate domain solves the problem (except if
>>> we
>>>> are looking at runtime compatibility across multiple versions in this
>>>> scenario.. aargh!)
>>>
>>> So you guys are OK with this proposal if we solve version compatibility
>>> issues?
>>>
>>>
>> I guess you can even just ignore the backward compatibility issue in this
>> case. Old servers will "safely" block themselves against the big data-lock
>> causing starvation just the way things work today. Once all servers are
>> upgraded, all self-heals will gracefully block themselves in the new
>> domain. There isn't much compatibility handling actually.
>
>     If afr with new locking approach triggers self-heal first then old afr can still get
> the full-file lock. This is because new locking approach unlocks the range-lock without acquiring
> next range-lock. So 2 self-heals will be in progress in parallel. This will lead to incorrect
> extended attributes because of double decrements of changelogs one by the new afr self-heal
> and the other with old afr self-heal. Let me know if I am missing something.
>
>     Since scenario-1(the one with truncate) is not widely seen, may be we should just solve
> scenario-2 which can be done by acquiring a full-lock in self-heal domain then unlock after
> completion of self-heal, without changing locking in data-domain.
>

Yes, that's what I intended to say. Let's not introduce a backward 
compatibility "problem" for an artificially created scenario.

Avati