[Gluster-devel] Handling locks in NSR

Avra Sengupta asengupt at redhat.com
Thu May 5 12:14:09 UTC 2016


Hi,

I have sent a patch (http://review.gluster.org/#/c/14226/1) for the
lock/unlock fops in jbr-server, in accordance with the discussion we had
below. Please feel free to review it. Thanks.

Regards,
Avra

On 03/03/2016 12:21 PM, Avra Sengupta wrote:
> On 03/03/2016 02:29 AM, Shyam wrote:
>> On 03/02/2016 03:10 AM, Avra Sengupta wrote:
>>> Hi,
>>>
>>> All fops in NSR, follow a specific workflow as described in this
>>> UML(https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing). 
>>>
>>> However, all locking fops will follow a slightly different workflow, as
>>> described below. This is a first proposed draft for handling locks, and
>>> we would like to hear your concerns and queries regarding it.
>>
>> This change, to handle locking FOPs differently, is due to what 
>> limitation/problem? (apologies if I missed an earlier thread on the 
>> same)
>>
>> My understanding is that this is due to the fact that the actual FOP 
>> could fail/block (non-blocking/blocking) as there is an existing lock 
>> held, and hence just adding a journal entry and meeting quorum is 
>> not sufficient for the success of the FOP (it is necessary, though, to 
>> handle lock preservation in the event of leadership change); rather, 
>> acquiring the lock is. Is this understanding right?
> Yes, that is right. The change in approach for handling locks is to avoid 
> getting into a deadlock amongst the followers.
>>
>> Based on the above understanding of mine, and the discussion below, 
>> the intention seems to be to place the locking xlator below the 
>> journal. What if we place this xlator above the journal, but add 
>> requirements that FOPs handled by this xlator needs to reach the 
>> journal?
>>
>> Assuming we adopt this strategy (i.e. the locks xlator is above the 
>> journal xlator), a successful lock acquisition by the locks xlator is 
>> not enough to guarantee that the lock is preserved across the replica 
>> group; hence it has to reach the journal and, as a result, pass through 
>> the other replica members' journal and locks xlators as well.
>>
>> If we do the above, what are the advantages and repercussions of the 
>> same?
> Why would we want to put the locking xlator above the journal? Is 
> there a use case for that?
> Firstly, we would have to modify the locking xlator to make it a 
> pass-through.
> We would also introduce a small window where we perform the lock 
> successfully, but then have a failure in the journal. We would then 
> have to release the lock because we failed to journal it. In the 
> previously described approach, if we fail to journal it, we don't even 
> reach the locking xlator. Logically, it makes the locking xlator 
> dependent on the journal's output, whereas ideally the journal should 
> be dependent on the locking xlator's output.
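>
> A toy illustration of that window (plain Python, not any xlator's actual
> code; the dict/set journal and lock table are made up for the sketch):
>
>     journal = {}      # entry_id -> "pending" / "complete" / "invalid"
>     locks = set()     # currently held locks, e.g. (gfid, offset, length)
>
>     def journal_then_lock(entry_id, rng, journal_fails=False):
>         # Proposed ordering: journal first, then acquire.
>         if journal_fails:
>             return -1                     # we never even reach the lock table
>         journal[entry_id] = "pending"
>         if rng in locks:                  # conflicting lock already held
>             journal[entry_id] = "invalid" # undo stays inside the journal
>             return -1                     # -ve ack
>         locks.add(rng)
>         journal[entry_id] = "complete"
>         return 0
>
>     def lock_then_journal(entry_id, rng, journal_fails=False):
>         # Ordering with the locks xlator above the journal.
>         if rng in locks:
>             return -1
>         locks.add(rng)
>         if journal_fails:                 # the window described above:
>             locks.discard(rng)            # a lock we already hold has to be
>             return -1                     # released because journaling failed
>         journal[entry_id] = "complete"
>         return 0
>
> In the first ordering, a journal failure never reaches the locking xlator;
> in the second, it forces the release of a lock that was already granted.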
>>
>> Some of the points noted here (like conflicting non-blocking locks 
>> when the previous lock is not yet released) could be handled. Also in 
>> your scheme, what happens to blocking lock requests, the FOP will 
>> block, there is no async return to handle the success/failure of the 
>> same.
> Yes, the FOP will block on blocking lock requests. I assume that's the 
> behaviour today; please correct me if I am wrong.
>>
>> The downside is that on reconciliation we need to, potentially, undo 
>> some of the locks that are held by the locks xlator (in the new 
>> leader), which is outside the scope of the journal xlator.
> Yes, we need to do lock cleanup on reconciliation, which is anyway 
> outside the scope of the journal xlator. The reconciliation daemon 
> will compare the terms on each replica node, and either acquire or 
> release locks accordingly.
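>
> A rough sketch of that idea (toy Python, purely illustrative; the term
> comparison and the shape of the per-replica state are assumptions):
>
>     def reconcile(replicas):
>         # replicas: list of dicts like {"term": 7, "held": set_of_locks}
>         source = max(replicas, key=lambda r: r["term"])  # newest term wins
>         for member in replicas:
>             for lk in source["held"] - member["held"]:
>                 member["held"].add(lk)      # acquire locks this node missed
>             for lk in member["held"] - source["held"]:
>                 member["held"].discard(lk)  # release locks rolled back elsewhere
>             member["term"] = source["term"]
>
> Whether the daemon replays or drops a given lock would of course depend
> on the journal terms, which the sketch only caricatures.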
>>
>>
>> I also assume we need to do the same for the leases xlator as well, 
>> right?
> Yes, as long as we handle locking properly, the leases xlator shouldn't 
> be a problem.
>>
>>
>>>
>>> 1. On receiving the lock fop, the leader will journal the lock itself,
>>> and then try to actually acquire the lock. If, at this point, it fails
>>> to acquire the lock, it will invalidate the journal entry and return a
>>> -ve ack to the client. However, if it is successful in acquiring the
>>> lock, it will mark the journal entry as complete and forward the fop
>>> to the followers.
>>>
>>> 2. Each follower, on receiving the fop, will journal it and then try to
>>> actually acquire the lock. If it fails to acquire the lock, it will
>>> invalidate the journal entry and return a -ve ack to the leader.
>>> If it is successful in acquiring the lock, it will mark the journal
>>> entry as complete and send a +ve ack to the leader.
>>>
>>> 3. The leader, on receiving all acks, will perform a quorum check. If
>>> quorum is met, it will send a +ve ack to the client. If quorum fails,
>>> it will send a rollback to the followers.
>>>
>>> 4. Each follower, on receiving the rollback, will journal it first and
>>> then release the acquired lock. It will mark the rollback entry in the
>>> journal as complete and send an ack to the leader.
>>>
>>> 5. The leader, on receiving the rollback acks, will journal its own
>>> rollback and then release the acquired lock. It will mark the rollback
>>> entry in the journal as complete, and send a -ve ack to the client.
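>>>
>>> To make the flow concrete, here is a condensed, illustrative model of
>>> steps 1-5 in Python (toy code only, not the jbr-server implementation;
>>> every name in it is made up for the sketch). Each replica member is a
>>> dict holding its own journal and lock table, and quorum is counted over
>>> the +ve acks:
>>>
>>>     def journal_and_lock(member, rng):
>>>         # Journal the lock first, then try to actually acquire it
>>>         # (steps 1 and 2).
>>>         entry = ["lock", rng, "pending"]
>>>         member["journal"].append(entry)
>>>         if rng in member["locks"]:        # conflicting lock already held
>>>             entry[2] = "invalid"          # invalidate the journal entry
>>>             return False                  # -ve ack
>>>         member["locks"].add(rng)
>>>         entry[2] = "complete"             # mark the journal entry complete
>>>         return True
>>>
>>>     def leader_lock(leader, followers, rng, quorum):
>>>         # Step 1: the leader journals and acquires first.
>>>         if not journal_and_lock(leader, rng):
>>>             return -1                     # -ve ack to the client
>>>         # Step 2: forward to the followers.
>>>         acks = 1 + sum(journal_and_lock(f, rng) for f in followers)
>>>         # Step 3: quorum check on the collected acks.
>>>         if acks >= quorum:
>>>             return 0                      # +ve ack to the client
>>>         # Steps 4 and 5: journal the rollback, then release the lock.
>>>         for member in followers + [leader]:
>>>             if rng in member["locks"]:
>>>                 member["journal"].append(["rollback", rng, "complete"])
>>>                 member["locks"].discard(rng)
>>>         return -1                         # -ve ack to the client
>>>
>>> For example, with three members m = {"journal": [], "locks": set()} each,
>>> leader_lock(m1, [m2, m3], ("gfid-1", 0, 4096), quorum=2) returns 0.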
>>>
>>> A few things to be noted in the above workflow:
>>> 1. It will be a synchronous operation across the replica volume.
>>> 2. Reconciliation will take care of nodes that have missed out on the
>>> locks.
>>> 3. On a client disconnect, a lock-timeout will be started; on its
>>> expiration, all locks held by that particular client will be released.
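>>>
>>> A small sketch of point 3, the lock-timeout on client disconnect (toy
>>> Python; threading.Timer and the 30-second value are stand-ins for
>>> whatever the server side would actually use):
>>>
>>>     import threading
>>>
>>>     LOCK_TIMEOUT = 30       # seconds; the value is only an assumption
>>>     client_locks = {}       # client_id -> set of locks held by that client
>>>
>>>     def on_client_disconnect(client_id):
>>>         # Start the lock-timeout for this client.
>>>         t = threading.Timer(LOCK_TIMEOUT, release_client_locks,
>>>                             args=(client_id,))
>>>         t.daemon = True
>>>         t.start()
>>>
>>>     def release_client_locks(client_id):
>>>         # On expiration, release every lock held by that client.
>>>         for lk in client_locks.pop(client_id, set()):
>>>             print("releasing", lk)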
>>>
>>> Regards,
>>> Avra
>


