[Gluster-devel] Handling locks in NSR

Avra Sengupta asengupt at redhat.com
Thu Mar 3 06:51:16 UTC 2016


On 03/03/2016 02:29 AM, Shyam wrote:
> On 03/02/2016 03:10 AM, Avra Sengupta wrote:
>> Hi,
>>
>> All fops in NSR follow a specific workflow, as described in this
>> UML (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).
>>
>> However all locking fops will follow a slightly different workflow as
>> described below. This is a first proposed draft for handling locks, and
>> we would like to hear your concerns and queries regarding the same.
>
> This change, to handle locking FOPs differently, is due to what 
> limitation/problem? (apologies if I missed an earlier thread on the same)
>
> My understanding is that this is due to the fact that the actual FOP 
> could fail/block (non-blocking/blocking) as there is an existing lock 
> held, and hence just adding a journal entry and meeting quorum is not 
> sufficient for the success of the FOP (it is necessary, though, for 
> handling lock preservation in the event of leadership change); rather, 
> acquiring the lock is. Is this understanding right?
Yes, that is right. The change in approach for handling locks is to 
avoid getting into a deadlock amongst the followers.
>
> Based on the above understanding of mine, and the discussion below, 
> the intention seems to be to place the locking xlator below the 
> journal. What if we place this xlator above the journal, but add 
> requirements that FOPs handled by this xlator needs to reach the journal?
>
> Assuming we adopt this strategy (i.e the locks xlator is above the 
> journal xlator), a successful lock acquisition by the locks xlator is 
> not enough to guarantee that the lock is preserved across the replica 
> group, hence it has to reach the journal and as a result pass through 
> other replica members journal and locks xlators as well.
>
> If we do the above, what are the advantages and repercussions of the 
> same?
Why would we want to put the locking xlator above the journal? Is there 
a use case for that?
Firstly, we would have to modify the locking xlator to pass the fop 
through to the journal.
We would also introduce a small window where we acquire the lock 
successfully, but then hit a failure in the journal. We would then have 
to release the lock because we failed to journal it. In the proposed 
approach, if we fail to journal it, we wouldn't even reach the locking 
xlator. Logically, placing the locks xlator above the journal makes it 
dependent on the journal's output, whereas ideally the journal should 
be dependent on the locking xlator's output.
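
To make that window concrete, here is a minimal C sketch contrasting
the two stacking orders for a lock fop (all helper names here are
illustrative stubs, not actual NSR code):

    #include <stdbool.h>
    #include <stdio.h>

    /* Stubs standing in for the journal and locks xlators. */
    static bool journal_write(const char *e)      { printf("journal %s\n", e); return true; }
    static void journal_invalidate(const char *e) { printf("invalidate %s\n", e); }
    static bool lock_acquire(void)                { return true; }
    static void lock_release(void)                { printf("release lock\n"); }

    /* Locks above the journal: the lock is taken first, so a journal
     * failure forces us to undo a lock we already hold. */
    static int locks_above_journal(void)
    {
        if (!lock_acquire())
            return -1;
        if (!journal_write("LOCK")) {
            lock_release();            /* the extra undo window */
            return -1;
        }
        return 0;
    }

    /* Journal above locks (the order proposed in this thread): a
     * journal failure means the locking xlator is never reached. */
    static int journal_above_locks(void)
    {
        if (!journal_write("LOCK"))
            return -1;                 /* no lock was ever taken */
        if (!lock_acquire()) {
            journal_invalidate("LOCK");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        return (locks_above_journal() || journal_above_locks()) ? 1 : 0;
    }

The second order also keeps the dependency direction described above: 
the journal decides whether the locking xlator is consulted at all.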
>
> Some of the points noted here (like conflicting non-blocking locks 
> when the previous lock is not yet released) could be handled. Also, 
> in your scheme, what happens to blocking lock requests? The FOP will 
> block; there is no async return to handle the success/failure of the 
> same.
Yes, the FOP will block on blocking lock requests. I assume that's the 
behaviour today. Please correct me if I am wrong.
>
> The downside is that on reconciliation we need to, potentially, undo 
> some of the locks that are held by the locks xlator (in the new 
> leader), which is outside the scope of the journal xlator.
Yes, we need to do lock cleanup on reconciliation, which is in any case 
outside the scope of the journal xlator. The reconciliation daemon will 
compare the terms on each replica node, and either acquire or release 
locks accordingly.
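
Roughly, that comparison could look like the sketch below (the struct
and field names are assumptions made for illustration, not actual NSR
code):

    #include <stdio.h>

    struct lock_entry {
        int term;       /* term in which the lock entry was journaled */
        int completed;  /* journal entry marked complete? */
    };

    /* Bring a lagging replica in line with the leader's journal:
     * acquire a lock the replica missed, or release one that was
     * rolled back in a newer term. */
    static void reconcile_lock(const struct lock_entry *leader,
                               const struct lock_entry *replica)
    {
        if (leader->term <= replica->term)
            return;                 /* replica already up to date */
        if (leader->completed)
            printf("acquire lock missed in term %d\n", leader->term);
        else
            printf("release stale lock from term %d\n", replica->term);
    }

    int main(void)
    {
        struct lock_entry leader  = { .term = 5, .completed = 1 };
        struct lock_entry replica = { .term = 3, .completed = 1 };
        reconcile_lock(&leader, &replica);
        return 0;
    }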
>
>
> I also assume we need to do the same for the leases xlator as well, right?
Yes, as long as we handle locking properly, the leases xlator shouldn't 
be a problem.
>
>
>>
>> 1. On receiving the lock fop, the leader will journal the lock
>> itself, and then try to actually acquire the lock. If, at this
>> point, it fails to acquire the lock, it will invalidate the journal
>> entry and return a -ve ack to the client. However, if it is
>> successful in acquiring the lock, it will mark the journal entry as
>> complete, and forward the fop to the followers.
>>
>> 2. The followers, on receiving the fop, will journal it, and then
>> try to actually acquire the lock. If a follower fails to acquire the
>> lock, it will invalidate the journal entry and return a -ve ack to
>> the leader. If it is successful in acquiring the lock, it will mark
>> the journal entry as complete, and send a +ve ack to the leader.
>>
>> 3. The leader, on receiving all acks, will perform a quorum check.
>> If quorum is met, it will send a +ve ack to the client. If the
>> quorum fails, it will send a rollback to the followers.
>>
>> 4. The followers, on receiving the rollback, will journal it first,
>> and then release the acquired lock. Each follower will update the
>> rollback entry in the journal as complete and send an ack to the
>> leader.
>>
>> 5. The leader, on receiving the rollback acks, will journal its own
>> rollback, and then release the acquired lock. It will update the
>> rollback entry in the journal, and send a -ve ack to the client.
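
For reference, a condensed C sketch of how steps 1-5 fit together on
the leader (all names are illustrative assumptions; the real code
paths, RPC layer, and error handling will differ):

    #include <stdbool.h>
    #include <stdio.h>

    enum ack { ACK_NEG, ACK_POS };

    static bool journal_fop(const char *e)        { printf("journal %s\n", e); return true; }
    static void journal_complete(const char *e)   { printf("complete %s\n", e); }
    static void journal_invalidate(const char *e) { printf("invalidate %s\n", e); }
    static bool lock_acquire(void)                { return true; }
    static void lock_release(void)                { }

    /* Step 2, as run on each follower (simulated in-process here):
     * journal, acquire, then ack. */
    static enum ack follower_lock(void)
    {
        if (!journal_fop("LOCK"))
            return ACK_NEG;
        if (!lock_acquire()) {
            journal_invalidate("LOCK");
            return ACK_NEG;
        }
        journal_complete("LOCK");
        return ACK_POS;
    }

    static enum ack leader_lock(int nr_followers)
    {
        /* Step 1: the leader journals and acquires locally first. */
        if (!journal_fop("LOCK"))
            return ACK_NEG;
        if (!lock_acquire()) {
            journal_invalidate("LOCK");
            return ACK_NEG;
        }
        journal_complete("LOCK");

        /* Steps 2-3: gather follower acks and check quorum, counting
         * the leader itself as one +ve ack. */
        int positive = 1;
        for (int i = 0; i < nr_followers; i++)
            if (follower_lock() == ACK_POS)
                positive++;
        if (2 * positive > nr_followers + 1)
            return ACK_POS;         /* quorum met: +ve ack to client */

        /* Steps 4-5 (leader side only in this sketch): journal the
         * rollback first, then release the lock; each follower does
         * the same on receiving the rollback. */
        journal_fop("ROLLBACK");
        lock_release();
        journal_complete("ROLLBACK");
        return ACK_NEG;             /* -ve ack to client */
    }

    int main(void)
    {
        return leader_lock(2) == ACK_POS ? 0 : 1;
    }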
>>
>> A few things to be noted in the above workflow:
>> 1. It will be a synchronous operation across the replica volume.
>> 2. Reconciliation will take care of nodes that have missed the locks.
>> 3. On a client disconnect, there will be a lock-timeout on whose
>> expiration all locks held by that particular client will be released.
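
A minimal sketch of the lock-timeout in note 3 (the timeout value and
all names are assumptions made for illustration, not actual NSR code):

    #include <stdio.h>
    #include <time.h>

    #define LOCK_TIMEOUT_SECS 30       /* assumed grace period */

    struct client_lock {
        int    client_id;
        time_t disconnected_at;        /* 0 while client is connected */
    };

    /* Periodic sweep: release every lock whose owner has been
     * disconnected longer than the grace period. */
    static void reap_expired_locks(struct client_lock *locks, int n,
                                   time_t now)
    {
        for (int i = 0; i < n; i++)
            if (locks[i].disconnected_at &&
                now - locks[i].disconnected_at >= LOCK_TIMEOUT_SECS)
                printf("releasing locks held by client %d\n",
                       locks[i].client_id);
    }

    int main(void)
    {
        struct client_lock locks[] = { { 1, time(NULL) - 60 },
                                       { 2, 0 } };
        reap_expired_locks(locks, 2, time(NULL));
        return 0;
    }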
>>
>> Regards,
>> Avra
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel


