[Gluster-devel] Handling locks in NSR

Wed Mar 2 10:44:14 UTC 2016

----- Original Message -----
> From: "Atin Mukherjee" <atin.mukherjee83 at gmail.com>
> To: "Avra Sengupta" <asengupt at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Wednesday, March 2, 2016 4:03:11 PM
> Subject: Re: [Gluster-devel] Handling locks in NSR
> 
> -Atin
> Sent from one plus one
> On 02-Mar-2016 3:41 pm, "Avra Sengupta" < asengupt at redhat.com > wrote:
> > 
> > On 03/02/2016 02:55 PM, Venky Shankar wrote:
> >> 
> >> On Wed, Mar 02, 2016 at 02:29:26PM +0530, Avra Sengupta wrote:
> >>> 
> >>> On 03/02/2016 02:02 PM, Venky Shankar wrote:
> >>>> 
> >>>> On Wed, Mar 02, 2016 at 01:40:08PM +0530, Avra Sengupta wrote:
> >>>>> 
> >>>>> Hi,
> >>>>> 
> >>>>> All fops in NSR follow a specific workflow, as described in this UML (
> >>>>> https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing
> >>>>> ).
> >>>>> However, all locking fops will follow a slightly different workflow, as
> >>>>> described below. This is a first draft of a proposal for handling locks,
> >>>>> and we would like to hear your concerns and queries about it.
> >>>>> 
> >>>>> 1. On receiving the lock request, the leader will journal the lock
> >>>>> itself, and then try to actually acquire the lock. If it fails to
> >>>>> acquire the lock at this point, it will invalidate the journal entry
> >>>>> and return a -ve ack to the client. However, if it succeeds in
> >>>>> acquiring the lock, it will mark the journal entry as complete and
> >>>>> forward the fop to the followers.
> >>>> 
> >>>> So, does a contending non-blocking lock operation check only on the
> >>>> leader, since the followers might not yet have ack'd an earlier lock
> >>>> operation?
> >>> 
> >>> A non-blocking lock follows the same workflow, and thereby checks on the
> >>> leader first. In this case, it would be blocked on the leader until the
> >>> leader releases the lock. Then it will follow the same workflow.
> >> 
> >> A non-blocking lock should ideally return EAGAIN if the region is already
> >> locked. Checking just on the leader (posix/locks on the leader server
> >> stack) and returning EAGAIN is kind of incomplete, as the earlier lock
> >> request might not have been granted (due to failure on followers).
> >> 
> >> Or does it even matter if we return EAGAIN during the transient state?
> >> 
> >> We could block the lock on the leader until an earlier lock request is
> >> satisfied (in which case return EAGAIN) or, in case of failure, try to
> >> satisfy the lock request.
> > 
> > That is what I said: it will be blocked on the leader until the leader
> > releases the already held lock.
> > 
> >> 
> >>>>> 2. The followers, on receiving the fop, will journal it and then try to
> >>>>> actually acquire the lock. If a follower fails to acquire the lock, it
> >>>>> will invalidate the journal entry and return a -ve ack to the leader.
> >>>>> If it succeeds in acquiring the lock, it will mark the journal entry as
> >>>>> complete and send a +ve ack to the leader.
> >>>>> 
> >>>>> 3. The leader, on receiving all acks, will perform a quorum check. If
> >>>>> quorum is met, it will send a +ve ack to the client. If quorum is not
> >>>>> met, it will send a rollback to the followers.
> >>>>> 
> >>>>> 4. The followers, on receiving the rollback, will journal it first and
> >>>>> then release the acquired lock. Each will mark the rollback entry in the
> >>>>> journal as complete and send an ack to the leader.
> >>>> 
> >>>> What happens if the rollback fails for whatever reason?
> >>> 
> >>> The leader receives a -ve rollback ack, but there's little it can do
> >>> about it. Depending on the failure, it will be resolved during
> >>> reconciliation.
> >>>>> 
> >>>>> 5. The leader, on receiving the rollback acks, will journal its own
> >>>>> rollback and then release the acquired lock. It will update the
> >>>>> rollback entry in the journal and send a -ve ack to the client.
> >>>>> 
> >>>>> A few things to note about the above workflow:
> >>>>> 1. It will be a synchronous operation across the replica volume.
> > 
> > Atin, I am not sure how AFR handles it.
> If AFR/EC handle them asynchronously, do you see any performance bottleneck
> with NSR for this case?

I assume that by synchronous you mean that the lock request is sent to the
leader first and, only when it succeeds there, sent on to the followers, and
that the requests to the followers will not be serialized.

AFAIK, AFR sends non-blocking lock requests asynchronously and blocking lock
requests in a serialized way.
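
For concreteness, here is a minimal sketch of the proposed workflow under the
reading above (leader first, then followers in parallel). It is only an
illustration: `Node`, `leader_lock`, the in-memory journal, and the assumption
that the leader's own lock counts towards quorum are all hypothetical and not
taken from the NSR code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a brick: a journal plus a lock table."""
    name: str
    fail_lock: bool = False                      # simulate a failed acquire
    journal: list = field(default_factory=list)
    held: dict = field(default_factory=dict)     # region -> owner

    def lock(self, region, owner):
        entry = {"op": "lock", "region": region, "owner": owner,
                 "state": "pending"}
        self.journal.append(entry)               # journal before acquiring
        if self.fail_lock or region in self.held:
            entry["state"] = "invalid"           # invalidate entry, -ve ack
            return False
        self.held[region] = owner
        entry["state"] = "complete"              # +ve ack
        return True

    def rollback(self, region, owner):
        entry = {"op": "rollback", "region": region, "owner": owner,
                 "state": "pending"}
        self.journal.append(entry)               # journal the rollback first
        if self.held.get(region) == owner:       # release only our own lock
            del self.held[region]
        entry["state"] = "complete"
        return True


def leader_lock(leader, followers, region, owner, quorum):
    """Return True for a +ve ack to the client, False for a -ve ack."""
    # Step 1: the leader journals and acquires the lock itself.
    if not leader.lock(region, owner):
        return False
    # Step 2: forward to the followers concurrently (not serialized).
    with ThreadPoolExecutor(max_workers=max(1, len(followers))) as pool:
        acks = list(pool.map(lambda f: f.lock(region, owner), followers))
    # Step 3: quorum check (assumption: the leader counts as one vote).
    if 1 + sum(acks) >= quorum:
        return True
    # Step 4: quorum not met, so roll back on the followers.
    for follower in followers:
        follower.rollback(region, owner)
    # Step 5: the leader rolls back its own lock and nacks the client.
    leader.rollback(region, owner)
    return False
```

In this sketch a failed rollback is not modelled at all; per the thread, that
case would be left to reconciliation.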

> > 
> >>>>> 2. Reconciliation will take care of nodes that have missed the
> >>>>> locks.

Will the bricks be available before the reconciliation is complete?

> >>>>> 3. On a client disconnect, there will be a lock-timeout, on expiry of
> >>>>> which all locks held by that particular client will be released.

I have not gone through the NSR design in detail. Is the client connected
to all the bricks or just to the leader? Either way, what will happen if the
leader is still connected and the client sees a network partition to one
of the followers?
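
To illustrate why I ask: if each brick tracked client disconnects with its own
timer, a follower could expire a client's locks while the leader still holds
them. Below is a minimal sketch of such a per-brick timer. `ClientLockTable`
and its methods are hypothetical names, and the per-brick timer itself is an
assumption, since the proposal does not say where the lock-timeout runs.

```python
import time


class ClientLockTable:
    """Hypothetical per-brick table that drops a disconnected client's locks
    once the lock-timeout has elapsed."""

    def __init__(self, lock_timeout, clock=time.monotonic):
        self.lock_timeout = lock_timeout
        self.clock = clock               # injectable so it can be tested
        self.locks = {}                  # client -> set of locked regions
        self.disconnected_at = {}        # client -> time of disconnect

    def acquire(self, client, region):
        self.locks.setdefault(client, set()).add(region)

    def on_disconnect(self, client):
        # Start the timer only once per disconnect.
        self.disconnected_at.setdefault(client, self.clock())

    def on_reconnect(self, client):
        # A reconnect before expiry cancels the pending release.
        self.disconnected_at.pop(client, None)

    def expire(self):
        """Release all locks of clients whose timeout has expired and
        return the list of those clients."""
        now = self.clock()
        expired = [c for c, t in self.disconnected_at.items()
                   if now - t >= self.lock_timeout]
        for client in expired:
            self.locks.pop(client, None)
            del self.disconnected_at[client]
        return expired
```

With one such table per brick, only the partitioned follower would see
on_disconnect() and later release the locks, which is the divergence the
question above is about.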

> >>>>> 
> >>>>> Regards,
> >>>>> Avra
> >>>>> _______________________________________________
> >>>>> Gluster-devel mailing list
> >>>>> Gluster-devel at gluster.org
> >>>>> http://www.gluster.org/mailman/listinfo/gluster-devel