[Gluster-devel] Replace cluster wide gluster locks with volume wide locks

Thu Sep 12 07:48:58 UTC 2013

On Thu, Sep 12, 2013 at 12:10 PM, Varun Shastry <vshastry at redhat.com> wrote:
>
> On Thursday 12 September 2013 08:41 AM, Krishnan Parthasarathi wrote:
>>
>>
>> ----- Original Message -----
>>>
>>> Hi,
>>>
>>> As of today, most gluster commands take a cluster-wide lock before
>>> performing their respective operations. As a result, any two gluster
>>> commands that have no interdependency with each other can't be executed
>>> simultaneously. To remove this interdependency, we propose to replace the
>>> cluster-wide lock with a volume-specific lock, so that two operations on
>>> two different volumes can be performed simultaneously.
>>>
>>> By implementing this volume-wide lock, we aim to achieve the following:
>>> 1. Allow simultaneous operations on different volumes. Performing two
>>>    operations simultaneously on the same volume should not be allowed.
>>> 2. While a volume is being created or deleted, operations (like
>>>    rebalance, geo-rep) should be permitted on other volumes.
>>
>> You can't meaningfully interleave a volume-delete and a volume-rebalance
>> on the same volume. The locking domains must be arranged such that
>> create/delete operations take a lock which conflicts with all locks held
>> on any volume. This would handle the scenario mentioned above.
>>
>> The volume operations performed in the cluster can be classified into the
>> following categories:
>> - Create/Delete - Need a cluster-wide lock. This lock 'trumps' all other
>>   locks.
>>
>> - Add-brick, remove-brick, replace-brick, rebalance, volume-set/reset and
>>   a whole bunch of features associated with a volume like quota, geo-rep
>>   etc. - Need a volume-level lock.
>
> Since these operations (add-brick, remove-brick, set/reset) modify the NFS
> volfile, which is a single volfile for all the volumes, don't we need to
> take the cluster-wide lock for this set of operations as well?
>

We did discuss this while working out the design, before it was posted to
the list, and we don't think it'll be a problem. As the generation of these
volfiles takes all the volumes into consideration, the last operation among
concurrent volume operations to generate these volfiles will generate the
correct volfile. If another operation that requires regenerating the
volfiles happens while they are being generated, it won't be a problem
either, as that operation will regenerate the volfiles.
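The argument above can be illustrated with a minimal C sketch. It assumes a hypothetical in-memory volume list and a simplified one-line-per-volume volfile format; neither is glusterd's actual code or real volfile syntax. Because regeneration walks every volume, applying operations on two different volumes in either order, and regenerating after each, produces the same final file. Like the argument itself, it assumes each regeneration sees the other operation's already-committed change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_VOLS 8
#define NAME_LEN 32

/* Hypothetical in-memory view of the volumes a node knows about. */
struct volume {
    char name[NAME_LEN];
    int  nfs_enabled; /* stands in for any option that affects the NFS volfile */
};

struct volume_list {
    struct volume vols[MAX_VOLS];
    int           count;
};

/* Apply one volume operation: update only the target volume's state. */
static void set_nfs_enabled(struct volume_list *l, const char *name, int on)
{
    assert(l != NULL && name != NULL);
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->vols[i].name, name) == 0)
            l->vols[i].nfs_enabled = on;
}

/* Regenerate the shared (simplified) NFS volfile from scratch by walking
 * every volume. The output depends only on the current volume list, so
 * whichever operation regenerates last writes a file covering all volumes. */
static void generate_nfs_volfile(const struct volume_list *l, char *out, size_t len)
{
    assert(l != NULL && out != NULL && len > 0);
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < l->count; i++) {
        if (!l->vols[i].nfs_enabled)
            continue;
        int n = snprintf(out + used, len - used, "subvolume %s\n", l->vols[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer too small; a real implementation would grow it */
        used += (size_t)n;
    }
}
```

Running the two operations in opposite orders on two copies of the same list and comparing the final buffers shows they are identical.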

> - Varun Shastry
>
>>
>> - Volume-info, volume-status - Need no locking.
>>
>>
>>> 3. Two simultaneous volume create or volume delete operations should not
>>>    be permitted.
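A minimal C sketch of the conflict rule implied by Krish's classification (create/delete cluster-wide, brick and feature operations volume-level, info/status unlocked). The enum, struct, and function names are hypothetical and are not glusterd identifiers. Under this rule a create or delete also blocks operations on other volumes, which is the trade-off against goal 2 above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock scopes following the classification in this thread. */
enum lock_scope {
    LOCK_NONE,    /* volume info, volume status */
    LOCK_VOLUME,  /* add/remove/replace-brick, rebalance, set/reset, quota, geo-rep */
    LOCK_CLUSTER  /* volume create, volume delete: conflicts with every other lock */
};

struct lock_req {
    enum lock_scope scope;
    const char     *volname; /* only read for LOCK_VOLUME */
};

/* Returns true if the two requests cannot be held at the same time. */
static bool locks_conflict(const struct lock_req *a, const struct lock_req *b)
{
    assert(a != NULL && b != NULL);
    if (a->scope == LOCK_NONE || b->scope == LOCK_NONE)
        return false;               /* unlocked commands never block */
    if (a->scope == LOCK_CLUSTER || b->scope == LOCK_CLUSTER)
        return true;                /* create/delete trumps everything */
    /* Both are volume-level: they conflict only on the same volume. */
    return strcmp(a->volname, b->volname) == 0;
}
```

Goal 3 falls out of the rule directly: two cluster-scope requests always conflict, while volume-level requests on different volumes never do.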
>>>
>>> We propose to do this in two steps:
>>>
>>> 1. Implementing the volume-wide lock: To implement this, we will add a
>>>    lock consisting of the uuid of the originator to the in-memory
>>>    volinfo (of that particular volume) on all the nodes of the cluster.
>>>    Once this lock is taken, any other command for the same volume will
>>>    not be able to acquire the lock from that volume's volinfo. Meanwhile,
>>>    operations on other volumes can still be executed.
>>
>> There is one caveat with storing the volume-level lock in the volinfo
>> object. Not all nodes are guaranteed to have an up-to-date version of the
>> volinfo object, and we don't have the necessary mechanism to select peers
>> based on the recency of their volume information. Worse still, the
>> volinfo object could be modified by incoming updates from other peers in
>> the cluster, if and when they rejoin (the available partition of) the
>> cluster, in the middle of a transaction. I agree this lack of guarantee is
>> part of a different problem, but it is a runtime reality :(
>>
>> This prompts me to think that we should keep all the lock-related
>> book-keeping independent of things that are not guaranteed to stick
>> around for the lifetime of a command. This way we can keep the locking
>> policy (i.e. volume-wide vs. cluster-wide) separate from the mechanism.
>>
>> cheers,
>> krish
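One way to keep lock book-keeping separate from volinfo, as suggested above, is a small table keyed by volume name that records the originator's uuid. The following C sketch is an illustration only: the names, sizes, and fixed-length table are assumptions, not glusterd code, and it assumes callers serialize access to the table (there is no internal mutex).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 64
#define NAME_LEN  64
#define UUID_LEN  37 /* 36-character textual uuid plus NUL */

/* One entry per locked volume. Keyed by volume name only, so it never
 * points into volinfo objects that may be replaced mid-transaction. */
struct vol_lock {
    char volname[NAME_LEN];
    char owner[UUID_LEN]; /* uuid of the originator node */
    bool in_use;
};

static struct vol_lock lock_table[MAX_LOCKS];

/* Try to lock volname for owner; fails if it is already held or the table is full. */
static bool vol_lock_acquire(const char *volname, const char *owner)
{
    assert(volname != NULL && owner != NULL);
    struct vol_lock *free_slot = NULL;
    for (int i = 0; i < MAX_LOCKS; i++) {
        if (lock_table[i].in_use) {
            if (strcmp(lock_table[i].volname, volname) == 0)
                return false; /* already locked, even if by the same owner */
        } else if (free_slot == NULL) {
            free_slot = &lock_table[i];
        }
    }
    if (free_slot == NULL)
        return false;
    strncpy(free_slot->volname, volname, NAME_LEN - 1);
    free_slot->volname[NAME_LEN - 1] = '\0';
    strncpy(free_slot->owner, owner, UUID_LEN - 1);
    free_slot->owner[UUID_LEN - 1] = '\0';
    free_slot->in_use = true;
    return true;
}

/* Release the lock only if owner is the originator that took it. */
static bool vol_lock_release(const char *volname, const char *owner)
{
    assert(volname != NULL && owner != NULL);
    for (int i = 0; i < MAX_LOCKS; i++) {
        struct vol_lock *l = &lock_table[i];
        if (l->in_use && strcmp(l->volname, volname) == 0) {
            if (strcmp(l->owner, owner) != 0)
                return false; /* held by a different originator */
            l->in_use = false;
            return true;
        }
    }
    return false; /* volume was not locked */
}
```

Because the lock outlives any particular volinfo object, a volinfo refreshed by a rejoining peer in the middle of a transaction does not drop or duplicate the lock.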
>>
>>> 2. Stop using the cluster lock for existing commands: Port the existing
>>>    commands to use this framework. We will use op-version to take care of
>>>    backward compatibility for the existing commands. While implementing
>>>    this, we need to take care of commands like volume create, volume
>>>    delete, rebalance callbacks, implicit volume syncs (when a node comes
>>>    up), and the volume sync command, which modify priv->volumes, as well
>>>    as other non-volume operations that work within the ambit of the
>>>    cluster locks today.
>>>
>>> Please feel free to provide feedback.
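The op-version fallback in step 2 might look like the following C sketch. VOLUME_LOCK_MIN_OP_VERSION and its value are hypothetical placeholders, not an actual GlusterFS op-version, and the enum is illustrative. The idea is only that a command which would take a volume lock keeps using the cluster-wide lock until every peer supports the new locks.

```c
#include <assert.h>

/* Hypothetical: first cluster op-version at which all peers understand
 * volume-level locks. The real value would be fixed when the feature lands. */
#define VOLUME_LOCK_MIN_OP_VERSION 3

enum lock_scope { LOCK_NONE, LOCK_VOLUME, LOCK_CLUSTER };

/* Pick the lock a command should actually take. A volume-level request
 * falls back to the old cluster-wide lock while the cluster op-version
 * is below the (hypothetical) threshold. */
static enum lock_scope effective_scope(enum lock_scope wanted, int cluster_op_version)
{
    assert(cluster_op_version >= 1);
    if (wanted == LOCK_VOLUME && cluster_op_version < VOLUME_LOCK_MIN_OP_VERSION)
        return LOCK_CLUSTER;
    return wanted;
}
```

Keying the decision on the cluster-wide op-version, rather than on each peer's version, means every node in a mixed cluster makes the same choice and old peers only ever see the cluster lock they already understand.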
>>>
>>> Regards,
>>> Avra
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at nongnu.org
>>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>