[Gluster-devel] Replace cluster wide gluster locks with volume wide locks
Varun Shastry
vshastry at redhat.com
Thu Sep 12 06:40:21 UTC 2013
On Thursday 12 September 2013 08:41 AM, Krishnan Parthasarathi wrote:
>
> ----- Original Message -----
>> Hi,
>>
>> As of today, most gluster commands take a cluster-wide lock before performing
>> their respective operations. As a result, any two gluster commands, even ones
>> with no interdependency between them, cannot be executed simultaneously. To
>> remove this restriction we propose to replace the cluster-wide lock with a
>> volume-specific lock, so that two operations on two different volumes can be
>> performed simultaneously.
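To illustrate the contrast in a simplified, self-contained form (this is a
sketch with made-up names, not the actual glusterd lock machinery):

    #include <pthread.h>

    /* Today (simplified): a single cluster-wide lock serialises all
     * commands, even ones touching unrelated volumes. */
    static pthread_mutex_t cluster_wide_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Proposed (simplified): one lock per volume, so e.g. a rebalance
     * on vol1 and a quota change on vol2 can run concurrently. */
    typedef struct volume {
        char            name[256];
        pthread_mutex_t vol_lock;
    } volume_t;

    /* A command on 'vol' now contends only with commands on the
     * same volume, not with the whole cluster. */
    void run_volume_command(volume_t *vol)
    {
        pthread_mutex_lock(&vol->vol_lock);
        /* ... perform rebalance / quota / geo-rep / etc. ... */
        pthread_mutex_unlock(&vol->vol_lock);
    }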
>>
>> By implementing this volume-wide lock, our agenda is to achieve the
>> following:
>> 1. Allow simultaneous volume operations on different volumes. Performing
>> two operations simultaneously on the same volume should not be allowed.
>> 2. While a volume is being created or deleted, operations (like rebalance,
>> geo-rep) should still be permitted on other volumes.
> You can't meaningfully interleave a volume-delete and a volume-rebalance on the
> same volume.
> The locking domains must be arranged such that create/delete operations
> take a lock which conflicts with all locks held on any volume. This would
> handle the scenario mentioned above.
>
> The volume operations performed in the cluster can be classified into the following
> categories,
> - Create/Delete - Need a cluster-wide lock. This lock 'trumps' all other locks.
>
> - Add-brick, remove-brick, replace-brick, rebalance, volume-set/reset, and the whole bunch of features
> associated with a volume (like quota, geo-rep, etc.) - Need a volume-level lock.
Since these operations (add-brick, remove-brick, set/reset) modify the nfs
volfile, which is a single volfile shared by all the volumes, don't we need
to take the cluster-wide lock for the above set of operations as well?
- Varun Shastry
>
> - Volume-info, volume-status - Need no locking.
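One way to realise these three domains (my own sketch under stated
assumptions, not existing glusterd code) is a cluster-level rwlock plus
per-volume locks: create/delete take the rwlock exclusively and thus
conflict with everything; per-volume operations take it shared along with
the volume's own lock; info/status take nothing:

    #include <pthread.h>

    /* Hypothetical sketch of the three locking domains. */
    static pthread_rwlock_t cluster_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Create/Delete: exclusive; 'trumps' (conflicts with) every other lock. */
    void cluster_op_begin(void) { pthread_rwlock_wrlock(&cluster_lock); }
    void cluster_op_end(void)   { pthread_rwlock_unlock(&cluster_lock); }

    /* Add-brick, remove-brick, rebalance, quota, geo-rep, ...:
     * shared cluster lock plus an exclusive lock on the one volume. */
    void volume_op_begin(pthread_mutex_t *vol_lock)
    {
        pthread_rwlock_rdlock(&cluster_lock);
        pthread_mutex_lock(vol_lock);
    }

    void volume_op_end(pthread_mutex_t *vol_lock)
    {
        pthread_mutex_unlock(vol_lock);
        pthread_rwlock_unlock(&cluster_lock);
    }

    /* Volume-info / volume-status: no locking. */

Under such a scheme, an operation that must rewrite the shared nfs volfile
could be escalated to the exclusive path, which would also address the
concern I raised above.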
>
>
>> 3. Two simultaneous volume create or volume delete operations should not be
>> permitted.
>>
>> We propose to do this in two steps:
>>
>> 1. Implementing the volume-wide lock: In order to implement this, we will add
>> a lock, consisting of the uuid of the originator, to the in-memory volinfo
>> (of that particular volume) on all the nodes of the cluster. Once this lock
>> is taken, no other command on the same volume will be able to acquire the
>> lock from that volume's volinfo. Meanwhile, operations on other volumes can
>> still be executed.
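A minimal sketch of what step 1 could look like, assuming a lock_owner field
added to volinfo (names are illustrative, not the actual glusterd structures):

    #include <uuid/uuid.h>

    typedef struct {
        char   volname[256];
        uuid_t lock_owner;          /* all-zero when unlocked */
        /* ... rest of the in-memory volinfo ... */
    } volinfo_t;

    /* Take the volume lock on behalf of 'originator' (the node that
     * initiated the command); fails if another transaction holds it. */
    static int
    volume_lock(volinfo_t *vol, uuid_t originator)
    {
        if (!uuid_is_null(vol->lock_owner))
            return -1;
        uuid_copy(vol->lock_owner, originator);
        return 0;
    }

    /* Only the owner may release the lock. */
    static int
    volume_unlock(volinfo_t *vol, uuid_t originator)
    {
        if (uuid_compare(vol->lock_owner, originator) != 0)
            return -1;
        uuid_clear(vol->lock_owner);
        return 0;
    }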
> There is one caveat with storing the volume-level lock in the volinfo object.
> Not all nodes are guaranteed to have an up-to-date version of the volinfo
> object. We don't have the necessary mechanism to select peers based on
> the recency of their volume information. Worse still, the volinfo object could be
> modified on incoming updates from other peers in the cluster,
> if and when they rejoin (the available partition of) the cluster, in the middle of
> a transaction. I agree this lack of guarantee is part of a different problem,
> but this is a runtime reality :(
>
> This prompts me to think that we should keep all the lock-related book-keeping
> independent of things that are not guaranteed to stick around for the lifetime
> of a command. This way we can keep the policy of locking (i.e. volume-wide vs.
> cluster-wide) separate from the mechanism.
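A sketch of that separation, with made-up names: the lock book-keeping lives
in a standalone table keyed by volume name, owned by the lock framework
itself, so it survives volinfo updates or replacements mid-transaction:

    #include <string.h>
    #include <pthread.h>
    #include <uuid/uuid.h>

    #define MAX_VOLS 256

    typedef struct {
        char   volname[256];
        uuid_t owner;               /* originator holding the lock */
        int    in_use;
    } vol_lock_entry_t;

    /* Independent of volinfo: lives for the lifetime of the daemon. */
    static vol_lock_entry_t lock_table[MAX_VOLS];
    static pthread_mutex_t  table_guard = PTHREAD_MUTEX_INITIALIZER;

    int vol_lock_acquire(const char *volname, uuid_t originator)
    {
        int i, free_slot = -1, ret = -1;

        pthread_mutex_lock(&table_guard);
        for (i = 0; i < MAX_VOLS; i++) {
            if (lock_table[i].in_use &&
                strcmp(lock_table[i].volname, volname) == 0)
                goto out;           /* another txn holds this volume */
            if (!lock_table[i].in_use && free_slot < 0)
                free_slot = i;
        }
        if (free_slot >= 0) {
            strncpy(lock_table[free_slot].volname, volname,
                    sizeof(lock_table[free_slot].volname) - 1);
            uuid_copy(lock_table[free_slot].owner, originator);
            lock_table[free_slot].in_use = 1;
            ret = 0;
        }
    out:
        pthread_mutex_unlock(&table_guard);
        return ret;
    }

    int vol_lock_release(const char *volname, uuid_t originator)
    {
        int i, ret = -1;

        pthread_mutex_lock(&table_guard);
        for (i = 0; i < MAX_VOLS; i++) {
            if (lock_table[i].in_use &&
                strcmp(lock_table[i].volname, volname) == 0 &&
                uuid_compare(lock_table[i].owner, originator) == 0) {
                lock_table[i].in_use = 0;
                ret = 0;
                break;
            }
        }
        pthread_mutex_unlock(&table_guard);
        return ret;
    }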
>
> cheers,
> krish
>
>> 2. Stop using the cluster lock for existing commands: Port the existing
>> commands to use this framework. We will use op-version to take care of
>> backward compatibility for the existing commands. While implementing this,
>> we need to take care of commands like volume create, volume delete,
>> rebalance callbacks, implicit volume syncs (when a node comes up), and the
>> volume sync command, all of which modify priv->volumes, as well as other
>> non-volume operations which work inside the ambit of the cluster lock today.
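The op-version gating could look roughly like this (the constant and helper
names are assumptions, not actual glusterd symbols; vol_lock_acquire() is
from the sketch above):

    #include <uuid/uuid.h>

    /* Hypothetical: legacy cluster-wide lock and the new volume lock. */
    int cluster_lock_acquire(uuid_t originator);
    int vol_lock_acquire(const char *volname, uuid_t originator);

    /* Assumed op-version at which volume-level locks appear. */
    #define OP_VERSION_VOL_LOCKS 4

    int txn_acquire_lock(int cluster_op_version, const char *volname,
                         uuid_t originator)
    {
        /* Older clusters keep the cluster-wide lock so that
         * mixed-version peers agree on the locking protocol. */
        if (cluster_op_version < OP_VERSION_VOL_LOCKS)
            return cluster_lock_acquire(originator);

        return vol_lock_acquire(volname, originator);
    }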
>>
>> Please feel free to provide feedback.
>>
>> Regards,
>> Avra
>>