[Gluster-devel] Replace cluster wide gluster locks with volume wide locks

Thu Sep 12 03:11:57 UTC 2013

----- Original Message -----
> Hi,
> 
> As of today, most gluster commands take a cluster-wide lock before performing
> their respective operations. As a result, any two gluster commands that have
> no interdependency with each other can't be executed simultaneously. To
> remove this restriction, we propose to replace the cluster-wide lock with a
> volume-specific lock, so that two operations on two different volumes can be
> performed simultaneously.
> 
> By implementing this volume-wide lock, we aim to achieve the following:
> 1. Allow simultaneous operations on different volumes. Two operations on the
>    same volume should not be allowed to run simultaneously.
> 2. While a volume is being created or deleted, operations (like rebalance,
>    geo-rep) should be permitted on other volumes.

You can't meaningfully interleave a volume-delete and a volume-rebalance on the
same volume. The locking domains must be arranged such that create/delete
operations take a lock that conflicts with all locks held on any volume. This
would handle the scenario mentioned above.

The volume operations performed in the cluster can be classified into the
following categories:
- Create/Delete - Need a cluster-wide lock. This lock 'trumps' all other locks.

- Add-brick, remove-brick, replace-brick, rebalance, volume-set/reset and a whole bunch of features
  associated with a volume, like quota, geo-rep, etc. - Need a volume-level lock.

- Volume-info, volume-status - Need no locking.
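
The three categories above amount to a small conflict table between lock
scopes. A minimal C sketch of that table, assuming hypothetical names
(lock_scope_t, lock_request_t, locks_conflict) that are not part of glusterd:
the cluster-wide lock conflicts with every other lock, a volume-level lock
conflicts only with another lock on the same volume, and volume-info /
volume-status take no lock at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock scopes mirroring the three categories above. */
typedef enum {
    LOCK_NONE,    /* volume-info, volume-status */
    LOCK_VOLUME,  /* add/remove/replace-brick, rebalance, volume-set, quota, geo-rep */
    LOCK_CLUSTER  /* volume create, volume delete */
} lock_scope_t;

/* A lock held or requested by one command (hypothetical type). */
typedef struct {
    lock_scope_t scope;
    const char *volname; /* used only when scope == LOCK_VOLUME */
} lock_request_t;

/* Returns true if the two locks cannot be held at the same time. */
static bool locks_conflict(const lock_request_t *a, const lock_request_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->scope == LOCK_NONE || b->scope == LOCK_NONE)
        return false;  /* lock-free commands never conflict */
    if (a->scope == LOCK_CLUSTER || b->scope == LOCK_CLUSTER)
        return true;   /* create/delete trumps every other lock,
                          including another create/delete */
    /* Two volume-level locks conflict only on the same volume. */
    return strcmp(a->volname, b->volname) == 0;
}
```

With these rules a rebalance on vol-a and a geo-rep operation on vol-b can
proceed together, while a volume delete waits for both. Note that this follows
the classification in this reply, which is stricter than point 2 of the
original proposal: create/delete blocks operations on every volume.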

> 3. Two simultaneous volume create or volume delete operations should not be
> permitted.
> 
> We propose to do this in two steps:
> 
> 1. Implementing the volume-wide lock: we will add a lock, consisting of the
>    UUID of the originator, to the in-memory volinfo (of that particular
>    volume) on all the nodes of the cluster. Once this lock is taken, any
>    other command for the same volume will not be able to acquire the lock
>    from that volume's volinfo. Meanwhile, operations on other volumes can
>    still be executed.
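
Step 1 above can be sketched as follows for the bookkeeping on a single node
(in the proposal, the same lock would be taken on every node of the cluster).
This is a minimal C illustration under assumptions: volinfo_t, node_uuid_t,
volinfo_lock and volinfo_unlock are hypothetical, node_uuid_t merely stands in
for a 16-byte UUID, and no thread safety is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte UUID such as libuuid's uuid_t. */
typedef unsigned char node_uuid_t[16];

/* Hypothetical, heavily trimmed in-memory volinfo carrying the proposed lock. */
typedef struct {
    char volname[256];
    bool locked;
    node_uuid_t lock_owner; /* UUID of the originator */
} volinfo_t;

/* Try to take the volume lock on behalf of `originator`.
 * Returns 0 on success, -1 if the volume is already locked
 * (by anyone, including the same originator). */
static int volinfo_lock(volinfo_t *vol, const node_uuid_t originator)
{
    assert(vol != NULL);
    if (vol->locked)
        return -1;
    memcpy(vol->lock_owner, originator, sizeof(node_uuid_t));
    vol->locked = true;
    return 0;
}

/* Release the lock. Only the originator that holds it may do so;
 * returns 0 on success, -1 otherwise. */
static int volinfo_unlock(volinfo_t *vol, const node_uuid_t originator)
{
    assert(vol != NULL);
    if (!vol->locked ||
        memcmp(vol->lock_owner, originator, sizeof(node_uuid_t)) != 0)
        return -1;
    memset(vol->lock_owner, 0, sizeof(node_uuid_t));
    vol->locked = false;
    return 0;
}
```

A second command on the same volume gets -1 from volinfo_lock until the
originator releases it, while a lock held in a different volume's volinfo is
unaffected.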

There is one caveat with storing the volume-level lock in the volinfo object.
Not all nodes are guaranteed to have an up-to-date version of the volinfo
object, and we don't have the necessary mechanism to select peers based on how
recent their volume information is. Worse still, the volinfo object could be
modified in the middle of a transaction by incoming updates from other peers,
if and when they rejoin (the available partition of) the cluster. I agree this
lack of guarantee is part of a different problem, but it is a runtime reality :(

This prompts me to think that we should keep all the lock-related bookkeeping
independent of things that are not guaranteed to stick around for the lifetime
of a command. This way we can keep the locking policy (i.e. volume-wide vs.
cluster-wide) separate from the mechanism.
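
One way to read this suggestion is a lock table that lives on its own, keyed by
name, so that a volinfo replaced or modified by a peer update mid-transaction
cannot drop a held lock. A minimal C sketch, assuming hypothetical names
(lock_table_t, lock_table_acquire, lock_table_release), a fixed capacity, and
no thread safety; only the mechanism is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 64       /* hypothetical fixed capacity */
#define LOCK_NAME_MAX 256

/* Hypothetical stand-in for a 16-byte UUID. */
typedef unsigned char node_uuid_t[16];

/* One held lock: keyed by name (e.g. a volume name), owned by an originator. */
typedef struct {
    bool in_use;
    char name[LOCK_NAME_MAX];
    node_uuid_t owner;
} lock_entry_t;

/* Kept apart from volinfo: replacing a volinfo does not touch this table. */
typedef struct {
    lock_entry_t entries[MAX_LOCKS];
} lock_table_t;

static lock_entry_t *find_lock(lock_table_t *t, const char *name)
{
    for (int i = 0; i < MAX_LOCKS; i++)
        if (t->entries[i].in_use && strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    return NULL;
}

/* Returns 0 on success, -1 if `name` is already locked, too long,
 * or the table is full. */
static int lock_table_acquire(lock_table_t *t, const char *name,
                              const node_uuid_t owner)
{
    assert(t != NULL && name != NULL);
    if (strlen(name) >= LOCK_NAME_MAX || find_lock(t, name) != NULL)
        return -1;
    for (int i = 0; i < MAX_LOCKS; i++) {
        lock_entry_t *e = &t->entries[i];
        if (!e->in_use) {
            e->in_use = true;
            strcpy(e->name, name);
            memcpy(e->owner, owner, sizeof(node_uuid_t));
            return 0;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 if `name` is not locked by `owner`. */
static int lock_table_release(lock_table_t *t, const char *name,
                              const node_uuid_t owner)
{
    assert(t != NULL && name != NULL);
    lock_entry_t *e = find_lock(t, name);
    if (e == NULL || memcmp(e->owner, owner, sizeof(node_uuid_t)) != 0)
        return -1;
    memset(e, 0, sizeof(*e));
    return 0;
}
```

A volume-level command would acquire its volume name here. The policy of which
names a command must take (for example, how create/delete is made to conflict
with every volume lock) would sit in a separate layer on top, which this
sketch does not show.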

cheers,
krish

> 
> 2. Stop using the cluster lock for existing commands: port existing commands
>    to use this framework. We will use op-version to take care of backward
>    compatibility for the existing commands. While implementing this, we need
>    to take care of commands like volume create, volume delete, rebalance
>    callbacks, implicit volume syncs (when a node comes up) and the volume
>    sync command, which modify priv->volumes, as well as other non-volume
>    operations that work within the scope of the cluster locks today.
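
The op-version check in step 2 could look something like the following: a
command prefers the volume-level lock, but falls back to the cluster-wide lock
while the cluster op-version indicates that some peers may not understand
volume locks. This is a hedged C sketch; select_lock_scope and
VOLUME_LOCK_MIN_OP_VERSION are hypothetical, and the threshold value is a
placeholder, not a real glusterd op-version.

```c
#include <assert.h>

/* Hypothetical lock scopes, as in the classification earlier in the thread. */
typedef enum { LOCK_NONE, LOCK_VOLUME, LOCK_CLUSTER } lock_scope_t;

/* Hypothetical op-version from which every peer understands volume locks.
 * Placeholder value only. */
#define VOLUME_LOCK_MIN_OP_VERSION 4

/* Pick the lock a command should actually take. Below the threshold, a
 * command that would take a volume lock takes the cluster-wide lock instead;
 * other scopes are passed through unchanged. */
static lock_scope_t select_lock_scope(int cluster_op_version,
                                      lock_scope_t preferred)
{
    assert(cluster_op_version > 0); /* op-versions start at 1 */
    if (preferred == LOCK_VOLUME &&
        cluster_op_version < VOLUME_LOCK_MIN_OP_VERSION)
        return LOCK_CLUSTER;
    return preferred;
}
```

Gating on the cluster op-version rather than the local one keeps a mixed
cluster consistent: until every peer is upgraded, all of them keep using the
lock that the oldest peer understands.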
>    
> Please feel free to provide feedback.
> 
> Regards,
> Avra
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>