[Gluster-devel] Lock migration as a part of rebalance
Raghavendra G
raghavendra at gluster.com
Wed Dec 17 07:15:43 UTC 2014
On Wed, Dec 17, 2014 at 1:25 AM, Shyam <srangana at redhat.com> wrote:
>
> This mail intends to present the lock migration across subvolumes problem
> and seek solutions/thoughts around the same, so any feedback/corrections
> are appreciated.
>
> # Current state of file locks post file migration during rebalance
> Currently, when a file is migrated during rebalance, its lock information
> is not transferred over from the old subvol to the new subvol on which the
> file now resides.
>
> Since further lock requests, post migration of the file, are sent to the
> new subvol, potential lock conflicts would go undetected until the locks
> are migrated over.
>
> The term locks above can refer to the POSIX locks acquired using the FOP
> lk by consumers of the volume, or to the gluster internal(?) inode/dentry
> locks. For now we limit the discussion to the POSIX locks supported by the
> FOP lk.
>
> # Other areas in gluster that migrate locks
> The current scheme for migrating locks in gluster, on graph switches,
> triggers an fd migration process that migrates the lock information from
> the old fd to the new fd. This is driven by the gluster client stack's
> protocol layers (FUSE, gfapi).
>
> This is done using the (set/get)xattr call with the attr name
> "trusted.glusterfs.lockinfo", which in turn fetches the required key for
> the old fd and migrates the lock from this old fd to the new fd. IOW, very
> little information is transferred, as the locks are migrated across fds on
> the same subvolume and not across subvolumes.
>
> Additionally, locks that are in the blocked state do not seem to be
> migrated (at least the function to do so in FUSE,
> fuse_handle_blocked_locks, is empty; a test case is needed to confirm),
> nor are they responded to with an error.
>
> # High level solution requirements when migrating locks across subvols
> 1) Block/deny new lock acquisitions on the new subvol, till locks are
> migrated
> - So that new locks that have overlapping ranges to the older ones are
> not granted
> - Potentially return EINTR on such requests?
> 2) Ensure all _acquired_ locks from all clients are migrated first
> - So that if and when placing blocked lock requests, these really do
> block for previous reasons and are not granted now
> 3) Migrate blocked locks after the acquired locks are migrated (in any
> order?)
>    - OR, send back EINTR for the blocked locks
>
> (When we have upcalls/delegations added as features, those would have
> similar requirements for migration across subvolumes)
>
> # Potential processes that could migrate the locks and issues thereof
> 1) The rebalance process that migrates the file can also help with
> migrating the locks, which would not involve any clients of the gluster
> volume
>
> Issues:
>   - Lock information is fd specific; when these locks are migrated, the
> clients need not have detected that the file was migrated, and hence may
> not have opened an fd against the new subvol, which, when missing, makes
> this form of migration a little more interesting
> - Lock information also has client connection specific pointer
> (client_t) that needs to be reassigned on the new subvol
> - Other subvol specific information, maintained in the lock, that needs
> to be migrated over will suffer the same limitations/solutions
>
The tricky thing here is that the rebalance process has no control over
when
1. an fd will be opened on the dst-node, since clients open fds on the
dst-node on-demand, based on the I/O happening through them.
2. a client establishes a connection to the dst-node (the client might have
been cut off from the dst-node).
Unless we have a global mapping (e.g., a client can always be identified by
the same uuid irrespective of the brick we are looking at), this seems like
a difficult thing to achieve.
> Benefits:
> - Can lock out/block newer lock requests effectively
> - Need not _wait_ till all clients have registered that the file is
> under migration and/or migrated their locks
>
> 2) DHT xlator in each client could be held responsible to migrate its
> locks to the new subvolume
>
> Issues:
> - Somehow need to let every client know that locks need to be migrated
> (upcall infrastructure?)
> - What if some client is not reachable at the given time?
> - Have to wait till all clients replay the locks
>
> Benefits:
>   - Hmmm... Nothing really; if we could do it in the rebalance process
> itself, that solution may be better.
>
> # Overall thoughts
> - We could/should return EINTR for blocked locks, both in the case of a
> graph switch and in the case of a file migration; this would relieve the
> design of that particular complexity, and EINTR is a legal error to return
> from a flock/fcntl operation
>
> - If we can extract and map out all relevant lock information across
> subvolumes, then having rebalance do this work seems like a good fit.
> Additionally this could serve as a good way to migrate upcall requests and
> state as well
>
> Thoughts?
>
> Shyam
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
--
Raghavendra G