[Gluster-devel] Lock migration as a part of rebalance

Shyam srangana at redhat.com
Tue Dec 16 19:55:10 UTC 2014


This mail presents the problem of migrating file locks across 
subvolumes and seeks solutions/thoughts around the same; any 
feedback/corrections are appreciated.

# Current state of file locks post file migration during rebalance
Currently, when a file is migrated during rebalance, its lock 
information is not transferred over from the old subvol to the new 
subvol on which the file now resides.

As further lock requests, post migration of the file, are sent to the 
new subvol, potential conflicts with locks still held on the old subvol 
go undetected until those locks are migrated over.

The term locks above can refer to the POSIX locks acquired using the 
lk FOP by consumers of the volume, or to the gluster-internal 
inode/dentry locks. For now we limit the discussion to the POSIX locks 
supported by the lk FOP.
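
For context, these are the byte-range locks an application takes via 
fcntl(2) on the mount; the lk FOP carries the struct flock details down 
the stack to the bricks. A minimal example (the mount path is made up):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/glustervol/data.txt", O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct flock fl = {
            .l_type   = F_WRLCK,   /* exclusive write lock */
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 100,       /* lock the first 100 bytes */
        };

        /* Non-blocking acquire; F_SETLKW would block on conflict */
        if (fcntl(fd, F_SETLK, &fl) == -1)
            perror("fcntl(F_SETLK)");

        fl.l_type = F_UNLCK;       /* release */
        fcntl(fd, F_SETLK, &fl);

        close(fd);
        return 0;
    }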

# Other areas in gluster that migrate locks
The current scheme of migrating locks in gluster on graph switches 
triggers an fd migration process that carries the lock information from 
the old fd to the new fd. This is driven by the gluster client stack, 
at the protocol layer (FUSE, gfapi).

This is done using the (get/set)xattr calls with the attr name 
"trusted.glusterfs.lockinfo": a getxattr fetches the required key for 
the old fd, and a setxattr on the new fd migrates the locks from the 
old fd to the new one. IOW, very little information is transferred, 
since the locks move across fds on the same subvolume and not across 
subvolumes.
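
As an illustration of that pattern (not the actual fd-migration code, 
which runs inside the client stack on dicts rather than raw syscalls), 
the handshake is roughly:

    #include <sys/types.h>
    #include <sys/xattr.h>

    #define LOCKINFO_KEY "trusted.glusterfs.lockinfo"

    /* Rough sketch: fetch the opaque lockinfo blob from the old fd
     * and hand it to the new fd, which re-associates the existing
     * locks (same subvolume, so only the fd reference changes). */
    static int migrate_fd_locks(int old_fd, int new_fd)
    {
        char    buf[256];    /* opaque lockinfo value */
        ssize_t len;

        len = fgetxattr(old_fd, LOCKINFO_KEY, buf, sizeof(buf));
        if (len < 0)
            return -1;

        return fsetxattr(new_fd, LOCKINFO_KEY, buf, len, 0);
    }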

Additionally, locks that are in the blocked state do not seem to be 
migrated (at least the function to do so in FUSE, 
fuse_handle_blocked_locks, is empty; need to run a test case to 
confirm), nor are they responded to with an error.

# High level solution requirements when migrating locks across subvols
1) Block/deny new lock acquisitions on the new subvol, till locks are 
migrated
   - So that new locks that have overlapping ranges with the older ones 
are not granted
   - Potentially return EINTR on such requests?
2) Ensure all _acquired_ locks from all clients are migrated first
   - So that if and when blocked lock requests are placed, they really 
do block for the earlier reasons and are not granted prematurely
3) Migrate blocked locks once the acquired locks are migrated (in any 
order?)
   - OR, send back EINTR for the blocked locks
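
Put together, the sequence these requirements impose could look 
roughly like the sketch below; every helper is a hypothetical 
placeholder for whichever process ends up driving the migration, not 
an existing gluster function.

    /* Hypothetical stubs standing in for the real mechanics. */
    static int fence_new_lock_requests(void *file) { (void)file; return 0; }
    static int migrate_granted_locks(void *file)   { (void)file; return 0; }
    static int migrate_blocked_locks(void *file)   { (void)file; return 0; }
    static int unfence_lock_requests(void *file)   { (void)file; return 0; }

    static int migrate_posix_locks(void *file)
    {
        /* (1) Fence: block/deny new acquisitions on the new subvol
         *     (or fail them with EINTR) till migration completes. */
        if (fence_new_lock_requests(file) != 0)
            return -1;

        /* (2) Migrate all _granted_ locks from all clients first, so
         *     replayed blocked requests still meet their conflicts. */
        if (migrate_granted_locks(file) != 0) {
            unfence_lock_requests(file);
            return -1;
        }

        /* (3) Migrate blocked locks last (or answer with EINTR). */
        migrate_blocked_locks(file);

        return unfence_lock_requests(file);
    }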

(When we have upcalls/delegations added as features, those would have 
similar requirements for migration across subvolumes)

# Potential processes that could migrate the locks and issues thereof
1) The rebalance process that migrates the file can help with migrating 
the locks, which would not involve any clients of the gluster volume

Issues:
    - Lock information is fd specific; when its locks are migrated, the 
clients need not have detected that the file has moved, and hence may 
not have opened an fd against the new subvol. The missing fd makes this 
form of migration a little more interesting (see the sketch after the 
benefits list)
    - Lock information also holds a client-connection-specific pointer 
(client_t) that needs to be reassigned on the new subvol
    - Other subvol-specific information maintained in the lock, which 
needs to be migrated over, will suffer the same limitations/solutions

Benefits:
    - Can lock out/block newer lock requests effectively
    - Need not _wait_ till all clients have registered that the file is 
under migration and/or migrated their locks
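
To make the first two issues concrete, a lock record on the source 
subvol roughly carries the fields below. This is a simplified, 
hypothetical layout (the names do not match the posix-locks xlator); 
the point is that the last two fields are node-local references that 
cannot be copied verbatim, and the fd may not even exist yet on the 
destination.

    #include <stdint.h>
    #include <sys/types.h>

    /* Simplified, hypothetical view of a server-side lock record. */
    struct migrated_lock {
        /* Portable range/owner data: can be shipped as-is. */
        off_t    start;
        off_t    len;
        short    type;        /* F_RDLCK / F_WRLCK */
        uint64_t lk_owner;    /* lock-owner id from the client */
        pid_t    client_pid;

        /* Node-local references: must be re-resolved on the new
         * subvol, and may be absent there (e.g. no open fd yet). */
        void    *fd;          /* fd_t on the source brick */
        void    *client;      /* client_t connection on the source */
    };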

2) The DHT xlator in each client could be held responsible for 
migrating its locks to the new subvolume (a rough sketch follows the 
lists below)

Issues:
    - Somehow need to let every client know that locks need to be 
migrated (upcall infrastructure?)
    - What if some client is not reachable at the given time?
    - Have to wait till all clients replay the locks

Benefits:
    - Hmmm... nothing really; if we could do it from the rebalance 
process itself, that solution may be better.
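
For comparison, the client-driven replay would look something like the 
following on each client. Every name here is hypothetical (the upcall 
hook in particular does not exist yet); it mainly shows why the cluster 
must wait on all clients before granting new locks.

    /* Hypothetical stubs for the client's view of its own locks. */
    static void *first_held_lock(void *file)        { (void)file; return 0; }
    static void *next_held_lock(void *lock)         { (void)lock; return 0; }
    static void  reacquire_lock(void *sv, void *lk) { (void)sv; (void)lk; }
    static void  release_lock(void *sv, void *lk)   { (void)sv; (void)lk; }
    static void  ack_lock_replay_done(void *file)   { (void)file; }

    /* Replay, triggered by an upcall telling this client that the
     * file moved from old_subvol to new_subvol. */
    static void dht_on_file_migrated(void *file, void *old_subvol,
                                     void *new_subvol)
    {
        void *lock;

        for (lock = first_held_lock(file); lock;
             lock = next_held_lock(lock)) {
            reacquire_lock(new_subvol, lock);  /* replay on new */
            release_lock(old_subvol, lock);    /* drop on old */
        }

        /* The cluster still has to wait for _every_ client to reach
         * this point before granting new locks on the new subvol. */
        ack_lock_replay_done(file);
    }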

# Overall thoughts
- We could/should return EINTR for blocked locks, both in the case of a 
graph switch and in the case of a file migration. This would relieve 
the design of that particular complexity, and EINTR is a legal error to 
return from a flock/fcntl operation (see the retry sketch below)

- If we can extract and map out all relevant lock information across 
subvolumes, then having rebalance do this work seems like a good fit. 
Additionally, this could serve as a good way to migrate upcall requests 
and state
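
From the application's point of view, EINTR from a blocked lock is 
benign precisely because callers of blocking fcntl(2) already have to 
handle it for signal interruptions, typically by retrying:

    #include <errno.h>
    #include <fcntl.h>

    /* Blocking acquire that tolerates EINTR, as a portable POSIX
     * application would already be written. */
    static int lock_with_retry(int fd, struct flock *fl)
    {
        while (fcntl(fd, F_SETLKW, fl) == -1) {
            if (errno != EINTR)
                return -1;    /* real failure */
            /* EINTR: lock not taken; simply retry. */
        }
        return 0;
    }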

Thoughts?

Shyam

