[Gluster-devel] Locking behavior vs rmdir/unlink of a directory/file

Thu Aug 20 05:01:55 UTC 2015

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Cc: "Sakshi Bansal" <sabansal at redhat.com>
> Sent: Thursday, August 20, 2015 10:24:46 AM
> Subject: [Gluster-devel] Locking behavior vs rmdir/unlink of a directory/file
> 
> Hi all,
> 
> Most of the code currently treats the inode table (and the dentry structures
> associated with it) as an accurate representation of the underlying backend
> file-system. While this holds in most cases, the representation can be out
> of sync for small time windows (e.g., a file has been deleted on disk, but
> its dentry and inode have not yet been removed from our inode table). While
> working on locking directories in dht for better consistency, we ran into one
> such issue: making rmdir and directory creation during dht-selfheal mutually
> exclusive. The idea is to take a blocking inodelk on the inode before
> proceeding with either rmdir or directory self-heal. However, consider the
> following scenario:
> 
> 1. (dht_)rmdir acquires a lock.
> 2. lookup-selfheal tries to acquire a lock, but is blocked on the lock
> acquired by rmdir.
> 3. rmdir deletes the directory and releases the lock. It's possible for the
> inode to remain in the inode table, and be searchable by gfid, as long as it
> has a positive reference count. In this case, the lock request (by lookup)
> and the granted lock (to rmdir) keep the inode in the inode table even after
> rmdir.

This is because both of them hold a reference on the inode.

> 4. lock request issued by lookup is granted.
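The interleaving above may be easier to follow as a toy model. This is a deterministic Python sketch, not GlusterFS code: `Inode`, `InodeTable`, `inodelk` and `unlock` are hypothetical stand-ins for the inode table and the posix-locks translator. It assumes, as described above, that every lock request (granted or blocked) holds a reference on the inode, and that a released lock is handed directly to the next blocked waiter.

```python
from collections import deque

# Hypothetical toy model; not GlusterFS's API.


class Inode:
    """Toy inode: stays in the table while its refcount is positive."""

    def __init__(self, gfid):
        self.gfid = gfid
        self.refcount = 0
        self.holder = None       # owner of the granted lock, if any
        self.waiters = deque()   # blocked lock requests, FIFO order


class InodeTable:
    def __init__(self):
        self.by_gfid = {}

    def ref(self, inode):
        inode.refcount += 1

    def unref(self, inode):
        inode.refcount -= 1
        if inode.refcount == 0:
            # Only when the last ref goes does the inode leave the table.
            del self.by_gfid[inode.gfid]


def inodelk(table, inode, owner):
    """Blocking lock: grant now or queue. Every request pins a ref."""
    table.ref(inode)
    if inode.holder is None:
        inode.holder = owner
        return True
    inode.waiters.append(owner)
    return False


def unlock(table, inode, owner):
    """Release the lock and hand it to the next queued waiter."""
    assert inode.holder == owner
    inode.holder = inode.waiters.popleft() if inode.waiters else None
    table.unref(inode)
    return inode.holder


backend = {"gfid-1"}                 # directories present on disk
table = InodeTable()
dir_inode = Inode("gfid-1")
table.by_gfid[dir_inode.gfid] = dir_inode

inodelk(table, dir_inode, "rmdir")              # step 1
inodelk(table, dir_inode, "lookup-selfheal")    # step 2: queued
backend.discard("gfid-1")                       # step 3: removed on disk
granted = unlock(table, dir_inode, "rmdir")     # steps 3-4: handoff

print(granted)                       # lookup-selfheal
print("gfid-1" in table.by_gfid)     # True: inode still in the table
print("gfid-1" in backend)           # False: directory gone on disk
```

The waiter's reference is taken when it queues, so the handoff in `unlock` never drops the refcount to zero. That is exactly why the inode outlives the on-disk directory.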
> 
> Note that at step 4, it's still possible that rmdir is in progress from dht's
> perspective (it has just completed on one node). However, this is precisely
> the situation we wanted to avoid, i.e., we wanted to block and fail
> dht-selfheal instead of allowing it to proceed.
> 
> In this scenario, at step 4 the directory has already been removed from the
> backend file-system, but its representation is still present in the inode
> table. We tried to solve this by doing a lookup on the gfid before granting
> a lock [1]. However, with [1]:
> 
> 1. we no longer treat the inode table as the source of truth, unlike other
> non-lookup code;
> 2. there is a performance hit: a lookup on the backend file-system for
> _every_ granted lock. This may not be a big cost, considering that no
> network call is involved.
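The lookup-before-grant idea can be sketched in the same toy style. This only illustrates the general approach under discussion, not the code in [1]: `unlock_and_grant` and `exists_on_backend` are hypothetical names, and failing the blocked waiter with `ENOENT` is an assumption chosen for the example.

```python
import errno
from collections import deque

# Hypothetical toy model; not the change in [1] and not GlusterFS's API.


class Inode:
    def __init__(self, gfid):
        self.gfid = gfid
        self.holder = None       # owner of the granted lock, if any
        self.waiters = deque()   # blocked lock requests, FIFO order


def unlock_and_grant(inode, owner, exists_on_backend):
    """Release `owner`'s lock, then grant it to the next waiter only if
    a lookup by gfid confirms the object still exists on the backend.
    Returns a list of (waiter, error) pairs; error 0 means granted."""
    assert inode.holder == owner
    inode.holder = None
    results = []
    while inode.waiters:
        waiter = inode.waiters.popleft()
        # One backend lookup per grant: the cost mentioned in point 2.
        if exists_on_backend(inode.gfid):
            inode.holder = waiter
            results.append((waiter, 0))
            break
        # Gone on disk (assumed errno): fail the request, don't grant it.
        results.append((waiter, errno.ENOENT))
    return results


backend = {"gfid-1"}
d = Inode("gfid-1")
d.holder = "rmdir"
d.waiters.append("lookup-selfheal")

backend.discard("gfid-1")            # rmdir removes the directory on disk
print(unlock_and_grant(d, "rmdir", lambda gfid: gfid in backend))
# [('lookup-selfheal', 2)] on Linux, where ENOENT is 2
print(d.holder)                      # None: nobody holds the lock
```

The check trusts the backend rather than the inode table at grant time, which is the trade-off listed in point 1.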
> 
> There are other ways dht could have avoided the above scenario altogether,
> with different trade-offs we didn't want to make. A few alternatives would
> have been:
> 1. Use entrylk during lookup-selfheal and rmdir. This fits naturally, as
> both are entry operations. However, dht-selfheal also sets layouts, which
> should be synchronized with other operations where we don't have name
> information. tl;dr: we wanted to avoid using entrylk for reasons that are
> out of scope for this problem.
> 2. Use a non-blocking inodelk in dht during lookup-selfheal. This solves the
> problem for most practical cases, but theoretically a race can still
> exist.
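Alternative 2 can be modelled the same way, with a try-lock that is roughly analogous to fcntl's F_SETLK as opposed to the blocking F_SETLKW. The names are hypothetical, and the second interleaving below is only one assumed way the remaining race could show up. It is in line with the earlier note that rmdir may have completed on one node while still in progress from dht's perspective.

```python
# Hypothetical toy model; not GlusterFS's API.


class Inode:
    def __init__(self, gfid):
        self.gfid = gfid
        self.holder = None   # owner of the granted lock, if any


def try_inodelk(inode, owner):
    """Non-blocking lock: grant immediately or fail. Nothing is queued,
    so no request can be handed the lock after rmdir releases it."""
    if inode.holder is None:
        inode.holder = owner
        return True
    return False


def unlock(inode, owner):
    assert inode.holder == owner
    inode.holder = None


d = Inode("gfid-1")

# Common case: selfheal collides with an in-flight rmdir and backs off.
try_inodelk(d, "rmdir")
print(try_inodelk(d, "lookup-selfheal"))   # False: selfheal fails fast
unlock(d, "rmdir")

# Remaining window (one possible interleaving, assumed for illustration):
# rmdir has already unlocked on this node but is still running elsewhere.
print(try_inodelk(d, "lookup-selfheal"))   # True: the race is still there
```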
> 
> To summarize, the problem of granted locks vs. unlink/rmdir still remains,
> and I am not sure what exactly the behavior of posix-locks should be in that
> scenario. Input by way of review on [1] is greatly appreciated.
> 
> [1] http://review.gluster.org/#/c/11916/
> 
> regards,
> Raghavendra.