[Gluster-devel] Locking behavior vs rmdir/unlink of a directory/file

Vijay Bellur vbellur at redhat.com
Mon Aug 24 10:22:09 UTC 2015

On Thursday 20 August 2015 10:24 AM, Raghavendra Gowdappa wrote:
> Hi all,
> Most of the code currently treats inode table (and dentry structure associated with that) as the correct representative of underlying backend file-system. While this is correct for most of the cases, the representation might be out of sync for small time-windows (like file deleted on disk, but dentry and inode is not removed in our inode table etc). While working on locking directories in dht for better consistency we ran into one such issue. The issue is basically to make rmdir and directory creation during dht-selfheal mutually exclusive. The idea is to have a blocking inodelk on inode before proceeding with rmdir or directory self-heal. However, consider following scenario:
> 1. (dht_)rmdir acquires a lock.
> 2. lookup-selfheal tries to acquire a lock, but is blocked on the lock acquired by rmdir.
> 3. rmdir deletes the directory and releases its lock. It's possible for the inode to remain in the inode table, searchable through its gfid, as long as there is a positive reference count on it. In this case the pending lock request (by lookup) and the granted lock (to rmdir) keep the inode in the inode table even after rmdir.
> 4. lock request issued by lookup is granted.
> Note that at step 4 it's still possible that rmdir is in progress from dht's perspective (it has only completed on one node). However, this is precisely the situation we wanted to avoid, i.e., we wanted to block and fail dht-selfheal instead of allowing it to proceed.
> In this scenario, at step 4 the directory has been removed on the backend file-system, but its representation is still present in the inode table. We tried to solve this by doing a lookup on the gfid before granting a lock [1]. However, because of [1]:
> 1. we no longer treat the inode table as the source of truth, unlike the rest of the non-lookup code
> 2. there is a performance hit in the form of a lookup on the backend file-system for _every_ granted lock. This may not be that big, considering there is no network call involved.

Can we not mark the in-memory inode as having been unlinked in 
posix_rmdir() and use this information to determine whether a lock 
request can be processed?

stat() calls can be significantly expensive if the disk seek times 
happen to be high. It would be better if we could avoid an additional 
stat() for every granted lock.
