[Gluster-devel] Locking behavior vs rmdir/unlink of a directory/file
rgowdapp at redhat.com
Thu Aug 20 04:54:46 UTC 2015
Most of the code currently treats the inode table (and the dentry structures associated with it) as an accurate representation of the underlying backend file system. While this holds in most cases, the representation can be out of sync for small time windows (for example, a file has been deleted on disk, but its dentry and inode have not yet been removed from our inode table).
While working on locking directories in dht for better consistency, we ran into one such issue. The goal is to make rmdir and directory creation during dht-selfheal mutually exclusive. The idea is to take a blocking inodelk on the inode before proceeding with rmdir or directory self-heal. However, consider the following scenario:
1. (dht_)rmdir acquires a lock.
2. lookup-selfheal tries to acquire a lock, but is blocked on the lock held by rmdir.
3. rmdir deletes the directory and releases the lock. It's possible for the inode to remain in the inode table, and to stay searchable through its gfid, as long as there is a positive reference count on it. In this case the lock request (by lookup) and the granted lock (to rmdir) keep the inode in the inode table even after rmdir.
4. The lock request issued by lookup is granted.
Note that at step 4, it's still possible that rmdir is in progress from dht's perspective (it has completed on just one node). This is precisely the situation we wanted to avoid, i.e., we wanted to block and fail dht-selfheal instead of allowing it to proceed.
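The four steps can be replayed with a small toy model. This is only a sketch under stated assumptions: `Brick`, `Inode`, and the string owners are hypothetical names invented for illustration and do not correspond to the real inode table or posix-locks code. It shows a queued lock request keeping the inode in the table, so the grant at step 4 lands on a directory that no longer exists on disk.

```python
from collections import deque


class Inode:
    """Toy inode (hypothetical): pinned in the table by lock requests."""

    def __init__(self, gfid):
        self.gfid = gfid
        self.refcount = 0          # one reference per outstanding lock request
        self.holder = None         # current lock owner
        self.waiters = deque()     # blocked lock requests, in arrival order


class Brick:
    """Toy brick (hypothetical): directories on 'disk' plus an inode table."""

    def __init__(self):
        self.on_disk = set()
        self.inode_table = {}

    def mkdir(self, gfid):
        self.on_disk.add(gfid)
        self.inode_table[gfid] = Inode(gfid)

    def inodelk(self, gfid, owner):
        inode = self.inode_table[gfid]
        inode.refcount += 1        # granted and blocked requests both pin it
        if inode.holder is None:
            inode.holder = owner
            return "granted"
        inode.waiters.append(owner)
        return "blocked"

    def unlock(self, gfid, owner):
        inode = self.inode_table[gfid]
        assert inode.holder == owner
        inode.refcount -= 1
        # Hand the lock to the next waiter without consulting the backend.
        inode.holder = inode.waiters.popleft() if inode.waiters else None
        if inode.refcount == 0:
            del self.inode_table[gfid]
        return inode.holder

    def rmdir(self, gfid):
        self.on_disk.discard(gfid)  # the backend entry is gone immediately


brick = Brick()
brick.mkdir("gfid-1")
events = [
    ("rmdir lock", brick.inodelk("gfid-1", "rmdir")),        # step 1
    ("selfheal lock", brick.inodelk("gfid-1", "selfheal")),  # step 2
]
brick.rmdir("gfid-1")                                        # step 3
next_owner = brick.unlock("gfid-1", "rmdir")                 # step 4
events.append(("granted to", next_owner))

# The lock was granted although the directory is gone from the backend.
stale_grant = next_owner == "selfheal" and "gfid-1" not in brick.on_disk
```

In this model `stale_grant` ends up true and the inode is still present in `inode_table`, which is the mismatch described in the steps above.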
In this scenario, at step 4 the directory has been removed on the backend file system, but its representation is still present in the inode table. We tried to solve this by doing a lookup on the gfid before granting a lock. However, we have reservations about this approach because:
1. It no longer treats the inode table as the source of truth, unlike the rest of the non-lookup code.
2. It incurs a performance hit in the form of a lookup on the backend file system for _every_ granted lock. This may not be too costly, considering that no network call is involved.
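A minimal sketch of the "look up before granting" idea follows. The assumptions are loud ones: the helper `grant_lock_with_revalidation` is hypothetical, a plain path stands in for the gfid handle, and returning `-ENOENT` is an illustrative choice rather than what posix-locks actually returns. The `os.lstat` call is the extra per-grant backend lookup discussed in point 2.

```python
import errno
import os
import tempfile


def grant_lock_with_revalidation(backend_path, grant):
    """Call grant() only if backend_path still exists on the backend.

    Returns 0 on success or a negative errno (hypothetical convention).
    """
    try:
        os.lstat(backend_path)      # the extra backend lookup for each grant
    except FileNotFoundError:
        return -errno.ENOENT        # the in-memory inode is stale
    grant()
    return 0


granted = []
root = tempfile.mkdtemp()
victim = os.path.join(root, "dir")
os.mkdir(victim)

# While the directory exists, the lock is granted as usual.
live_result = grant_lock_with_revalidation(victim, lambda: granted.append("live"))

os.rmdir(victim)                    # rmdir completes on the backend

# After rmdir, the pending request is failed instead of granted.
stale_result = grant_lock_with_revalidation(victim, lambda: granted.append("stale"))

os.rmdir(root)                      # clean up the temporary directory
```

The check costs one local system call per grant, which matches the observation that no network call is involved.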
There are other ways dht could have avoided the above scenario altogether, but with different trade-offs that we didn't want to make. A few alternatives would have been:
1. Use entrylk during lookup-selfheal and rmdir. This fits naturally, as both are entry operations. However, dht-selfheal also sets layouts, which should be synchronized with other operations where we don't have name information. tl;dr we wanted to avoid using entrylk for reasons that are out of scope for this problem.
2. Use non-blocking inodelk in dht during lookup-selfheal. This solves the problem for most practical cases, but in theory a race can still exist.
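The second alternative can be illustrated with a toy try-lock. Again this is a sketch, not the inodelk implementation: `TryLock` and the `-EAGAIN` return value are assumptions made for illustration. A contended request fails immediately, so nothing is left queued against the inode, but a request that arrives after the unlock still succeeds, which is one way to read the remaining theoretical race.

```python
import errno


class TryLock:
    """Toy non-blocking lock (hypothetical): contended requests fail, never queue."""

    def __init__(self):
        self.holder = None

    def trylock(self, owner):
        if self.holder is not None:
            return -errno.EAGAIN    # caller must fail or retry on its own
        self.holder = owner
        return 0

    def unlock(self, owner):
        assert self.holder == owner
        self.holder = None


lk = TryLock()
rmdir_rc = lk.trylock("rmdir")          # rmdir takes the lock on this brick
heal_rc = lk.trylock("selfheal")        # self-heal fails at once, nothing queued
lk.unlock("rmdir")                      # rmdir is done on this brick only
late_heal_rc = lk.trylock("selfheal")   # a late request is still granted
```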
To summarize, the problem of granted locks versus unlink/rmdir still remains, and I am not sure exactly what the behavior of posix-locks should be in that scenario. Input by way of review on  is greatly appreciated.