[Gluster-devel] Locking behavior vs rmdir/unlink of a directory/file

Tue Aug 25 09:27:11 UTC 2015

On Tuesday 25 August 2015 12:33 PM, Raghavendra Gowdappa wrote:
>
>
> ----- Original Message -----
>> From: "Vijay Bellur" <vbellur at redhat.com>
>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
>> Cc: "Sakshi Bansal" <sabansal at redhat.com>
>> Sent: Monday, August 24, 2015 3:52:09 PM
>> Subject: Re: [Gluster-devel] Locking behavior vs rmdir/unlink of a directory/file
>>
>> On Thursday 20 August 2015 10:24 AM, Raghavendra Gowdappa wrote:
>>> Hi all,
>>>
>>> Most of the code currently treats inode table (and dentry structure
>>> associated with that) as the correct representative of underlying backend
>>> file-system. While this is correct for most of the cases, the
>>> representation might be out of sync for small time-windows (like file
>>> deleted on disk, but dentry and inode is not removed in our inode table
>>> etc). While working on locking directories in dht for better consistency
>>> we ran into one such issue. The issue is basically to make rmdir and
>>> directory creation during dht-selfheal mutually exclusive. The idea is to
>>> have a blocking inodelk on inode before proceeding with rmdir or directory
>>> self-heal. However, consider following scenario:
>>>
>>> 1. (dht_)rmdir acquires a lock.
>>> 2. lookup-selfheal tries to acquire a lock, but is blocked on lock acquired
>>> by rmdir.
>>> 3. rmdir deletes directory and unlocks the lock. Its possible for inode to
>>> remain in inode table and searchable through gfid till there is a positive
>>> reference count on it. In this case lock-request (by lookup) and
>>> granted-lock (to rmdir) makes the inode to remain in inode table even
>>> after rmdir.
>>> 4. lock request issued by lookup is granted.
>>>
>>> Note that at step 4, its still possible rmdir might be in progress from dht
>>> perspective (it just completed on one node). However, this is precisely
>>> the situation we wanted to avoid i.e., we wanted to block and fail
>>> dht-selfheal instead of allowing it to proceed.
>>>
>>> In this scenario at step 4, the directory is removed on backend
>>> file-system, but its representation is still present in inode table. We
>>> tried to solve this by doing a lookup on gfid before granting a lock [1].
>>> However, because of [1]
>>>
>>> 1. we no longer treat inode table as source of truth as opposed to other
>>> non-lookup code
>>> 2. performance hit in terms of a lookup on backend-filesystem for _every_
>>> granted lock. This may not be as big considering that there is no network
>>> call involved.
>>>
>>
>> Can we not mark the in memory inode as having been unlinked in
>> posix_rmdir() and use this information to determine whether a lock
>> request can be processed?
>
> Yes. Nithya suggested the same. But this seemed like a hacky fix. Reason is:
>
> 1. Currently we don't really differentiate in inode management based on inode type. The code (dentry management, inode management) is agnostic to type. With this fix we are bringing such explicit differentiation. Note that only for directories we can have such a flag indicating that inode is removed from backend. Since, files have hardlinks it would be difficult (if not impossible, as unlink_cbk doesn't carry iatt to figure out whether current unlink is on last link). This makes the solution only applicable for directories. For lock requests on files we still need to lookup on the backend (for our use-case this is fine, since we are not locking on files). Not a show-stopper, but something in terms of aesthetics.

On the contrary, the idea of doing a lookup from locks translator looks 
like a hack to me. Locks is designed to work on an in memory data 
structure for synchronization and it should not be looking beyond that 
to make decisions. With the current inode management code, we have cases 
in inode_link () where we look at ia_type of an inode for making decisions.

I had an offline conversation with Raghavendra and the core problem 
seems to be stemming due to an expectation in dht that a path exists if 
an INODELK is granted. As our inode state may not exactly match the on 
disk state, we will evaluate the possibility of conditionally altering 
INODELK semantics to guarantee path existence. This way all granted 
INODELK requests would not need to perform additional stat calls. If we 
go down this route, we should document this behavior so that 
expectations from INODELK is clear.

>
> The whole thing about locks during/after file/directory is removed seems to be not well defined as of now IMHO.
>       a. We can acquire lock because inode exists in inode-table.
>       b. inode-exists in inode-table because there are some locks on inode holding reference.
>
>     Of course, this situation will be fixed with maintaining a flag indicating file/directory removal (or by doing a lookup). But some clarifications needed - If we are going to have some information indicating file/directory is removed, what should be the behaviour of future lock/unlock calls? should we fail them? For lock calls, we can fail them. But for unlock there are two choices:
>       a. Let the consumer send an unlock even after remove
>       b. Or clear out the locks during unlink/rmdir.
>
>      I prefer approach a. here. Comments?
>>

Yes, approach a. would be desirable. The onus should usually be on the 
consumer to clean up resources acquired.

-Vijay