[Gluster-devel] Regarding doing away with refkeeper in locks xlator

Thu Jun 5 18:41:44 UTC 2014

On Thu, Jun 5, 2014 at 10:46 AM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:

> To summarize, the real "problems" are:
>
> - Deref of pl_inode->refkeeper as an inode_t in the cleanup logger. It
> is an internal artifact of pl_update_refkeeper() working and nobody else
> is supposed to "touch" it.
>
> For this, the solution I have in mind is to
> a. have a placeholder for gfid in pl_inode_t object,
> b. remember the gfid of the inode at the time of pl_inode_t creation in
> pl_ctx_get(), and
> c. print pl_inode->gfid in the log message in
> pl_{inodelk,entrylk}_log_cleanup().
>

OK.

- Missing call to pl_update_refkeeper() in client_disconnect_cbk(). This
> can result in a potential leak of inode ref (not a leak if the same
> inode gets any locking activity by another client in the future).
>
> As far as updates to refkeeper is concerned during DISCONNECT is
> concerned, Pranith and I did have discussions at length but could
> not find a single safe place in cleanup functions to call
> pl_update_refkeeper() that does not lead to illegal memory access,
> whether before or after the call to __pl_inodelk_unref() in the third for
> loop.
>
> Case #1 explained:
> =============
>
> <snip>
> list_for_each_entry_safe() {
> ...
> ...
> pl_update_refkeeper(inode);
> pthread_mutex_lock (&pl_inode->mutex);
>      __pl_inodelk_unref(l);
> pthread_mutex_unlock (&pl_inode->mutex);
> </snip>
>
> This causes the last unref in pl_update_refkeeper() to implicitly call
> pl_forget() where pl_inode_t object is destroyed while it is
> still needed in terms of having to obtain pl_inode->mutex before unrefing
> the lock object.
>
> Case 2 explained:
> =============
> <snip>
> list_for_each_entry_safe() {
> ...
> ...
> pthread_mutex_lock (&pl_inode->mutex);
>      __pl_inodelk_unref(l);
> pthread_mutex_unlock (&pl_inode->mutex);
> pl_update_refkeeper(inode);
> </snip>
>
> After the first for loop is processed, the domain's lists could have been
> emptied. And at this point, there could well come a second thread that
> could update refkeeper,
> triggering last unref on the inode leading to a call to pl_forget() (where
> pl_inode_t is freed), after which the DISCONNECT thread would be trying to
> perform illegal
> memory access in its call to pl_update_refkeeper() during its turn.
>
> So the solution Pranith and I came up with involves making sure that the
> inode_t object is alive for as long as there is atleast one lock on it:
>
> 1. refkeeper will not be used for inodelks and entrylks (but will continue
> to be used in fcntl locks).
> 2. Each pl_inode_t object will also host an inode_t member which is
> initialised during a call to pl_inode_get() in pl_common_{inodelk,
> entrylks}().
> 3. Everytime a lock is granted/blocked, pl_inode->inode is ref'd (in
> pl_inode_setlk() after successful locking.
> 4. Everytime a lock is unlocked, pl_inode->inode is unref'd.
> 5. In disconnect codepath, as part of "released" list processing, the
> inodes associated with the client are unref'd in the same loop right after
> every lock object is unref'd.
>
> With all this, there is one problem with the lock object that is found to
> be in both blocked and granted list in the transient state, when there's a
> race between the unlocking thread
> and the disconnecting thread. This can be best explained with the
> following example:
> <example>
> Consider an inode I on which there's a granted lock G and a blocked lock
> B, from clients C1 and C2 respectively. With this approach, at this point
> the number of refs taken
> by the locks xlator on I would be 2.
> C1 unlocks B, at which point I's refcount becomes 1.
> Now as part of granting other blocked locks in unlock codepath, B is put
> in granted list.
> Now B's client C2 sends a DISCONNECT event, which would cause I to be
> unref'd again. This being the last unref, pl_forget() is called on I
> causing its pl_inode to be destroyed
> At this point, the unlocking thread could try to do a mutex lock on
> pl_inode->mutex in grant_blocked_{inode,entry}_locks() (whereas pl_inode is
> now already destroyed) leading to a crash.
> </example>
>
> The problem described in the example can be fixed by making sure
> grant_blocked_{inode,entry}_locks() refs the inode first thing and then
> unrefs it just before returning.
> This would fix illegal memory access.
>

This sounds a bit complicated. I think there is a much simpler solution:

- First, make update_refkeeper() check for blocked locks (which I mentioned
as "optional" previously)

- Make grant_blocked_locks() double up and do the job of update_refkeeper()
internally.

Something which looks like this:

diff --git a/xlators/features/locks/src/common.c
b/xlators/features/locks/src/common.c
index f6c71c1..38df385 100644
--- a/xlators/features/locks/src/common.c
+++ b/xlators/features/locks/src/common.c
@@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode)
                 if (!list_empty (&dom->entrylk_list))
                         is_empty = 0;

+                if (!list_empty (&dom->blocked_entrylks))
+                        is_empty = 0;
+
                 if (!list_empty (&dom->inodelk_list))
                         is_empty = 0;
+
+                if (!list_empty (&dom->blocked_inodelks))
+                        is_empty = 0;
         }

         return is_empty;
@@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t
*pl_inode)
         struct list_head granted_list;
         posix_lock_t     *tmp = NULL;
         posix_lock_t     *lock = NULL;
+       inode_t *unref = NULL;

         INIT_LIST_HEAD (&granted_list);

         pthread_mutex_lock (&pl_inode->mutex);
         {
                 __grant_blocked_locks (this, pl_inode, &granted_list);
+
+               if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) {
+                       unref = pl_inode->refkeeper;
+                       pl_inode->refkeeper = NULL;
+               }
         }
         pthread_mutex_unlock (&pl_inode->mutex);

@@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t
*pl_inode)
                 GF_FREE (lock);
         }

+       if (unref)
+               inode_unref (unref);
+
         return;
 }


This should make pl_disconnect_cbk() pretty much race free w.r.t refkpeer.
Thoughts?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20140605/2e9e043f/attachment.html>