[Gluster-devel] Regarding doing away with refkeeper in locks xlator

Pranith Kumar Karampuri pkarampu at redhat.com
Fri Jun 6 02:52:28 UTC 2014


On 06/06/2014 12:11 AM, Anand Avati wrote:
>
> On Thu, Jun 5, 2014 at 10:46 AM, Krutika Dhananjay 
> <kdhananj at redhat.com <mailto:kdhananj at redhat.com>> wrote:
>
>         To summarize, the real "problems" are:
>
>         - Deref of pl_inode->refkeeper as an inode_t in the cleanup
>         logger. It is an internal artifact of pl_update_refkeeper()'s
>         working, and nobody else is supposed to "touch" it.
>
>     For this, the solution I have in mind is to
>     a. have a placeholder for gfid in pl_inode_t object,
>     b. remember the gfid of the inode at the time of pl_inode_t
>     creation in pl_ctx_get(), and
>     c. print pl_inode->gfid in the log message in
>     pl_{inodelk,entrylk}_log_cleanup().
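A minimal sketch of (a)-(c), using simplified stand-ins for the real pl_inode_t, uuid_t and logging types (the struct layout and function names below are illustrative, not the actual xlator code): the gfid is copied into pl_inode_t once at creation time, so cleanup logging never has to dereference refkeeper at all.

```c
#include <stdio.h>

typedef struct {
    char gfid[37];               /* (a) placeholder for the inode's gfid */
    /* ... lock lists, refkeeper, mutex ... */
} pl_inode_t;

/* (b) remember the gfid once, when the pl_inode_t is created */
static void pl_inode_set_gfid (pl_inode_t *pl, const char *gfid) {
    snprintf (pl->gfid, sizeof (pl->gfid), "%s", gfid);
}

/* (c) log from the stored copy instead of deref'ing refkeeper */
static void pl_inodelk_log_cleanup (const pl_inode_t *pl, char *buf, size_t n) {
    snprintf (buf, n, "releasing lock on %s held by a disconnecting client",
              pl->gfid);
}
```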
>
>
> OK.
>
>         - Missing call to pl_update_refkeeper() in
>         client_disconnect_cbk(). This
>         can result in a potential leak of inode ref (not a leak if the
>         same
>         inode gets any locking activity by another client in the future).
>
>     As far as updating refkeeper during DISCONNECT is concerned,
>     Pranith and I did have discussions at length but could not find a
>     single safe place in the cleanup functions to call
>     pl_update_refkeeper() without illegal memory access, whether
>     before or after the call to __pl_inodelk_unref() in the third for
>     loop.
>
>     Case #1 explained:
>     =============
>
>     <snip>
>     list_for_each_entry_safe() {
>     ...
>     ...
>     pl_update_refkeeper(inode);
>     pthread_mutex_lock (&pl_inode->mutex);
>          __pl_inodelk_unref(l);
>     pthread_mutex_unlock (&pl_inode->mutex);
>     </snip>
>
>     This causes the last unref in pl_update_refkeeper() to implicitly
>     call pl_forget(), where the pl_inode_t object is destroyed even
>     though it is still needed: pl_inode->mutex must be acquired
>     before unrefing the lock object.
>
>     Case #2 explained:
>     =============
>     <snip>
>     list_for_each_entry_safe() {
>     ...
>     ...
>     pthread_mutex_lock (&pl_inode->mutex);
>          __pl_inodelk_unref(l);
>     pthread_mutex_unlock (&pl_inode->mutex);
>     pl_update_refkeeper(inode);
>     </snip>
>
>     After the first for loop is processed, the domain's lists could
>     have been emptied. At this point, a second thread could well
>     update refkeeper, triggering the last unref on the inode and a
>     call to pl_forget() (where pl_inode_t is freed), after which the
>     DISCONNECT thread would perform illegal memory access in its own
>     call to pl_update_refkeeper().
>
>     So the solution Pranith and I came up with involves making sure
>     that the inode_t object stays alive for as long as there is at
>     least one lock on it:
>
>     1. refkeeper will not be used for inodelks and entrylks (but will
>     continue to be used in fcntl locks).
>     2. Each pl_inode_t object will also host an inode_t member which
>     is initialised during a call to pl_inode_get() in
>     pl_common_{inodelk, entrylks}().
>     3. Every time a lock is granted/blocked, pl_inode->inode is ref'd
>     (in pl_inode_setlk(), after successful locking).
>     4. Every time a lock is unlocked, pl_inode->inode is unref'd.
>     5. In disconnect codepath, as part of "released" list processing,
>     the inodes associated with the client are unref'd in the same loop
>     right after every lock object is unref'd.
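The per-lock ref scheme in steps 1-5 can be modelled with a few lines of C. The types and function names below are simplified stand-ins for the locks xlator's inode_t/pl_inode_t (not the real GlusterFS definitions); the point is only that the inode's refcount tracks the number of outstanding locks, so it cannot hit zero while any lock exists.

```c
typedef struct {
    int refcount;            /* stands in for inode_t's ref count */
} inode_t;

typedef struct {
    inode_t *inode;          /* step 2: pl_inode_t hosts an inode_t member */
    int      nlocks;         /* granted + blocked locks currently held */
} pl_inode_t;

static inode_t *inode_ref (inode_t *i)   { i->refcount++; return i; }
static void     inode_unref (inode_t *i) { i->refcount--; }

/* step 3: every granted/blocked lock takes a ref on pl_inode->inode */
static void lock_granted (pl_inode_t *pl) {
    inode_ref (pl->inode);
    pl->nlocks++;
}

/* steps 4-5: every unlock (or disconnect cleanup) drops one ref */
static void lock_released (pl_inode_t *pl) {
    pl->nlocks--;
    inode_unref (pl->inode);
}
```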
>
>     With all this, one problem remains: a lock object can transiently
>     be found in both the blocked and granted lists when there is a
>     race between the unlocking thread and the disconnecting thread.
>     This is best explained with the following example:
>     <example>
>     Consider an inode I on which there is a granted lock G and a
>     blocked lock B, from clients C1 and C2 respectively. With this
>     approach, the number of refs taken by the locks xlator on I at
>     this point would be 2.
>     C1 unlocks G, at which point I's refcount becomes 1.
>     Now, as part of granting other blocked locks in the unlock
>     codepath, B is put in the granted list.
>     Now B's client C2 sends a DISCONNECT event, which causes I to be
>     unref'd again. This being the last unref, pl_forget() is called
>     on I, causing its pl_inode to be destroyed.
>     At this point, the unlocking thread could try to lock
>     pl_inode->mutex in grant_blocked_{inode,entry}_locks() (whereas
>     pl_inode is already destroyed), leading to a crash.
>     </example>
>
>     The problem described in the example can be fixed by making sure
>     grant_blocked_{inode,entry}_locks() refs the inode first thing
>     and unrefs it just before returning.
>     This would fix the illegal memory access.
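The guard just described can be sketched as follows, again with simplified stand-in types rather than the real xlator code: the granting thread takes its own ref before touching pl_inode->mutex and drops it only when done, so a racing DISCONNECT's unref can never be the one that frees pl_inode underneath it.

```c
typedef struct {
    int refcount;
} inode_t;

typedef struct {
    inode_t *inode;
    int      destroyed;          /* set when the last ref is dropped */
} pl_inode_t;

static void inode_ref (inode_t *i)   { i->refcount++; }
static int  inode_unref (inode_t *i) { return --i->refcount; }

static void grant_blocked_locks_guarded (pl_inode_t *pl) {
    inode_ref (pl->inode);       /* first thing: pin the inode */

    /* ... lock pl_inode->mutex, move blocked locks to the granted
       list, unlock the mutex, wind the newly granted locks ...
       a racing DISCONNECT may drop its ref meanwhile; ours keeps
       pl_inode alive through this section */

    if (inode_unref (pl->inode) == 0)
        pl->destroyed = 1;       /* last unref: pl_forget() would run here */
}
```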
>
>
> This sounds a bit complicated. I think there is a much simpler solution:
>
> - First, make update_refkeeper() check for blocked locks (which I 
> mentioned as "optional" previously)
>
> - Make grant_blocked_locks() double up and do the job of 
> update_refkeeper() internally.
>
> Something which looks like this:
>
> diff --git a/xlators/features/locks/src/common.c 
> b/xlators/features/locks/src/common.c
> index f6c71c1..38df385 100644
> --- a/xlators/features/locks/src/common.c
> +++ b/xlators/features/locks/src/common.c
> @@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode)
>                  if (!list_empty (&dom->entrylk_list))
>                          is_empty = 0;
> +                if (!list_empty (&dom->blocked_entrylks))
> +                        is_empty = 0;
> +
>                  if (!list_empty (&dom->inodelk_list))
>                          is_empty = 0;
> +
> +                if (!list_empty (&dom->blocked_inodelks))
> +                        is_empty = 0;
>          }
>          return is_empty;
> @@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t 
> *pl_inode)
>          struct list_head granted_list;
>          posix_lock_t     *tmp = NULL;
>          posix_lock_t     *lock = NULL;
> +       inode_t *unref = NULL;
>          INIT_LIST_HEAD (&granted_list);
>          pthread_mutex_lock (&pl_inode->mutex);
>          {
>                  __grant_blocked_locks (this, pl_inode, &granted_list);
> +
> +               if (__pl_inode_is_empty (pl_inode) && 
> pl_inode->refkeeper) {
> +                       unref = pl_inode->refkeeper;
> +                       pl_inode->refkeeper = NULL;
> +               }
>          }
>          pthread_mutex_unlock (&pl_inode->mutex);
> @@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t 
> *pl_inode)
>                  GF_FREE (lock);
>          }
> +       if (unref)
> +               inode_unref (unref);
> +
>          return;
>  }
>
>
> This should make pl_disconnect_cbk() pretty much race free w.r.t 
> refkeeper. Thoughts?
Let's say C1 is doing pl_inodelk_client_cleanup(). After the second 
for loop (all granted and blocked locks are out of the domain), if an 
unlock on the final granted lock on that inode from client C2 completes, 
refkeeper would be set to NULL and unref'd, leading to zero refs on that 
inode, i.e. pl_forget() will also happen. In the third for loop, pl_inode 
is already freed, which leads to freed-memory access and a crash.

Pranith
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel


