[Gluster-devel] Regarding doing away with refkeeper in locks xlator

Anand Avati aavati at redhat.com
Thu Jun 5 07:08:41 UTC 2014


On 6/4/14, 9:43 PM, Krutika Dhananjay wrote:
>
>
> ------------------------------------------------------------------------
>
>     *From: *"Pranith Kumar Karampuri" <pkarampu at redhat.com>
>     *To: *"Krutika Dhananjay" <kdhananj at redhat.com>, "Anand Avati"
>     <aavati at redhat.com>
>     *Cc: *gluster-devel at gluster.org
>     *Sent: *Wednesday, June 4, 2014 12:23:59 PM
>     *Subject: *Re: [Gluster-devel] Regarding doing away with refkeeper
>     in locks xlator
>
>
>     On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote:
>
>
>         On 06/04/2014 11:37 AM, Krutika Dhananjay wrote:
>
>             Hi,
>
>             Recently there was a crash in the locks translator
>             (BZ 1103347, BZ 1097102) with the following backtrace:
>             (gdb) bt
>             #0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
>             #1  0x00007feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
>             #2  0x00007feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
>             #3  0x00007feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
>             #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
>             #5  0x00007feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
>             #6  0x00007feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
>             #7  0x00007feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
>             #8  0x00007feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at server.c:527
>             #9  0x00007feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
>             #10 0x00007feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0) at rpcsvc.c:758
>             #11 0x00007feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
>             #12 0x00007feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
>             #13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2240
>             #14 0x00007feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
>             #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
>             #16 0x0000000000407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023
>             (gdb) f 4
>             #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
>             428                pl_inodelk_log_cleanup (l);
>             (gdb) p l->pl_inode->refkeeper
>             $1 = (inode_t *) 0x0
>             (gdb)
>
>             pl_inode->refkeeper was found to be NULL even though
>             there were blocked inodelks in one of the inode's
>             domains; when the epoll thread dereferenced it in the
>             cleanup codepath, the process crashed.
>
>             On inspecting the code (for want of a consistent
>             reproducer), three things were found:
>
>             1. The function where the crash happens
>             (pl_inodelk_log_cleanup()) attempts to resolve the inode
>             to a path, as can be seen below. But inode_path() itself
>             already constructs the path from the given inode's
>             ancestry and places it in the buffer provided, and if
>             that fails, it falls back to placing the inode's gfid
>             there in the format "<gfid:%s>".
>                  This makes the statements on lines 4 through 7
>             below unnecessary, thereby "preventing" the dereference
>             of pl_inode->refkeeper.
>                  Now, although this change prevents the crash
>             altogether, it still does not fix the race that led to
>             pl_inode->refkeeper becoming NULL, and it comes at the
>             cost of printing "(null)" in the log message on line 9
>             every time pl_inode->refkeeper is found to be NULL,
>             rendering the logged messages somewhat useless.
>
>             <code>
>                0         pl_inode = lock->pl_inode;
>                1
>                2         inode_path (pl_inode->refkeeper, NULL, &path);
>                3
>                4         if (path)
>                5                 file = path;
>                6         else
>                7                 file = uuid_utoa (pl_inode->refkeeper->gfid);
>                8
>                9         gf_log (THIS->name, GF_LOG_WARNING,
>               10                 "releasing lock on %s held by "
>               11                 "{client=%p, pid=%"PRId64" lk-owner=%s}",
>               12                 file, lock->client, (uint64_t) lock->client_pid,
>               13                 lkowner_utoa (&lock->owner));
>             </code>
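>
>             For reference, with lines 4 through 7 dropped, the
>             snippet reduces to something like this (just a sketch of
>             the idea, not the actual patch):
>
>             <code>
>                       pl_inode = lock->pl_inode;
>
>                       /* inode_path() falls back to "<gfid:%s>" by itself;
>                        * if refkeeper is NULL, path stays NULL and "(null)"
>                        * gets logged, as noted above */
>                       inode_path (pl_inode->refkeeper, NULL, &path);
>
>                       gf_log (THIS->name, GF_LOG_WARNING,
>                               "releasing lock on %s held by "
>                               "{client=%p, pid=%"PRId64" lk-owner=%s}",
>                               path, lock->client, (uint64_t) lock->client_pid,
>                               lkowner_utoa (&lock->owner));
>             </code>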
>
>         I think this logging code is from the days when the gfid
>         handle concept was not there, so inode_path() would not
>         return <gfid:gfid-str> in cases where the path is not
>         present in the dentries. I believe the else block can be
>         deleted safely now.
>
>         Pranith
>
> Posted patch http://review.gluster.org/#/c/7981/1 for review, to fix
> problem #1 above.
>
> -Krutika
>
>
>             2. There is at least one codepath that can lead to this
>             crash:
>                  Imagine an inode on which an inodelk operation is
>             attempted by a client and is successfully granted. Now,
>             between the time the lock was granted and the time
>             pl_update_refkeeper() was called by this thread, the
>             client could send a DISCONNECT event, causing the
>             cleanup codepath to be executed, where the epoll thread
>             crashes on dereferencing pl_inode->refkeeper, which is
>             STILL NULL at this point.
>
>                 Besides, there are still places in the locks xlator
>             where the refkeeper is NOT updated whenever the lists
>             are modified - for instance, in the cleanup codepath
>             taken on a DISCONNECT.
>
>             3. Also, pl_update_refkeeper() does not seem to take
>             blocked locks on the inode into account in the
>             __pl_inode_is_empty() check, when it should: there could
>             still be cases where the granted list is empty but the
>             blocked list is not at the time of updating the
>             refkeeper, in which case pl_inode must still hold a ref
>             on the inode.
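>
>             Roughly, the check I have in mind is shaped like this
>             (only a sketch, not tested; the list member names are
>             from my reading of locks.h and may not be exact):
>
>             <code>
>             static int
>             __pl_inode_is_empty (pl_inode_t *pl_inode)
>             {
>                     pl_dom_list_t *dom = NULL;
>
>                     /* posix (fcntl) locks */
>                     if (!list_empty (&pl_inode->ext_list))
>                             return 0;
>
>                     /* granted AND blocked entrylks/inodelks, per domain */
>                     list_for_each_entry (dom, &pl_inode->dom_list, inode_list) {
>                             if (!list_empty (&dom->entrylk_list) ||
>                                 !list_empty (&dom->blocked_entrylks) ||
>                                 !list_empty (&dom->inodelk_list) ||
>                                 !list_empty (&dom->blocked_inodelks))
>                                     return 0;
>                     }
>
>                     return 1;
>             }
>             </code>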
>
>         Avati,
>               There is one more issue: pl_inode_forget cleans up
>         blocked/granted locks but does not handle the new
>         'client_list'. But if we fix the inode ref/unref the way
>         Krutika proposed, pl_forget should never be called while
>         there is a granted/blocked lock, so maybe we should delete
>         that code as well. Please feel free to comment.
>
>     It is pl_forget not pl_inode_forget. typo :-(.
>
>     Pranith
>
>
>         Pranith
>
>
>             Proposed solution to 2/3:
>
>             1. Do away with refkeeper in pl_inode_t altogether.
>             2. Let every lock object (inodelk/entrylk/posix_lock)
>             have an inode_t * member to act as a placeholder for the
>             associated inode object on which it is locking.
>             3. Let each lock object hold a ref on the inode at the time
>             of its creation and unref the inode during its destruction.
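>
>             For the inodelk case, this would look roughly like the
>             following (only a sketch to illustrate the idea; the
>             same pattern would apply to entrylks and posix locks):
>
>             <code>
>             struct __pl_inode_lock {
>                     /* ... existing members ... */
>                     inode_t *inode;  /* inode this lock is taken on */
>             };
>
>             /* at lock creation: */
>             lock->inode = inode_ref (inode);
>
>             /* at lock destruction: */
>             static void
>             __destroy_inode_lock (pl_inode_lock_t *lock)
>             {
>                     inode_unref (lock->inode);
>                     GF_FREE (lock);
>             }
>             </code>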
>
>               Please let me know what you think of the above.
>
>             -Krutika
>
>

To summarize, the real "problems" are:

- Deref of pl_inode->refkeeper as an inode_t in the cleanup logger. It 
is an internal artifact of how pl_update_refkeeper() works, and nobody 
else is supposed to "touch" it.
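
For context, the refkeeper mechanism is roughly of this shape (a 
simplified sketch, not the exact code): it holds exactly one ref on the 
inode while any of the inode's lock lists are non-empty, and drops that 
ref once they all become empty.

<code>
void
pl_update_refkeeper (xlator_t *this, inode_t *inode)
{
        pl_inode_t *pl_inode   = pl_inode_get (this, inode);
        int         is_empty   = 0;
        int         need_ref   = 0;
        int         need_unref = 0;

        pthread_mutex_lock (&pl_inode->mutex);
        {
                is_empty = __pl_inode_is_empty (pl_inode);

                if (is_empty && pl_inode->refkeeper) {
                        pl_inode->refkeeper = NULL;
                        need_unref = 1;
                }

                if (!is_empty && !pl_inode->refkeeper) {
                        pl_inode->refkeeper = inode;
                        need_ref = 1;
                }
        }
        pthread_mutex_unlock (&pl_inode->mutex);

        if (need_unref)
                inode_unref (inode);

        if (need_ref)
                inode_ref (inode);
}
</code>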

- Missing call to pl_update_refkeeper() in pl_client_disconnect_cbk(). 
This can result in a leaked inode ref (it is not a leak if the same 
inode sees locking activity from another client in the future).
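
The shape of the fix is simply to re-run the refkeeper evaluation once 
the disconnected client's locks have been torn down, roughly like this 
(a sketch of the idea, not the actual patch):

<code>
        /* in the disconnect cleanup path, per affected inode: */
        pthread_mutex_lock (&pl_inode->mutex);
        {
                /* remove the client's granted and blocked locks
                 * from pl_inode's lists here */
        }
        pthread_mutex_unlock (&pl_inode->mutex);

        /* re-evaluate whether the inode still needs to be held */
        pl_update_refkeeper (this, inode);
</code>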

Nice to have (not necessary):

- Include blocked locks in pl_inode_is_empty(), for the sake of 
symmetry. I don't see how this will make a difference though, given the 
goal of pl_update_refkeeper() (which is to ensure inode->ref > 0 while a 
user's locks exist).

Avati


