[Gluster-devel] Regarding doing away with refkeeper in locks xlator
Anand Avati
aavati at redhat.com
Thu Jun 5 07:08:41 UTC 2014
On 6/4/14, 9:43 PM, Krutika Dhananjay wrote:
>
>
> ------------------------------------------------------------------------
>
> *From: *"Pranith Kumar Karampuri" <pkarampu at redhat.com>
> *To: *"Krutika Dhananjay" <kdhananj at redhat.com>, "Anand Avati"
> <aavati at redhat.com>
> *Cc: *gluster-devel at gluster.org
> *Sent: *Wednesday, June 4, 2014 12:23:59 PM
> *Subject: *Re: [Gluster-devel] Regarding doing away with refkeeper
> in locks xlator
>
>
> On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote:
>
>
> On 06/04/2014 11:37 AM, Krutika Dhananjay wrote:
>
> Hi,
>
> Recently there was a crash in locks translator (BZ 1103347,
> BZ 1097102) with the following backtrace:
> (gdb) bt
> #0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
> #1  0x00007feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
> #2  0x00007feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
> #3  0x00007feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
> #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
> #5  0x00007feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
> #6  0x00007feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
> #7  0x00007feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
> #8  0x00007feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at server.c:527
> #9  0x00007feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
> #10 0x00007feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0) at rpcsvc.c:758
> #11 0x00007feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
> #12 0x00007feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
> #13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2240
> #14 0x00007feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
> #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
> #16 0x0000000000407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023
> (gdb) f 4
> #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
> 428             pl_inodelk_log_cleanup (l);
> (gdb) p l->pl_inode->refkeeper
> $1 = (inode_t *) 0x0
> (gdb)
>
> pl_inode->refkeeper was found to be NULL even though there
> were blocked inodelks in one of the inode's domains; when the
> epoll thread dereferenced it in the cleanup codepath, the
> process crashed.
>
> On inspecting the code (for want of a consistent
> reproducer), three things were found:
>
> 1. The function where the crash happens
> (pl_inodelk_log_cleanup()) makes an attempt to resolve the
> inode to a path, as can be seen below. But the way inode_path()
> works is to first construct the path from the given inode's
> ancestry and place it in the buffer provided; if all else
> fails, the gfid of the inode is placed there in a fixed
> format ("<gfid:%s>").
> This eliminates the need for the statements on lines 4
> through 7 below, thereby "preventing" the dereferencing of
> pl_inode->refkeeper (a simplified version is sketched after
> the snippet).
> Now, although this change prevents the crash
> altogether, it still does not fix the race that led to
> pl_inode->refkeeper becoming NULL, and it comes at the cost
> of printing "(null)" in the log message on line 9 every
> time pl_inode->refkeeper is found to be NULL, rendering the
> logged messages somewhat useless.
>
> <code>
> 0   pl_inode = lock->pl_inode;
> 1
> 2   inode_path (pl_inode->refkeeper, NULL, &path);
> 3
> 4   if (path)
> 5           file = path;
> 6   else
> 7           file = uuid_utoa (pl_inode->refkeeper->gfid);
> 8
> 9   gf_log (THIS->name, GF_LOG_WARNING,
> 10          "releasing lock on %s held by "
> 11          "{client=%p, pid=%"PRId64" lk-owner=%s}",
> 12          file, lock->client, (uint64_t) lock->client_pid,
> 13          lkowner_utoa (&lock->owner));
> </code>
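>
> For illustration, a minimal sketch of what the simplified block
> could look like (relying on inode_path()'s "<gfid:%s>" fallback;
> when refkeeper is NULL, gf_log() would still print "(null)"):
>
> <code>
> pl_inode = lock->pl_inode;
>
> /* inode_path() falls back to the "<gfid:%s>" form by itself,
>    so the explicit else branch - and with it the deref of
>    pl_inode->refkeeper - is unnecessary. */
> inode_path (pl_inode->refkeeper, NULL, &path);
>
> gf_log (THIS->name, GF_LOG_WARNING,
>         "releasing lock on %s held by "
>         "{client=%p, pid=%"PRId64" lk-owner=%s}",
>         path, lock->client, (uint64_t) lock->client_pid,
>         lkowner_utoa (&lock->owner));
> </code>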
>
> I think this logging code is from the days when the gfid-handle
> concept was not there, so inode_path() wasn't returning
> <gfid:gfid-str> in cases where the path is not present in the
> dentries. I believe the else block can be deleted safely now.
>
> Pranith
>
> Posted patch http://review.gluster.org/#/c/7981/1 for review, to fix
> problem #1 above.
>
> -Krutika
>
>
> 2. There is at least one codepath that can lead to this
> crash:
> Imagine an inode on which an inodelk operation is
> attempted by a client and is successfully granted.
> Between the time the lock was granted and
> pl_update_refkeeper() was called by this thread, the client
> could send a DISCONNECT event, causing the cleanup codepath
> to be executed, where the epoll thread crashes on
> dereferencing pl_inode->refkeeper, which is STILL NULL at
> this point (a sketch of the interleaving follows below).
>
> Besides, there are still places in the locks xlator where
> the refkeeper is NOT updated whenever the lists are modified
> - for instance in the cleanup codepath taken on a DISCONNECT.
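>
> To make the window concrete, a sketch of the interleaving
> (names simplified; not the actual code):
>
> <code>
> /* FOP thread, granting the inodelk: */
> pthread_mutex_lock (&pl_inode->mutex);
> {
>         list_add (&lock->list, &dom->inodelk_list);  /* lock granted */
> }
> pthread_mutex_unlock (&pl_inode->mutex);
>
> /* <-- window: the epoll thread can process a DISCONNECT from
>    the same client here and run pl_inodelk_client_cleanup(),
>    which dereferences pl_inode->refkeeper while it is still
>    NULL. */
>
> pl_update_refkeeper (this, inode);  /* ref taken only here */
> </code>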
>
> 3. Also, pl_update_refkeeper() does not take blocked locks
> on the inode into account in the __pl_inode_is_empty()
> check, when it should: there could still be cases where the
> granted list is empty but the blocked list is not at the
> time of updating the refkeeper, in which case pl_inode must
> still hold a ref on the inode. A sketch follows below.
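>
> A sketch of __pl_inode_is_empty() with the blocked lists also
> considered (the list member names here are assumptions based
> on the domain structure, not a verified diff):
>
> <code>
> static int
> __pl_inode_is_empty (pl_inode_t *pl_inode)
> {
>         pl_dom_list_t *dom = NULL;
>
>         if (!list_empty (&pl_inode->ext_list))
>                 return 0;
>
>         list_for_each_entry (dom, &pl_inode->dom_list, inode_list) {
>                 if (!list_empty (&dom->inodelk_list) ||
>                     !list_empty (&dom->blocked_inodelks) ||   /* new */
>                     !list_empty (&dom->entrylk_list) ||
>                     !list_empty (&dom->blocked_entrylks))     /* new */
>                         return 0;
>         }
>
>         return 1;
> }
> </code>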
>
> Avati,
> There is one more issue: pl_inode_forget cleans up
> blocked/granted locks but does not handle the new
> 'client_list'. But if we fix the inode ref/unref as Krutika
> proposed, pl_forget should never be called while there is a
> granted/blocked lock, so maybe we should delete that code as
> well. Please feel free to comment.
>
> It is pl_forget, not pl_inode_forget. Typo :-(.
>
> Pranith
>
>
> Pranith
>
>
> Proposed solution to 2/3:
>
> 1. Do away with refkeeper in pl_inode_t altogether.
> 2. Let every lock object (inodelk/entrylk/posix_lock) have
> an inode_t * member to act as a placeholder for the
> associated inode object on which it is locking.
> 3. Let each lock object hold a ref on the inode at the time
> of its creation and unref the inode during its destruction,
> as sketched below.
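>
> As an illustration of 2/3, a minimal sketch for the inodelk
> case (the 'inode' member is the new field being proposed;
> surrounding names are illustrative, and entrylk/posix_lock
> would follow the same pattern):
>
> <code>
> static pl_inode_lock_t *
> new_inodelk_lock (inode_t *inode /* plus the existing args */)
> {
>         pl_inode_lock_t *lock = GF_CALLOC (1, sizeof (*lock),
>                                            gf_locks_mt_pl_inode_lock_t);
>         if (!lock)
>                 return NULL;
>
>         INIT_LIST_HEAD (&lock->list);
>         lock->inode = inode_ref (inode);  /* ref held for the lock's lifetime */
>         return lock;
> }
>
> static void
> __destroy_inode_lock (pl_inode_lock_t *lock)
> {
>         inode_unref (lock->inode);        /* ref dropped only on destruction */
>         lock->inode = NULL;
>         GF_FREE (lock);
> }
> </code>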
>
> Please let me know what you think of the above.
>
> -Krutika
>

To summarize, the real "problems" are:

- Deref of pl_inode->refkeeper as an inode_t in the cleanup logger. It
is an internal artifact of how pl_update_refkeeper() works, and nobody
else is supposed to "touch" it.
- Missing call to pl_update_refkeeper() in client_disconnect_cbk(). This
can result in a potential leak of an inode ref (not a leak if the same
inode sees locking activity from another client in the future). A
possible placement is sketched after this list.

Nice to have (not necessary):

- Include blocked locks in pl_inode_is_empty(), for the sake of
symmetry. I don't see how this will make a difference though, given the
goal of pl_update_refkeeper (which is to ensure inode->ref > 0 while a
user's locks exist).
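
A sketch of where the missing call could go, assuming the cleanup loop
can reach the inode (e.g. through the inode_t * member Krutika proposed
above; names are illustrative, not a verified diff):

<code>
/* Inside the client-disconnect cleanup (illustrative fragment): */
pl_inode_lock_t *l = NULL, *tmp = NULL;

list_for_each_entry_safe (l, tmp, &released, client_list) {
        inode_t *inode = l->inode;  /* the proposed new member */

        __destroy_inode_lock (l);

        /* The missing piece: drop the inode ref if this client's
           locks were the last users, instead of leaking it until
           another client performs locking activity on the inode. */
        pl_update_refkeeper (this, inode);
}
</code>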
Avati