[Gluster-devel] Regarding doing away with refkeeper in locks xlator

Krutika Dhananjay kdhananj at redhat.com
Thu Jun 5 04:43:02 UTC 2014


----- Original Message -----

> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> To: "Krutika Dhananjay" <kdhananj at redhat.com>, "Anand Avati"
> <aavati at redhat.com>
> Cc: gluster-devel at gluster.org
> Sent: Wednesday, June 4, 2014 12:23:59 PM
> Subject: Re: [Gluster-devel] Regarding doing away with refkeeper in locks
> xlator

> On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote:
> > On 06/04/2014 11:37 AM, Krutika Dhananjay wrote:

> > > Hi,

> > > Recently there was a crash in locks translator (BZ 1103347, BZ 1097102)
> > > with the following backtrace:
> > > (gdb) bt
> > > #0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
> > > #1  0x00007feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
> > > #2  0x00007feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
> > > #3  0x00007feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
> > > #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
> > > #5  0x00007feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
> > > #6  0x00007feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
> > > #7  0x00007feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
> > > #8  0x00007feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at server.c:527
> > > #9  0x00007feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
> > > #10 0x00007feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0) at rpcsvc.c:758
> > > #11 0x00007feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
> > > #12 0x00007feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
> > > #13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2240
> > > #14 0x00007feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
> > > #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
> > > #16 0x0000000000407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023

> > > (gdb) f 4
> > > #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
> > > 428             pl_inodelk_log_cleanup (l);
> > > (gdb) p l->pl_inode->refkeeper
> > > $1 = (inode_t *) 0x0
> > > (gdb)

> > > pl_inode->refkeeper was found to be NULL even when there were some
> > > blocked inodelks in a certain domain of the inode, which, when
> > > dereferenced by the epoll thread in the cleanup codepath, led to a crash.

> > > On inspecting the code (for want of a consistent reproducer), three
> > > things were found:

> > > 1. The function where the crash happens (pl_inodelk_log_cleanup()) makes
> > > an attempt to resolve the inode to a path, as can be seen below. But the
> > > way inode_path() itself works is to first construct the path based on the
> > > given inode's ancestry and place it in the buffer provided, and if all
> > > else fails, to place the gfid of the inode there in a fixed format
> > > ("<gfid:%s>"). This eliminates the need for the statements on lines 4
> > > through 7 below, thereby "preventing" the dereference of
> > > pl_inode->refkeeper.

> > > Now, although this change prevents the crash altogether, it still does
> > > not fix the race that led to pl_inode->refkeeper becoming NULL, and it
> > > comes at the cost of printing "(null)" in the log message on line 9 every
> > > time pl_inode->refkeeper is found to be NULL, rendering the logged
> > > messages somewhat useless.

> > > <code>
> > >  0          pl_inode = lock->pl_inode;
> > >  1
> > >  2          inode_path (pl_inode->refkeeper, NULL, &path);
> > >  3
> > >  4          if (path)
> > >  5                  file = path;
> > >  6          else
> > >  7                  file = uuid_utoa (pl_inode->refkeeper->gfid);
> > >  8
> > >  9          gf_log (THIS->name, GF_LOG_WARNING,
> > > 10                  "releasing lock on %s held by "
> > > 11                  "{client=%p, pid=%"PRId64" lk-owner=%s}",
> > > 12                  file, lock->client, (uint64_t) lock->client_pid,
> > > 13                  lkowner_utoa (&lock->owner));
> > > </code>

> > I think this logging code is from the days when the gfid handle concept
> > was not there, so it wasn't returning <gfid:gfid-str> in cases where the
> > path is not present in the dentries. I believe the else block can be
> > deleted safely now.

> > Pranith
Posted patch http://review.gluster.org/#/c/7981/1 for review, to fix problem #1 above. 

-Krutika 
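
For reference, a minimal sketch of what the simplified logging path discussed
in #1 could look like. It is only an illustration built from the snippet
quoted above (variable names and context are taken from there), not
necessarily the exact content of the posted patch:

<code>
        pl_inode = lock->pl_inode;

        /* inode_path() already falls back to the "<gfid:%s>" form when it
         * cannot construct a path from the inode's ancestry, so the
         * uuid_utoa() fallback - and with it the dereference of
         * pl_inode->refkeeper - is not needed. */
        inode_path (pl_inode->refkeeper, NULL, &path);
        file = path;

        gf_log (THIS->name, GF_LOG_WARNING,
                "releasing lock on %s held by "
                "{client=%p, pid=%"PRId64" lk-owner=%s}",
                file, lock->client, (uint64_t) lock->client_pid,
                lkowner_utoa (&lock->owner));
</code>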

> > > 2. There is at least one codepath found that can lead to this crash:

> > > Imagine an inode on which an inodelk operation is attempted by a client
> > > and is successfully granted too. Now, between the time the lock was
> > > granted and pl_update_refkeeper() was called by this thread, the client
> > > could send a DISCONNECT event, causing the cleanup codepath to be
> > > executed, where the epoll thread crashes on dereferencing
> > > pl_inode->refkeeper, which is STILL NULL at this point.

> > > Besides, there are still places in the locks xlator where the refkeeper
> > > is NOT updated whenever the lists are modified - for instance in the
> > > cleanup codepath taken on a DISCONNECT.

> > > 3. Also, pl_update_refkeeper() does not seem to take blocked locks on
> > > the inode into account in the __pl_inode_is_empty() check, when it
> > > should, as there could still be cases where the granted list is empty
> > > but the blocked list is not at the time of updating the refkeeper, in
> > > which case pl_inode must still hold a ref on the inode.
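
For what it's worth, here is a sketch of an emptiness check that also accounts
for blocked locks. The list member names (ext_list, dom_list, inodelk_list,
blocked_inodelks, etc.) are assumed from the locks xlator headers and may not
match the current code exactly:

<code>
/* Sketch only: treat an inode as "in use" if any granted OR blocked
 * entrylk/inodelk exists in any of its domains, or any posix lock is held. */
static int
__pl_inode_is_empty (pl_inode_t *pl_inode)
{
        pl_dom_list_t *dom = NULL;

        if (!list_empty (&pl_inode->ext_list))
                return 0;

        list_for_each_entry (dom, &pl_inode->dom_list, inode_list) {
                if (!list_empty (&dom->entrylk_list) ||
                    !list_empty (&dom->blocked_entrylks) ||
                    !list_empty (&dom->inodelk_list) ||
                    !list_empty (&dom->blocked_inodelks))
                        return 0;
        }

        return 1;
}
</code>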

> > Avati,

> > There is one more issue: pl_inode_forget is cleaning up blocked/granted
> > locks but is not handling the new 'client_list'. However, if we fix inode
> > ref/unref as Krutika proposed, pl_forget should never be called while
> > there is a granted/blocked lock, so maybe we should delete that code as
> > well. Please feel free to comment.

> It is pl_forget, not pl_inode_forget. Typo :-(.

> Pranith

> > Pranith
> 

> > > Proposed solution to 2/3:

> > > 1. Do away with refkeeper in pl_inode_t altogether.

> > > 2. Let every lock object (inodelk/entrylk/posix_lock) have an inode_t *
> > > member to act as a placeholder for the associated inode object on which
> > > it is locking.

> > > 3. Let each lock object hold a ref on the inode at the time of its
> > > creation and unref the inode during its destruction (see the sketch
> > > below).
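
A minimal sketch of what 2/3 above could look like for inodelks follows; the
same pattern would apply to entrylks and posix locks. The 'inode' member on
the lock structure and the creation/destruction helper names are assumptions
for illustration only, while inode_ref()/inode_unref() are the standard inode
refcounting helpers:

<code>
        /* when the lock object is created, e.g. in new_inodelk_lock(): */
        lock->inode = inode_ref (inode);   /* lock now keeps the inode alive */

        /* when the lock object is destroyed, e.g. in __destroy_inode_lock(): */
        inode_unref (lock->inode);
        lock->inode = NULL;
        GF_FREE (lock);
</code>

With this, the inode's lifetime is tied directly to the lock objects that
refer to it, and there is no window in which a refkeeper needs to be updated
separately from the lock lists.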

> > > Please let me know what you think of the above.

> > > -Krutika
