[Gluster-devel] Regarding doing away with refkeeper in locks xlator

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Jun 4 06:53:59 UTC 2014


On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote:
>
> On 06/04/2014 11:37 AM, Krutika Dhananjay wrote:
>> Hi,
>>
>> Recently there was a crash in locks translator (BZ 1103347, BZ 
>> 1097102) with the following backtrace:
>> (gdb) bt
>> #0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, 
>> uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
>> #1  0x00007feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, 
>> out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9",
>>     fmt=0x7feebaa08e00 
>> "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at 
>> ../../contrib/uuid/unparse.c:55
>> #2  0x00007feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of 
>> bounds>) at common-utils.c:2138
>> #3  0x00007feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, 
>> ctx=0x7fee700f0c60) at inodelk.c:396
>> #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at 
>> inodelk.c:428
>> #5  0x00007feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, 
>> client=<value optimized out>) at posix.c:2550
>> #6  0x00007feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at 
>> client_t.c:368
>> #7  0x00007feeab77ed48 in server_connection_cleanup (this=0x2316390, 
>> client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
>> #8  0x00007feeab77ae2c in server_rpc_notify (rpc=<value optimized 
>> out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at 
>> server.c:527
>> #9  0x00007feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, 
>> trans=0x2bf51c0) at rpcsvc.c:720
>> #10 0x00007feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, 
>> mydata=<value optimized out>, event=<value optimized out>, 
>> data=0x2bf51c0) at rpcsvc.c:758
>> #11 0x00007feeba778638 in rpc_transport_notify (this=<value optimized 
>> out>, event=<value optimized out>, data=<value optimized out>) at 
>> rpc-transport.c:512
>> #12 0x00007feeb115e971 in socket_event_poll_err (fd=<value optimized 
>> out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value 
>> optimized out>, poll_out=0,
>>     poll_err=0) at socket.c:1071
>> #13 socket_event_handler (fd=<value optimized out>, idx=<value 
>> optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, 
>> poll_out=0, poll_err=0) at socket.c:2240
>> #14 0x00007feeba9fc6a7 in event_dispatch_epoll_handler 
>> (event_pool=0x22e2d00) at event-epoll.c:384
>> #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
>> #16 0x0000000000407e93 in main (argc=19, argv=0x7fffea6c7f88) at 
>> glusterfsd.c:2023
>> (gdb) f 4
>> #4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at 
>> inodelk.c:428
>> 428                pl_inodelk_log_cleanup (l);
>> (gdb) p l->pl_inode->refkeeper
>> $1 = (inode_t *) 0x0
>> (gdb)
>>
>> pl_inode->refkeeper was found to be NULL even when there were some 
>> blocked inodelks in a certain domain of the inode,
>> which when dereferenced by the epoll thread in the cleanup codepath 
>> led to a crash.
>>
>> On inspecting the code (for want of a consistent reproducer), three 
>> things were found:
>>
>> 1. The function where the crash happens (pl_inodelk_log_cleanup()), 
>> makes an attempt to resolve the inode to path as can be seen below. 
>> But the way inode_path() itself
>>     works is to first construct the path based on the given inode's 
>> ancestry and place it in the buffer provided. And if all else fails, 
>> the gfid of the inode is placed in a certain format ("<gfid:%s>").
>>     This eliminates the need for statements from line 4 through 7 
>> below, thereby "preventing" dereferencing of pl_inode->refkeeper.
>>     Now, although this change prevents the crash altogether, it still 
>> does not fix the race that led to pl_inode->refkeeper becoming NULL, 
>> and comes at the cost of
>>     printing "(null)" in the log message on line 9 every time 
>> pl_inode->refkeeper is found to be NULL, rendering the logged 
>> messages somewhat useless.
>>
>> <code>
>>   0         pl_inode = lock->pl_inode;
>>   1
>>   2         inode_path (pl_inode->refkeeper, NULL, &path);
>>   3
>>   4         if (path)
>>   5                 file = path;
>>   6         else
>>   7                 file = uuid_utoa (pl_inode->refkeeper->gfid);
>> 8
>>   9         gf_log (THIS->name, GF_LOG_WARNING,
>>  10                 "releasing lock on %s held by "
>>  11                 "{client=%p, pid=%"PRId64" lk-owner=%s}",
>>  12                 file, lock->client, (uint64_t) lock->client_pid,
>>  13                 lkowner_utoa (&lock->owner));
>> <\code>
> I think this logging code is from the days when gfid handle concept 
> was not there. So it wasn't returning <gfid:gfid-str> in cases the 
> path is not present in the dentries. I believe the else block can be 
> deleted safely now.
>
> Pranith
>>
>> 2. There is at least one codepath found that can lead to this crash:
>>     Imagine an inode on which an inodelk operation is attempted by a 
>> client and is successfully granted too.
>>    Now, between the time the lock was granted and 
>> pl_update_refkeeper() was called by this thread, the client could 
>> send a DISCONNECT event,
>>    causing cleanup codepath to be executed, where the epoll thread 
>> crashes on dereferencing pl_inode->refkeeper which is STILL NULL at 
>> this point.
>>
>>    Besides, there are still places in locks xlator where the 
>> refkeeper is NOT updated whenever the lists are modified - for 
>> instance in the cleanup codepath from a DISCONNECT.
>>
>> 3. Also, pl_update_refkeeper() seems to be not taking into account 
>> blocked locks on the inode in the __pl_inode_is_empty() check, when 
>> it should, as there could still be cases
>>     where the granted list could be empty but not the blocked list at 
>> the time of udpating the refkeeper, in which case pl_inode must still 
>> take ref on the inode.
> Avati,
>      There is one more issue, where pl_inode_forget is cleaning up 
> blocked/granted locks which is not handling the new 'client_list'. But 
> if we fix inode ref/unref like Kritika proposed, pl_forget should 
> never come when there is a granted/blocked lock. So may be we should 
> delete that as well. Please feel free to comment.
It is pl_forget not pl_inode_forget. typo :-(.

Pranith
>
> Pranith
>>
>> Proposed solution to 2/3:
>>
>> 1. Do away with refkeeper in pl_inode_t altogether.
>> 2. Let every lock object (inodelk/entryllk/posix_lock) have an 
>> inode_t * member to act as a placeholder for the associated inode 
>> object on which it is locking.
>> 3. Let each lock object hold a ref on the inode at the time of its 
>> creation and unref the inode during its destruction.
>>
>>  Please let me know what you think of the above.
>>
>> -Krutika
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20140604/8f06242b/attachment.html>


More information about the Gluster-devel mailing list