[Gluster-devel] Core by test case : georep-tarssh-hybrid.t
Susant Palai
spalai at redhat.com
Fri Apr 24 13:53:24 UTC 2015
Appended comments inline.
----- Original Message -----
> From: "Susant Palai" <spalai at redhat.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: gluster-devel at gluster.org
> Sent: Friday, April 24, 2015 7:17:33 PM
> Subject: Re: [Gluster-devel] Core by test case : georep-tarssh-hybrid.t
>
> Hi,
> Here is a speculation:
>
> With the introduction of multi-threaded epoll we are processing multiple
> responses at the same time. The crash happened in __gf_free, which was
> reached from dht_getxattr_cbk (as seen in the backtrace). In its current
> state dht_getxattr_cbk does not take the frame lock. Hence, this path
> is prone to races.
>
> Here is a code-snippet from dht_getxattr_cbk.
> ===============================================
> this_call_cnt = dht_frame_return (frame);
The above line needs to move after the "out" label. Otherwise we will end up in a deadlock.
> ..
> ..
> ..
> ..
>
>
> if (!local->xattr) {
>         local->xattr = dict_copy_with_ref (xattr, NULL);
> } else {
>         dht_aggregate_xattr (local->xattr, xattr);
> }
> out:
> if (is_last_call (this_call_cnt)) {
>         DHT_STACK_UNWIND (getxattr, frame, local->op_ret, op_errno,
>                           local->xattr, NULL);
> }
> return 0;
>
> ===============================================
> Here I am depicting the responses of two cbks in a two-subvol cluster.
>
> time  Thread 1 (CBK1)                     Thread 2 (CBK2)
> ====  ==================================  ===================================
> 1     this_call_cnt = 1 (2 - 1)
> 2                                         this_call_cnt = 0 (1 - 1)
> 3     enters dict_copy_with_ref
> 4                                         dht_aggregate_xattr
> 5                                         DHT_STACK_UNWIND
>                                           [leading to dict_unref and destroy]
> 6     still busy with dict_copy_with_ref
>       and tries to unref the dict,
>       leading to a free of memory that
>       was already freed in the other
>       thread. Hence, a double free.
>
>
> Will compose a patch which puts the critical section under frame->lock.
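
A rough sketch of the shape I have in mind, written with plain pthreads
rather than the GlusterFS frame API so it can be compiled on its own. All
names here (sketch_frame, sketch_frame_return, sketch_getxattr_cbk,
sketch_run) are made up for illustration, and a long counter stands in for
the xattr dict. I am assuming the call-count decrement takes the frame lock
itself, which is why it sits after the critical section and not inside it:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for call_frame_t + dht_local_t (not the real types). */
struct sketch_frame {
        pthread_mutex_t lock;     /* plays the role of frame->lock */
        int             call_cnt; /* replies still outstanding */
        long           *xattr;    /* plays the role of local->xattr */
        int             unwinds;  /* number of times the frame was unwound */
        long            result;   /* value handed back on unwind */
};

/* Decrements the outstanding-reply count. It takes the lock itself, so
 * calling it with the lock already held would self-deadlock. */
static int
sketch_frame_return (struct sketch_frame *frame)
{
        int remaining;

        pthread_mutex_lock (&frame->lock);
        remaining = --frame->call_cnt;
        pthread_mutex_unlock (&frame->lock);

        return remaining;
}

/* One reply callback; several of these run concurrently. */
static void *
sketch_getxattr_cbk (void *arg)
{
        struct sketch_frame *frame = arg;
        int                  remaining;

        /* Critical section: create-or-aggregate under the frame lock. */
        pthread_mutex_lock (&frame->lock);
        if (!frame->xattr) {
                frame->xattr = malloc (sizeof (*frame->xattr));
                if (frame->xattr)
                        *frame->xattr = 1;
        } else {
                *frame->xattr += 1;
        }
        pthread_mutex_unlock (&frame->lock);

        /* Give up our claim on the frame only after we are done with it. */
        remaining = sketch_frame_return (frame);

        if (remaining == 0) {
                /* Last reply: no other thread touches frame->xattr now. */
                assert (frame->unwinds == 0);
                frame->result = frame->xattr ? *frame->xattr : -1;
                free (frame->xattr);
                frame->xattr = NULL;
                frame->unwinds++;
        }
        return NULL;
}

/* Fires nreplies concurrent callbacks. Returns the aggregated count, or -1
 * if setup failed or the frame was not unwound exactly once. */
long
sketch_run (int nreplies)
{
        struct sketch_frame frame = { .call_cnt = nreplies };
        pthread_t          *threads;
        int                 i, started;

        if (nreplies <= 0)
                return -1;
        threads = calloc (nreplies, sizeof (*threads));
        if (!threads)
                return -1;
        pthread_mutex_init (&frame.lock, NULL);

        for (i = 0; i < nreplies; i++)
                if (pthread_create (&threads[i], NULL,
                                    sketch_getxattr_cbk, &frame) != 0)
                        break;
        started = i;
        for (i = 0; i < started; i++)
                pthread_join (threads[i], NULL);

        free (frame.xattr); /* non-NULL only if a thread failed to start */
        free (threads);
        pthread_mutex_destroy (&frame.lock);

        return (started == nreplies && frame.unwinds == 1) ? frame.result : -1;
}
```

Calling sketch_run (2) from a main () models the two-subvol case above (build
with -pthread). The point of the ordering is that a thread decrements the
count only after it has finished touching the shared data, so whichever
thread sees zero can free it safely.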
>
>
> Regards,
> Susant
>
> ----- Original Message -----
> > From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > To: "Venky Shankar" <vshankar at redhat.com>, "Pranith Kumar Karampuri"
> > <pkarampu at redhat.com>
> > Cc: gluster-devel at gluster.org
> > Sent: Friday, April 24, 2015 11:04:09 AM
> > Subject: Re: [Gluster-devel] Core by test case : georep-tarssh-hybrid.t
> >
> > I apologize, I thought it was the same issue that we assumed. I just
> > looked into the stack trace and it is a different issue. This crash
> > happened during a getxattr of stime.
> >
> > Pranith,
> > You were working on min stime for ec, do you know about this?
> >
> > The trace looks like this.
> >
> > 1.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64
> > openssl-1.0.1e-30.el6.8.x86_64 zlib-1.2.3-29.el6.x86_64
> > (gdb) bt
> > #0 0x00007f4d89c41380 in pthread_spin_lock () from /lib64/libpthread.so.0
> > #1 0x00007f4d8a714438 in __gf_free (free_ptr=0x7f4d70023550) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/mem-pool.c:303
> > #2 0x00007f4d8a6ca1fb in data_destroy (data=0x7f4d87f27488) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:148
> > #3 0x00007f4d8a6caf46 in data_unref (this=0x7f4d87f27488) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:549
> > #4 0x00007f4d8a6cde55 in dict_get_bin (this=0x7f4d88108be8,
> > key=0x7f4d78131230
> > "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> > bin=0x7f4d7de276d8)
> > at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:2231
> > #5 0x00007f4d7cfa0d19 in gf_get_min_stime (this=0x7f4d7800d690,
> > dst=0x7f4d88108be8,
> > key=0x7f4d78131230
> > "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> > value=0x7f4d87f271b0)
> > at
> > /home/jenkins/root/workspace/smoke/xlators/cluster/afr/src/../../../../xlators/lib/src/libxlator.c:330
> > #6 0x00007f4d7cd16419 in dht_aggregate (this=0x7f4d88108d8c,
> > key=0x7f4d78131230
> > "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> > value=0x7f4d87f271b0, data=0x7f4d88108be8)
> > at
> > /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:116
> > #7 0x00007f4d8a6cc3b1 in dict_foreach_match (dict=0x7f4d88108d8c,
> > match=0x7f4d8a6cc244 <dict_match_everything>, match_data=0x0,
> > action=0x7f4d7cd16330 <dht_aggregate>, action_data=0x7f4d88108be8) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1182
> > #8 0x00007f4d8a6cc2a4 in dict_foreach (dict=0x7f4d88108d8c,
> > fn=0x7f4d7cd16330 <dht_aggregate>, data=0x7f4d88108be8)
> > at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1141
> > #9 0x00007f4d7cd165ae in dht_aggregate_xattr (dst=0x7f4d88108be8,
> > src=0x7f4d88108d8c) at
> > /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:153
> > #10 0x00007f4d7cd2415e in dht_getxattr_cbk (frame=0x7f4d8870d118,
> > cookie=0x7f4d8870d1c4, this=0x7f4d7800d690, op_ret=0, op_errno=0,
> > xattr=0x7f4d88108d8c, xdata=0x0)
> > at
> > /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:2710
> > #11 0x00007f4d7cf81293 in afr_getxattr_cbk (frame=0x7f4d8870d1c4,
> > cookie=0x0,
> > this=0x7f4d7800b560, op_ret=0, op_errno=0, dict=0x7f4d88108d8c, xdata=0x0)
> > at
> > /home/jenkins/root/workspace/smoke/xlators/cluster/afr/src/afr-inode-read.c:500
> > #12 0x00007f4d7d1fd829 in client3_3_getxattr_cbk (req=0x7f4d75e59504,
> > iov=0x7f4d75e59544, count=1, myframe=0x7f4d8870d270)
> > at
> > /home/jenkins/root/workspace/smoke/xlators/protocol/client/src/client-rpc-fops.c:1093
> > #13 0x00007f4d8a4a0d1c in rpc_clnt_handle_reply (clnt=0x7f4d7811a100,
> > pollin=0x7f4d7812c660) at
> > /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-clnt.c:766
> > #14 0x00007f4d8a4a113c in rpc_clnt_notify (trans=0x7f4d78129d70,
> > mydata=0x7f4d7811a130, event=RPC_TRANSPORT_MSG_RECEIVED,
> > data=0x7f4d7812c660)
> > at /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-clnt.c:894
> > #15 0x00007f4d8a49d66c in rpc_transport_notify (this=0x7f4d78129d70,
> > event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4d7812c660)
> > at
> > /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-transport.c:543
> > #16 0x00007f4d7f44e311 in socket_event_poll_in (this=0x7f4d78129d70) at
> > /home/jenkins/root/workspace/smoke/rpc/rpc-transport/socket/src/socket.c:2290
> > #17 0x00007f4d7f44e7cc in socket_event_handler (fd=15, idx=4,
> > data=0x7f4d78129d70, poll_in=1, poll_out=0, poll_err=0)
> > at
> > /home/jenkins/root/workspace/smoke/rpc/rpc-transport/socket/src/socket.c:2403
> > #18 0x00007f4d8a747e2d in event_dispatch_epoll_handler
> > (event_pool=0x1cc9ba0,
> > event=0x7f4d7de27e70)
> > at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/event-epoll.c:572
> > #19 0x00007f4d8a748186 in event_dispatch_epoll_worker (data=0x1d11cc0) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/event-epoll.c:674
> > #20 0x00007f4d89c3c9d1 in start_thread () from /lib64/libpthread.so.0
> > #21 0x00007f4d895a68fd in clone () from /lib64/libc.so.6
> >
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "Venky Shankar" <vshankar at redhat.com>
> > > To: gluster-devel at gluster.org
> > > Sent: Friday, April 24, 2015 10:53:45 AM
> > > Subject: Re: [Gluster-devel] Core by test case : georep-tarssh-hybrid.t
> > >
> > >
> > > On 04/24/2015 10:22 AM, Kotresh Hiremath Ravishankar wrote:
> > > > Hi Atin,
> > > >
> > > > It is not spurious, there is an issue with this pointer I think. All
> > > > changelog consumers such as bitrot, geo-rep would see this. Since it's
> > > > a race, it occurred with gsyncd.
> > >
> > > Correct. Jeff mentioned this a while ago. I'll help out Kotresh in
> > > fixing this issue. In the meantime is it possible to disable
> > > geo-replication regression test cases until this gets fixed?
> > >
> > > >
> > > > No, the patch http://review.gluster.org/#/c/10340/ will not
> > > > take care of it. It just improves the time taken for geo-rep
> > > > regression.
> > > >
> > > > I am looking into it.
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > >> From: "Atin Mukherjee" <amukherj at redhat.com>
> > > >> To: "kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Aravinda
> > > >> Vishwanathapura Krishna Murthy"
> > > >> <avishwan at redhat.com>
> > > >> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > > >> Sent: Friday, April 24, 2015 9:35:00 AM
> > > >> Subject: Core by test case : georep-tarssh-hybrid.t
> > > >>
> > > >> [1] has a core file generated by tests/geo-rep/georep-tarssh-hybrid.t.
> > > >> Is it something alarming, or would http://review.gluster.org/#/c/10340/
> > > >> take care of it?
> > > >>
> > > >> [1]
> > > >> http://build.gluster.org/job/rackspace-regression-2GB-triggered/7345/consoleFull
> > > >> --
> > > >> ~Atin
> > > >>
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel at gluster.org
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >