[Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)

Song gluster at 163.com
Wed Oct 23 09:23:03 UTC 2013


Pranith,

Thanks for your detailed answer.

Our workload includes CREATE/WRITE/READ/STAT/ACCESS, as well as chmod(filepath, 0). However, I don't know which kind of operation leads to the crash.
We have analyzed the related code (dict, lookup in cluster/afr, lookup in protocol/client) but found no useful information to help locate the issue.
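
In case it helps, the per-file operation mix looks roughly like the sketch below (illustrative only; the mount path, file name, buffer size and helper name are placeholders, not our real code):

    /* Rough sketch of the per-file operation mix described above
     * (illustrative only; path and buffer size are made up). */
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    static void touch_one_file(const char *path)
    {
            char        buf[4096];
            struct stat st;
            int         fd;

            /* CREATE + WRITE */
            fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd < 0)
                    return;
            memset(buf, 'x', sizeof(buf));
            write(fd, buf, sizeof(buf));
            close(fd);

            /* READ it back */
            fd = open(path, O_RDONLY);
            if (fd >= 0) {
                    read(fd, buf, sizeof(buf));
                    close(fd);
            }

            /* STAT + ACCESS */
            stat(path, &st);
            access(path, R_OK);

            /* chmod(filepath, 0), as our application does */
            chmod(path, 0);

            /* remove the file so the sketch can be re-run */
            unlink(path);
    }

    int main(void)
    {
            /* /mnt/gv0 is a placeholder for the FUSE mount point */
            touch_one_file("/mnt/gv0/testfile");
            return 0;
    }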

Song.

-----Original Message-----
From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com] 
Sent: Tuesday, October 22, 2013 5:25 PM
To: Song
Cc: John Mark Walker; gluster-users at gluster.org
Subject: Re: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)

Song,
     The information printed by gf_print_trace has been useful in the sense that we now know the crash happens when there is a double 'memput' of one of the data structures as part of 'lookup'. The problem is that this issue seems to occur only in some peculiar case, which unfortunately you are hitting every day on 1-2 clients. That is why I was trying to figure out what the workload is.
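
To make the 'double memput' part concrete: the abort comes from glibc's heap checks, the same way a plain double free() does. The snippet below is only an analogy with malloc/free, not the actual mem-pool code, but it trips the same malloc_printerr -> abort path (signal 6) that shows up in your backtrace:

    /* Analogy only: releasing the same allocation twice makes glibc
     * detect heap corruption and abort, i.e. "signal received: 6". */
    #include <stdlib.h>

    int main(void)
    {
            char *p = malloc(64);

            free(p);
            free(p);   /* second release of the same pointer: glibc usually
                        * prints "double free or corruption" and calls abort() */
            return 0;
    }

Which object gets 'put' twice, and on which lookup code path, is exactly what the workload information would help narrow down.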

Let me explain what I mean by 'workload'.
For example:
Websites that do some kind of image manipulation generally CREATE temporary files, do some transformations, i.e. READs/WRITEs, and then RENAME them to the actual files.
So here the workload is CREATE/READ/WRITE/RENAME intensive.

To give you one more example:
VM image hosting (at least with the KVM images that I generally test) pretty much does WRITEs, READs and STATs on each image, so it is WRITEs/STATs/READs intensive.

I would really like to know what kind of workload runs on your setup, so we can figure out what the peculiar thing is that may lead to this crash.

Pranith.

----- Original Message -----
> From: "Song" <gluster at 163.com>
> To: "Song" <gluster at 163.com>, "John Mark Walker" <johnmark at gluster.org>, "Pranith Kumar Karampuri"
> <pkarampu at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Tuesday, October 22, 2013 1:56:48 PM
> Subject: RE: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash	(signal received: 6)
> 
> To help locate this issue, is it possible to print more useful information
> in the backtrace?
> When a client crashed, trace information was printed by the function
> "gf_print_trace" in common-utils.c.
> I hope some helpful debug information can be appended in this function,
> so that when a client crashes next time, the data can help us analyze
> the problem.
> 
> Could you suggest what kind of code or information would be useful?
> Thanks!
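> 
> To be concrete, something along these lines is the kind of hook I am
> imagining (only a rough sketch; the helper names and the ring buffer are
> invented, not existing code):
> 
>     /* Hypothetical debug helpers: remember the last few dict pointers
>      * seen on the lookup path, and dump them from gf_print_trace() when
>      * the process crashes. All names here are invented for illustration. */
>     #include <stdio.h>
>     #include <unistd.h>
> 
>     #define TRACE_RING_SIZE 16
> 
>     static void *last_lookup_dicts[TRACE_RING_SIZE];
>     static int   last_lookup_idx;
> 
>     /* Would be called from the lookup callback path. */
>     void debug_remember_dict(void *dict)
>     {
>             last_lookup_dicts[last_lookup_idx] = dict;
>             last_lookup_idx = (last_lookup_idx + 1) % TRACE_RING_SIZE;
>     }
> 
>     /* Would be called from gf_print_trace(). Output goes through write(),
>      * which is async-signal-safe; snprintf strictly is not, but it keeps
>      * the sketch short. */
>     void debug_dump_ring(int fd)
>     {
>             char line[64];
>             int  i, n;
> 
>             for (i = 0; i < TRACE_RING_SIZE; i++) {
>                     n = snprintf(line, sizeof(line), "dict[%d] = %p\n",
>                                  i, last_lookup_dicts[i]);
>                     if (n > 0)
>                             write(fd, line, (size_t)n);
>             }
>     }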
> 
> -----Original Message-----
> From: gluster-users-bounces at gluster.org 
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Song
> Sent: Friday, September 06, 2013 10:17 AM
> To: 'John Mark Walker'; 'Pranith Kumar Karampuri'
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client 
> crash (signal received: 6)
> 
> Unfortunately, I don't know how to re-create the issue, although 1-2
> out of our 120 clients crash every day.
> 
> Below is the gdb result:
> 
> (gdb) where
> #0  0x0000003267432885 in raise () from /lib64/libc.so.6
> #1  0x0000003267434065 in abort () from /lib64/libc.so.6
> #2  0x000000326746f7a7 in __libc_message () from /lib64/libc.so.6
> #3  0x00000032674750c6 in malloc_printerr () from /lib64/libc.so.6
> #4  0x00007fc4f2847684 in mem_put (ptr=0x7fc4b0a4c03c) at mem-pool.c:559
> #5  0x00007fc4f281cc9b in dict_destroy (this=0x7fc4f12cc5cc) at dict.c:397
> #6  0x00007fc4ede24c30 in afr_local_cleanup (local=0x7fc4ce68ac20, this=<value optimized out>) at afr-common.c:848
> #7  0x00007fc4ede2c0f1 in afr_lookup_done (frame=0x18d5ae4, cookie=0x0, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=0x18d5b20, buf=0x7fffcb83ec50, xattr=0x7fc4f12e1818, postparent=0x7fffcb83ebe0) at afr-common.c:1881
> #8  afr_lookup_cbk (frame=0x18d5ae4, cookie=0x0, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=0x18d5b20, buf=0x7fffcb83ec50, xattr=0x7fc4f12e1818, postparent=0x7fffcb83ebe0) at afr-common.c:2044
> #9  0x00007fc4ee066550 in client3_1_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7fc4f16f390c) at client3_1-fops.c:2636
> #10 0x00007fc4f25ff4e5 in rpc_clnt_handle_reply (clnt=0x3b5c600, pollin=0x6ba00f0) at rpc-clnt.c:786
> #11 0x00007fc4f25ffce0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x3b5c630, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:905
> #12 0x00007fc4f25faeb8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
> #13 0x00007fc4eeeb0764 in socket_event_poll_in (this=0x3b6c060) at socket.c:1677
> #14 0x00007fc4eeeb0847 in socket_event_handler (fd=<value optimized out>, idx=265, data=0x3b6c060, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
> #15 0x00007fc4f2846464 in event_dispatch_epoll_handler (event_pool=0x177cdf0) at event.c:785
> #16 event_dispatch_epoll (event_pool=0x177cdf0) at event.c:847
> #17 0x000000000040736a in main (argc=<value optimized out>, argv=0x7fffcb83efc8) at glusterfsd.c:1689
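> 
> Since the abort starts from mem_put(), I also wonder whether a defensive
> check could catch the second put earlier, something like the sketch below
> (just an idea; the structure and field names are invented, this is not the
> real mem-pool.c layout, and locking is omitted):
> 
>     /* Sketch of a double-put guard: tag each pooled object with an
>      * in-use flag and fail loudly on the second put. Purely illustrative;
>      * not the actual GlusterFS mem-pool implementation. */
>     #include <assert.h>
>     #include <stddef.h>
> 
>     #define POOL_SIZE 8
> 
>     struct pooled_obj {
>             int  in_use;            /* set by "get", cleared by "put" */
>             char payload[256];      /* what callers actually receive  */
>     };
> 
>     static struct pooled_obj pool[POOL_SIZE];
> 
>     void *pool_get_sketch(void)
>     {
>             int i;
> 
>             for (i = 0; i < POOL_SIZE; i++) {
>                     if (!pool[i].in_use) {
>                             pool[i].in_use = 1;
>                             return pool[i].payload;
>                     }
>             }
>             return NULL;            /* pool exhausted */
>     }
> 
>     void pool_put_sketch(void *ptr)
>     {
>             struct pooled_obj *obj;
> 
>             if (!ptr)
>                     return;
>             /* recover the header sitting in front of the payload */
>             obj = (struct pooled_obj *)((char *)ptr -
>                   offsetof(struct pooled_obj, payload));
>             /* a second put on the same object trips this in a debug build,
>              * which is much clearer than the later glibc abort */
>             assert(obj->in_use == 1);
>             obj->in_use = 0;
>     }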
> 
> 
> -----Original Message-----
> From: jowalker at redhat.com [mailto:jowalker at redhat.com] On Behalf Of 
> John Mark Walker
> Sent: Thursday, September 05, 2013 1:06 PM
> To: Pranith Kumar Karampuri
> Cc: Song; gluster-devel at nongnu.org
> Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received:
> 6)
> 
> Posting to gluster-users.
> 
> 
> ----- Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> > Song,
> > Seems like the issue is happening because of a double 'memput'. Could
> > you let us know the steps to re-create the issue, or the load that may
> > lead to it?
> > 
> > Pranith
> > 
> > ----- Original Message -----
> > > From: "Song" <gluster at 163.com>
> > > To: gluster-devel at nongnu.org
> > > Sent: Thursday, September 5, 2013 8:05:57 AM
> > > Subject: [Gluster-devel] GlusterFS 3.3.1 client crash (signal
> > > received: 6)
> > > 
> > > 
> > > 
> > > I installed GlusterFS 3.3.1 on my 24 servers, created a DHT+AFR
> > > volume, and mounted it with the native client.
> > > 
> > > Recently, some glusterfs clients have crashed; the log is below.
> > > 
> > > The OS is 64bit CentOS6.2, kernel version:
> > > 2.6.32-220.23.1.el6.x86_64 #1 SMP Fri Jun 28 00:56:49 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
> > > 
> > > pending frames:
> > > frame : type(1) op(LOOKUP)
> > > frame : type(1) op(LOOKUP)
> > > frame : type(1) op(LOOKUP)
> > > 
> > > patchset: git://git.gluster.com/glusterfs.git
> > > signal received: 6
> > > time of crash: 2013-09-05 00:37:40
> > > configuration details:
> > > argp 1
> > > backtrace 1
> > > dlfcn 1
> > > fdatasync 1
> > > libpthread 1
> > > llistxattr 1
> > > setfsid 1
> > > spinlock 1
> > > epoll.h 1
> > > xattr.h 1
> > > st_atim.tv_nsec 1
> > > package-string: glusterfs 3.3.1
> > > 
> > > /lib64/libc.so.6[0x3ac0232900]
> > > /lib64/libc.so.6(gsignal+0x35)[0x3ac0232885]
> > > /lib64/libc.so.6(abort+0x175)[0x3ac0234065]
> > > /lib64/libc.so.6[0x3ac026f7a7]
> > > /lib64/libc.so.6[0x3ac02750c6]
> > > /usr/lib/libglusterfs.so.0(mem_put+0x64)[0x7f3f99c2c684]
> > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_local_cleanup+0x60)[0x7f3f95209c30]
> > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_lookup_cbk+0x5a1)[0x7f3f952110f1]
> > > /usr/lib/glusterfs/3.3.1/xlator/protocol/client.so(client3_1_lookup_cbk+0x6b0)[0x7f3f9544b550]
> > > /usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f3f999e44e5]
> > > /usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f3f999e4ce0]
> > > /usr/lib/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f3f999dfeb8]
> > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f3f96295764]
> > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f3f96295847]
> > > /usr/lib/libglusterfs.so.0(+0x3e464)[0x7f3f99c2b464]
> > > /usr/sbin/glusterfs(main+0x58a)[0x40736a]
> > > /lib64/libc.so.6(__libc_start_main+0xfd)[0x3ac021ecdd]
> > > /usr/sbin/glusterfs[0x4042d9]
> > > ---------
> > > 
> > > Best regards.
> > > Willard Song
> > > 
> 
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 




