[Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)

Song gluster at 163.com
Mon Dec 2 09:19:26 UTC 2013


Pranith, 

Another kind of client crash happened; the gdb information is below for your reference:

Core was generated by `/usr/sbin/glusterfs --log-level=INFO --volfile-id=gfs6 --volfile-server=bj-nx-c'.
Program terminated with signal 11, Segmentation fault.
#0  afr_frame_return (frame=<value optimized out>) at afr-common.c:983
983	                call_count = --local->call_count;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) where
#0  afr_frame_return (frame=<value optimized out>) at afr-common.c:983
#1  0x00007f8aa1c1ebbc in afr_sh_entry_impunge_parent_setattr_cbk (setattr_frame=0x7f8aa525b248, cookie=<value optimized out>, this=0x1a82e00, op_ret=<value optimized out>, 
    op_errno=<value optimized out>, preop=<value optimized out>, postop=0x0, xdata=0x0) at afr-self-heal-entry.c:970
#2  0x00007f8aa1e5fecb in client3_1_setattr (frame=0x7f8aa54ec634, this=<value optimized out>, data=<value optimized out>) at client3_1-fops.c:5801
#3  0x00007f8aa1e58b41 in client_setattr (frame=0x7f8aa54ec634, this=<value optimized out>, loc=<value optimized out>, stbuf=<value optimized out>, valid=<value optimized out>, 
    xdata=<value optimized out>) at client.c:1915
#4  0x00007f8aa1c1f080 in afr_sh_entry_impunge_setattr (impunge_frame=0x7f8aa5454e10, this=<value optimized out>) at afr-self-heal-entry.c:1017
#5  0x00007f8aa1c1f5c0 in afr_sh_entry_impunge_xattrop_cbk (impunge_frame=0x7f8aa5454e10, cookie=0x1, this=0x1a82e00, op_ret=<value optimized out>, op_errno=22, xattr=<value optimized out>, 
    xdata=0x0) at afr-self-heal-entry.c:1067
#6  0x00007f8aa1e6b34e in client3_1_xattrop_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f8aa54ad5b8) at client3_1-fops.c:1715
#7  0x00000037eba0f4e5 in rpc_clnt_handle_reply (clnt=0x1eaccd0, pollin=0x2fba390) at rpc-clnt.c:786
#8  0x00000037eba0fce0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1eacd00, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:905
#9  0x00000037eba0aeb8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#10 0x00007f8aa2cb5764 in socket_event_poll_in (this=0x1ebc730) at socket.c:1677
#11 0x00007f8aa2cb5847 in socket_event_handler (fd=<value optimized out>, idx=127, data=0x1ebc730, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#12 0x00000037eb63e464 in event_dispatch_epoll_handler (event_pool=0x19eddf0) at event.c:785
#13 event_dispatch_epoll (event_pool=0x19eddf0) at event.c:847
#14 0x000000000040736a in main (argc=<value optimized out>, argv=0x7fff26cdcd78) at glusterfsd.c:1689
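
Gdb points at the call_count decrement in afr_frame_return, so it looks as if frame->local was already NULL or freed by the time this setattr callback ran. For reference, below is a simplified sketch of the fan-out counting pattern involved (hypothetical code, not the actual afr source; struct local and frame_return are made-up names). One extra callback return, or a callback arriving after cleanup, touches freed memory in exactly this way:

/* Simplified sketch of the fan-out "frame return" pattern used by afr
 * (hypothetical code, not the actual GlusterFS source): every winded
 * sub-operation decrements call_count in its callback and the last one
 * resumes the parent operation.  One extra callback return, or a
 * callback that arrives after cleanup, touches freed memory, which is
 * the same class of bug as the SIGSEGV above. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct local {
        pthread_mutex_t lock;
        int             call_count;     /* outstanding sub-operations */
};

/* Returns the remaining count; the caller resumes the parent op at 0. */
static int
frame_return(struct local *local)
{
        int count;

        pthread_mutex_lock(&local->lock);
        count = --local->call_count;
        pthread_mutex_unlock(&local->lock);

        return count;
}

int
main(void)
{
        struct local *local = calloc(1, sizeof(*local));

        pthread_mutex_init(&local->lock, NULL);
        local->call_count = 2;          /* two sub-operations winded */

        if (frame_return(local) == 0)   /* first callback */
                printf("last callback: resume parent op\n");
        if (frame_return(local) == 0)   /* second (last) callback */
                printf("last callback: resume parent op\n");

        pthread_mutex_destroy(&local->lock);
        free(local);

        /* A third, spurious callback would now decrement freed memory:
         * frame_return(local);   <-- use-after-free, like the crash above */
        return 0;
}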

-----Original Message-----
From: Song [mailto:gluster at 163.com] 
Sent: Monday, October 28, 2013 11:25 AM
To: 'Pranith Kumar Karampuri'
Cc: 'John Mark Walker'; 'gluster-users at gluster.org'
Subject: RE: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)

Pranith,

Another similar client crash happened. Following are the glusterfs log and gdb output for your reference.

pending frames:
frame : type(1) op(STATFS)
frame : type(1) op(STATFS)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2013-10-28 00:41:53
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.1
/lib64/libc.so.6[0x3a0c432900]
/lib64/libc.so.6(gsignal+0x35)[0x3a0c432885]
/lib64/libc.so.6(abort+0x175)[0x3a0c434065]
/lib64/libc.so.6[0x3a0c46f7a7]
/lib64/libc.so.6[0x3a0c4750c6]
/usr/lib/libglusterfs.so.0(gf_timer_call_cancel+0xb0)[0x328b42a180]
/usr/lib/glusterfs/3.3.1/xlator/protocol/client.so(client_ping_cbk+0x6d)[0x7f3514afe54d]
/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x328b80f4e5]
/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x328b80fce0]
/usr/lib/libgfrpc.so.0(rpc_transport_notify+0x28)[0x328b80aeb8]
/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f351593a764]
/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f351593a847]
/usr/lib/libglusterfs.so.0[0x328b43e464]
/usr/sbin/glusterfs(main+0x58a)[0x40736a]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3a0c41ecdd]
/usr/sbin/glusterfs[0x4042d9]
---------


(gdb) where
#0  0x0000003a0c432885 in raise () from /lib64/libc.so.6
#1  0x0000003a0c434065 in abort () from /lib64/libc.so.6
#2  0x0000003a0c46f7a7 in __libc_message () from /lib64/libc.so.6
#3  0x0000003a0c4750c6 in malloc_printerr () from /lib64/libc.so.6
#4  0x000000328b42a180 in gf_timer_call_cancel (ctx=<value optimized out>, event=0x7f34f0001730) at timer.c:122
#5  0x00007f3514afe54d in client_ping_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f3517f0751c) at client-handshake.c:285
#6  0x000000328b80f4e5 in rpc_clnt_handle_reply (clnt=0x1890aa0, pollin=0x1e7acb0) at rpc-clnt.c:786
#7  0x000000328b80fce0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1890ad0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:905
#8  0x000000328b80aeb8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#9  0x00007f351593a764 in socket_event_poll_in (this=0x18a0500) at socket.c:1677
#10 0x00007f351593a847 in socket_event_handler (fd=<value optimized out>, idx=41, data=0x18a0500, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#11 0x000000328b43e464 in event_dispatch_epoll_handler (event_pool=0x930df0) at event.c:785
#12 event_dispatch_epoll (event_pool=0x930df0) at event.c:847
#13 0x000000000040736a in main (argc=<value optimized out>, argv=0x7fff829eac78) at glusterfsd.c:1689



-----Original Message-----
From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
Sent: Friday, October 25, 2013 2:00 PM
To: Song
Cc: John Mark Walker; gluster-users at gluster.org
Subject: Re: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)

Thanks for this information. Let us see if we can re-create the issue in our environment. If that does not help, we shall do a detailed analysis of the code to figure this out.
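
To make the failure mode concrete (the double 'memput' I mentioned in my earlier mail, quoted below), here is a minimal, hypothetical plain-libc sketch, not the actual afr/dict code, of why releasing the same allocation twice ends in the __libc_message/malloc_printerr/abort frames seen in these backtraces. mem_put returns a chunk to its mem-pool, so putting the same pointer back twice can presumably corrupt it in much the same way:

/* Hypothetical illustration, not GlusterFS code: releasing the same
 * allocation twice makes glibc detect heap corruption and abort(),
 * which is the __libc_message -> malloc_printerr -> abort chain seen
 * in the backtraces in this thread. */
#include <stdlib.h>

int
main(void)
{
        void *p = malloc(64);

        free(p);
        free(p);        /* double free: glibc reports the error and raises SIGABRT */

        return 0;
}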

Pranith
----- Original Message -----
> From: "Song" <gluster at 163.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "John Mark Walker" <johnmark at gluster.org>, 
> gluster-users at gluster.org
> Sent: Wednesday, October 23, 2013 2:53:03 PM
> Subject: RE: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash	(signal received: 6)
> 
> Pranith,
> 
> Thanks for your detail answer.
> 
> Our workload includes CREATE/WRITE/READ/STAT/ACCESS, as well as
> chmod(filepath, 0), but I don't know which kind of workload leads to
> the crash.
> We have analyzed the related code, such as dict and the lookup paths of
> cluster/afr and protocol/client, and found no useful information to
> help locate the issue.
> 
> Song.
> 
> -----Original Message-----
> From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> Sent: Tuesday, October 22, 2013 5:25 PM
> To: Song
> Cc: John Mark Walker; gluster-users at gluster.org
> Subject: Re: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client 
> crash (signal received: 6)
> 
> Song,
>      The information printed in that function gf_print_trace has been useful
>      in the sense that we know it happens when there is a double 'memput' of
>      one of the data structures as part of 'lookup'. The problem is that this
>      issue seems to be happening only in some peculiar case, which
>      unfortunately you are hitting every day on 1-2 clients. That is why I
>      was trying to figure out what the workload is.
> 
> Let me tell you what I mean by 'workload'.
> For example:
> Websites that do some kind of image manipulation generally CREATE
> temporary files, do some transformations, i.e. READs/WRITEs, and then
> RENAME them to the actual files.
> So here the workload is CREATE/READ/WRITE/RENAME intensive.
> 
> To give you one more example:
> VM image hosting (at least with the KVM images that I generally test)
> pretty much does WRITEs, READs and STATs on each VM image, so it is
> WRITE/STAT/READ intensive.
> 
> I would really like to know what kind of workload happens on your
> setup, to figure out what that peculiar thing is that may lead to this crash.
> 
> Pranith.
> 
> ----- Original Message -----
> > From: "Song" <gluster at 163.com>
> > To: "Song" <gluster at 163.com>, "John Mark Walker" 
> > <johnmark at gluster.org>, "Pranith Kumar Karampuri"
> > <pkarampu at redhat.com>
> > Cc: gluster-users at gluster.org
> > Sent: Tuesday, October 22, 2013 1:56:48 PM
> > Subject: RE: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client crash
> > 	(signal received: 6)
> > 
> > To locate this issue, is it possible to print more useful
> > information in the backtrace?
> > When a client crashes, trace information is printed; this is done in
> > the function gf_print_trace in common-utils.c.
> > I hope some helpful debug information can be appended in this
> > function, so that when a client crashes next time the data can help
> > us analyze the problem.
> > 
> > Could you suggest what code would be useful to add?
> > Thanks!
> > 
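
One possible direction for the gf_print_trace question above: a generic, standalone sketch of a crash handler that dumps the backtrace plus one extra piece of application state. All names here (crash_handler, last_operation) are hypothetical; this is not a patch to common-utils.c, just the kind of data that could be appended.

/* Generic, standalone sketch (hypothetical names, not a patch to
 * gf_print_trace()/common-utils.c): a crash handler that dumps the
 * stack plus one extra piece of application state.  Compile with
 * -rdynamic so backtrace_symbols_fd() can print function names. */
#include <execinfo.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Example of extra state worth recording at crash time (hypothetical). */
static const char *last_operation = "none";

static void
crash_handler(int sig)
{
        static const char msg[] = "crash: last operation was ";
        void *frames[64];
        int   n = backtrace(frames, 64);

        write(STDERR_FILENO, msg, sizeof(msg) - 1);
        write(STDERR_FILENO, last_operation, strlen(last_operation));
        write(STDERR_FILENO, "\n", 1);
        backtrace_symbols_fd(frames, n, STDERR_FILENO);

        signal(sig, SIG_DFL);
        raise(sig);             /* re-raise so a core file is still produced */
}

int
main(void)
{
        signal(SIGSEGV, crash_handler);
        signal(SIGABRT, crash_handler);

        last_operation = "LOOKUP";
        abort();                /* simulate the crash */
        return 0;
}

In glusterfs the extra state could be something like the last fop seen on the frame, but that is just a guess at what would be most useful.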
> > -----Original Message-----
> > From: gluster-users-bounces at gluster.org 
> > [mailto:gluster-users-bounces at gluster.org] On Behalf Of Song
> > Sent: Friday, September 06, 2013 10:17 AM
> > To: 'John Mark Walker'; 'Pranith Kumar Karampuri'
> > Cc: gluster-users at gluster.org
> > Subject: Re: [Gluster-users] [Gluster-devel] GlusterFS 3.3.1 client 
> > crash (signal received: 6)
> > 
> > It's a pity that I don't know how to re-create the issue, but 1-2 out
> > of our 120 clients crash every day.
> > 
> > Below is gdb result:
> > 
> > (gdb) where
> > #0  0x0000003267432885 in raise () from /lib64/libc.so.6
> > #1  0x0000003267434065 in abort () from /lib64/libc.so.6
> > #2  0x000000326746f7a7 in __libc_message () from /lib64/libc.so.6
> > #3  0x00000032674750c6 in malloc_printerr () from /lib64/libc.so.6
> > #4  0x00007fc4f2847684 in mem_put (ptr=0x7fc4b0a4c03c) at
> > mem-pool.c:559
> > #5  0x00007fc4f281cc9b in dict_destroy (this=0x7fc4f12cc5cc) at
> > dict.c:397
> > #6  0x00007fc4ede24c30 in afr_local_cleanup (local=0x7fc4ce68ac20, 
> > this=<value optimized out>) at afr-common.c:848
> > #7  0x00007fc4ede2c0f1 in afr_lookup_done (frame=0x18d5ae4, 
> > cookie=0x0, this=<value optimized out>, op_ret=<value optimized
> > out>, op_errno=<value optimized out>, inode=0x18d5b20,
> >     buf=0x7fffcb83ec50, xattr=0x7fc4f12e1818,
> > postparent=0x7fffcb83ebe0) at
> > afr-common.c:1881
> > #8  afr_lookup_cbk (frame=0x18d5ae4, cookie=0x0, this=<value 
> > optimized
> > out>, op_ret=<value optimized out>, op_errno=<value optimized out>,
> > inode=0x18d5b20, buf=0x7fffcb83ec50,
> >     xattr=0x7fc4f12e1818, postparent=0x7fffcb83ebe0) at
> > afr-common.c:2044
> > #9  0x00007fc4ee066550 in client3_1_lookup_cbk (req=<value optimized
> > out>, iov=<value optimized out>, count=<value optimized out>,
> > myframe=0x7fc4f16f390c) at client3_1-fops.c:2636
> > #10 0x00007fc4f25ff4e5 in rpc_clnt_handle_reply (clnt=0x3b5c600,
> > pollin=0x6ba00f0) at rpc-clnt.c:786
> > #11 0x00007fc4f25ffce0 in rpc_clnt_notify (trans=<value optimized
> > out>, mydata=0x3b5c630, event=<value optimized out>, data=<value
> > optimized out>) at rpc-clnt.c:905
> > #12 0x00007fc4f25faeb8 in rpc_transport_notify (this=<value 
> > optimized
> > out>, event=<value optimized out>, data=<value optimized out>) at
> > rpc-transport.c:489
> > #13 0x00007fc4eeeb0764 in socket_event_poll_in (this=0x3b6c060) at
> > socket.c:1677
> > #14 0x00007fc4eeeb0847 in socket_event_handler (fd=<value optimized
> > out>, idx=265, data=0x3b6c060, poll_in=1, poll_out=0, poll_err=<value
> > optimized out>) at socket.c:1792
> > #15 0x00007fc4f2846464 in event_dispatch_epoll_handler
> > (event_pool=0x177cdf0) at event.c:785
> > #16 event_dispatch_epoll (event_pool=0x177cdf0) at event.c:847
> > #17 0x000000000040736a in main (argc=<value optimized out>,
> > argv=0x7fffcb83efc8) at glusterfsd.c:1689
> > 
> > 
> > -----Original Message-----
> > From: jowalker at redhat.com [mailto:jowalker at redhat.com] On Behalf Of 
> > John Mark Walker
> > Sent: Thursday, September 05, 2013 1:06 PM
> > To: Pranith Kumar Karampuri
> > Cc: Song; gluster-devel at nongnu.org
> > Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received:
> > 6)
> > 
> > Posting to gluster-users.
> > 
> > 
> > ----- Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> > > Song,
> > > Seems like the issue is happening because of a double 'memput'.
> > > Could you let us know the steps to re-create the issue, or the load
> > > that may lead to this?
> > > 
> > > Pranith
> > > 
> > > ----- Original Message -----
> > > > From: "Song" <gluster at 163.com>
> > > > To: gluster-devel at nongnu.org
> > > > Sent: Thursday, September 5, 2013 8:05:57 AM
> > > > Subject: [Gluster-devel] GlusterFS 3.3.1 client crash (signal
> > > > received: 6)
> > > > 
> > > > 
> > > > 
> > > > I installed GlusterFS 3.3.1 on my 24 servers, created a DHT+AFR
> > > > volume and mounted it with the native client.
> > > > 
> > > > Recently, some glusterfs clients have crashed; the log is below.
> > > > 
> > > > The OS is 64bit CentOS6.2, kernel version:
> > > > 2.6.32-220.23.1.el6.x86_64 #1 SMP Fri Jun 28 00:56:49 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
> > > > 
> > > > pending frames:
> > > > frame : type(1) op(LOOKUP)
> > > > frame : type(1) op(LOOKUP)
> > > > frame : type(1) op(LOOKUP)
> > > > 
> > > > patchset: git://git.gluster.com/glusterfs.git
> > > > signal received: 6
> > > > time of crash: 2013-09-05 00:37:40
> > > > configuration details:
> > > > argp 1
> > > > backtrace 1
> > > > dlfcn 1
> > > > fdatasync 1
> > > > libpthread 1
> > > > llistxattr 1
> > > > setfsid 1
> > > > spinlock 1
> > > > epoll.h 1
> > > > xattr.h 1
> > > > st_atim.tv_nsec 1
> > > > package-string: glusterfs 3.3.1
> > > > /lib64/libc.so.6[0x3ac0232900]
> > > > /lib64/libc.so.6(gsignal+0x35)[0x3ac0232885]
> > > > /lib64/libc.so.6(abort+0x175)[0x3ac0234065]
> > > > /lib64/libc.so.6[0x3ac026f7a7]
> > > > /lib64/libc.so.6[0x3ac02750c6]
> > > > /usr/lib/libglusterfs.so.0(mem_put+0x64)[0x7f3f99c2c684]
> > > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_local_cleanup+0x60)[0x7f3f95209c30]
> > > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_lookup_cbk+0x5a1)[0x7f3f952110f1]
> > > > /usr/lib/glusterfs/3.3.1/xlator/protocol/client.so(client3_1_lookup_cbk+0x6b0)[0x7f3f9544b550]
> > > > /usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f3f999e44e5]
> > > > /usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f3f999e4ce0]
> > > > /usr/lib/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f3f999dfeb8]
> > > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f3f96295764]
> > > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f3f96295847]
> > > > /usr/lib/libglusterfs.so.0(+0x3e464)[0x7f3f99c2b464]
> > > > /usr/sbin/glusterfs(main+0x58a)[0x40736a]
> > > > /lib64/libc.so.6(__libc_start_main+0xfd)[0x3ac021ecdd]
> > > > /usr/sbin/glusterfs[0x4042d9]
> > > > ---------
> > > > 
> > > > Best regards.
> > > > Willard Song
> > > > 
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel at nongnu.org
> > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > 
> > > 
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at nongnu.org
> > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > 
> > 
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > 
> > 
> > 
> 
> 
> 




