[Gluster-devel] Crash - 2.0.git-2009.06.16

Shehjar Tikoo shehjart at gluster.com
Thu Jun 25 17:28:06 UTC 2009


NovA wrote:
> Hi everybody!
> 
> 

Thanks. I'd also need your server and client volfiles and logs.

What application were you using when this crash took place?

What version of GlusterFS is this? Is it a recent git checkout?


-Shehjar

> 
> Recently I've migrated our small 24-node HPC cluster from GlusterFS
> 1.3.8 unify to 2.0 distribute. Performance seems to have increased a
> lot. Thanks for your work!
> 
> I use the following translators. On servers:
> posix->locks->iothreads->protocol/server; on clients:
> protocol/client->distribute->iothreads->write-behind. The io-threads
> translator uses 4 threads, with NO autoscaling.
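
A stack like the one described above would correspond, roughly, to a client volfile along these lines. Note this is a sketch: the hostnames, volume names, and `remote-subvolume` value are illustrative assumptions, not the reporter's actual configuration.

```
# client.vol -- sketch of the described translator chain
# (hostnames, volume names, and remote-subvolume are assumptions)
volume node01
  type protocol/client
  option transport-type tcp
  option remote-host node01
  option remote-subvolume brick
end-volume

volume node02
  type protocol/client
  option transport-type tcp
  option remote-host node02
  option remote-subvolume brick
end-volume

volume dht
  type cluster/distribute
  subvolumes node01 node02
end-volume

volume iot
  type performance/io-threads
  option thread-count 4
  subvolumes dht
end-volume

volume wb
  type performance/write-behind
  subvolumes iot
end-volume
```

The topmost volume (write-behind here) is what the client mounts; each server would export its posix->locks->io-threads chain through protocol/server in the same fashion.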
> 
> 
> 
> Unfortunately, after the upgrade I've run into new issues. First, I've
> noticed very high memory usage: GlusterFS on the head node now occupies
> 737 MB of resident memory and doesn't release it. The memory usage grew
> during the migration, which was driven by the command "cd
> ${namespace_export} && find . | (cd ${distribute_mount} && xargs -d
> '\n' stat -c '%n')". Note that the provided migrate-unify-to-distribute.sh
> script (with its "execute_on" function) doesn't work...
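
The per-file stat walk quoted above can be exercised on ordinary directories to see what it does. This sketch substitutes throwaway temp directories for the real namespace export and distribute mount (the paths are illustrative only; on a live volume the stat-driven lookup is presumably what lets distribute locate each file and set up its layout):

```shell
# Stand-ins for the unify namespace export and the new distribute mount;
# on the real cluster these would be the actual GlusterFS paths.
namespace_export=$(mktemp -d)
distribute_mount=$(mktemp -d)

# Both trees share the same relative paths, as the namespace and the
# distribute volume would after the data was copied over.
mkdir -p "$namespace_export/sub" "$distribute_mount/sub"
touch "$namespace_export/sub/a" "$distribute_mount/sub/a"

# Walk the namespace, then stat() each resulting path on the distribute
# mount; stat's %n format prints each file name as it is looked up.
migrated=$(cd "$namespace_export" && find . | \
    (cd "$distribute_mount" && xargs -d '\n' stat -c '%n'))
echo "$migrated"
```

Every path found under the namespace directory (".", "./sub", "./sub/a") is printed back by stat as it is looked up on the second tree.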
> 
> 
> 
> The second problem is more serious. A client on one of the nodes
> crashed today with the following backtrace:
> 
> ------
> 
> Core was generated by `glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /home'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00002b8039bec860 in ?? () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00002b8039bec860 in ?? () from /lib64/libc.so.6
> #1  0x00002b8039bedc0c in malloc () from /lib64/libc.so.6
> #2  0x00002b8039548732 in fop_writev_stub (frame=<value optimized out>, fn=0x2b803ab6c160 <iot_writev_wrapper>, fd=0x2aaab001e8a0, vector=0x2aaab0071d50, count=<value optimized out>, off=105432, iobref=0x2aaab0082d60) at common-utils.h:166
> #3  0x00002b803ab6ec00 in iot_writev (frame=0x4, this=0x6150c0, fd=0x2aaab0082711, vector=0x2aaab0083060, count=3, offset=105432, iobref=0x2aaab0082d60) at io-threads.c:1212
> #4  0x00002b803ad7a3de in wb_sync (frame=0x2aaab0034c40, file=0x2aaaac007280, winds=0x7fff717a5450) at write-behind.c:445
> #5  0x00002b803ad7a4ff in wb_do_ops (frame=0x2aaab0034c40, file=0x2aaaac007280, winds=0x7fff717a5450, unwinds=<value optimized out>, other_requests=0x7fff717a5430) at write-behind.c:1579
> #6  0x00002b803ad7a617 in wb_process_queue (frame=0x2aaab0034c40, file=0x2aaaac007280, flush_all=0 '\0') at write-behind.c:1624
> #7  0x00002b803ad7dd81 in wb_sync_cbk (frame=0x2aaab0034c40, cookie=<value optimized out>, this=<value optimized out>, op_ret=19, op_errno=0, stbuf=<value optimized out>) at write-behind.c:338
> #8  0x00002b803ab6a1e0 in iot_writev_cbk (frame=0x2aaab00309d0, cookie=<value optimized out>, this=<value optimized out>, op_ret=19, op_errno=0, stbuf=0x7fff717a5590) at io-threads.c:1186
> #9  0x00002b803a953aae in dht_writev_cbk (frame=0x63e3e0, cookie=<value optimized out>, this=<value optimized out>, op_ret=19, op_errno=0, stbuf=0x7fff717a5590) at dht-common.c:1797
> #10 0x00002b803a7406e9 in client_write_cbk (frame=0x648a80, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4363
> #11 0x00002b803a72c83a in protocol_client_pollin (this=0x60ec70, trans=0x61a380) at client-protocol.c:6230
> #12 0x00002b803a7370bc in notify (this=0x4, event=<value optimized out>, data=0x61a380) at client-protocol.c:6274
> #13 0x00002b8039533183 in xlator_notify (xl=0x60ec70, event=2, data=0x61a380) at xlator.c:820
> #14 0x00002aaaaaaaff0b in socket_event_handler (fd=<value optimized out>, idx=4, data=0x61a380, poll_in=1, poll_out=0, poll_err=0) at socket.c:813
> #15 0x00002b803954b2aa in event_dispatch_epoll (event_pool=0x6094f0) at event.c:804
> #16 0x0000000000403f34 in main (argc=6, argv=0x7fff717a64f8) at glusterfsd.c:1223
> 
> ----------
> 
> 
> 
> Later, GlusterFS crashed again with a different backtrace:
> 
> ----------
> 
> Core was generated by `glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /home'.
> Program terminated with signal 6, Aborted.
> #0  0x00002ae6dfcd4b45 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00002ae6dfcd4b45 in raise () from /lib64/libc.so.6
> #1  0x00002ae6dfcd60e0 in abort () from /lib64/libc.so.6
> #2  0x00002ae6dfd0cfbb in ?? () from /lib64/libc.so.6
> #3  0x00002ae6dfd1221d in ?? () from /lib64/libc.so.6
> #4  0x00002ae6dfd13f76 in free () from /lib64/libc.so.6
> #5  0x00002ae6df673efd in mem_put (pool=0x631a90, ptr=0x2aaaac0bc520) at mem-pool.c:191
> #6  0x00002ae6e0c992ce in iot_dequeue_ordered (worker=0x631a20) at io-threads.c:2407
> #7  0x00002ae6e0c99326 in iot_worker_ordered (arg=<value optimized out>) at io-threads.c:2421
> #8  0x00002ae6dfa8e020 in start_thread () from /lib64/libpthread.so.0
> #9  0x00002ae6dfd68f8d in clone () from /lib64/libc.so.6
> #10 0x0000000000000000 in ?? ()
> 
> ----------
> 
> 
> 
> Hope these backtraces help to track down the issue...
> 
> 
> 
> Best regards,
> 
>   Andrey
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> 
