[Gluster-devel] Crash - 2.0.git-2009.06.16
Shehjar Tikoo
shehjart at gluster.com
Thu Jun 25 17:28:06 UTC 2009
NovA wrote:
> Hi everybody!
>
Thanks. I'd also need your server and client volfiles and logs.
What application were you using when this crash took place?
What version of GlusterFS is this? Is it a recent git checkout?
-Shehjar
>
> Recently I migrated our small 24-node HPC cluster from GlusterFS
> 1.3.8 unify to 2.0 distribute. Performance seems to have increased a
> lot. Thanks for your work!
>
> I use the following translator stacks. On servers:
> posix -> locks -> io-threads -> protocol/server; on clients:
> protocol/client -> distribute -> io-threads -> write-behind. The
> io-threads translator uses 4 threads, with NO autoscaling.
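>
> Roughly, the client-side volfile looks like this (a minimal sketch,
> not the exact file; the host name, volume names and the usual 2.0
> option names here are assumed placeholders):
>
>   volume remote1
>     type protocol/client
>     option transport-type tcp
>     # placeholder host; one such volume exists per server
>     option remote-host node01
>     option remote-subvolume brick
>   end-volume
>
>   volume dist
>     type cluster/distribute
>     # one protocol/client subvolume per server node
>     subvolumes remote1
>   end-volume
>
>   volume iot
>     type performance/io-threads
>     # fixed pool of 4 threads, autoscaling off
>     option thread-count 4
>     subvolumes dist
>   end-volume
>
>   volume wb
>     type performance/write-behind
>     subvolumes iot
>   end-volume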
>
> Unfortunately, after the upgrade I've run into new issues. First,
> I've noticed very high memory usage: GlusterFS on the head node now
> eats 737 MB of resident (RES) memory and doesn't return it. The
> memory usage grew during the migration, which I ran as "cd
> ${namespace_export} && find . | (cd ${distribute_mount} && xargs -d
> '\n' stat -c '%n')"; a cleaned-up version is shown below. Note that
> the provided migrate-unify-to-distribute.sh script (with its
> "execute_on" function) doesn't work...
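>
> The same command, rewritten as a small script for readability (the
> command is unchanged, only quoting was added; both paths are
> site-specific placeholders):
>
>   #!/bin/sh
>   # ${namespace_export} and ${distribute_mount} are site-specific paths.
>   # Walk the old unify namespace export and stat each file through
>   # the new distribute mount, so every entry gets looked up there.
>   cd "${namespace_export}" &&
>     find . | (cd "${distribute_mount}" && xargs -d '\n' stat -c '%n')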
>
> The second problem is more serious: a client on one of the nodes
> crashed today with the following backtrace:
>
> ------
> Core was generated by `glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /home'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00002b8039bec860 in ?? () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00002b8039bec860 in ?? () from /lib64/libc.so.6
> #1  0x00002b8039bedc0c in malloc () from /lib64/libc.so.6
> #2  0x00002b8039548732 in fop_writev_stub (frame=<value optimized out>,
>     fn=0x2b803ab6c160 <iot_writev_wrapper>, fd=0x2aaab001e8a0,
>     vector=0x2aaab0071d50, count=<value optimized out>, off=105432,
>     iobref=0x2aaab0082d60) at common-utils.h:166
> #3  0x00002b803ab6ec00 in iot_writev (frame=0x4, this=0x6150c0,
>     fd=0x2aaab0082711, vector=0x2aaab0083060, count=3, offset=105432,
>     iobref=0x2aaab0082d60) at io-threads.c:1212
> #4  0x00002b803ad7a3de in wb_sync (frame=0x2aaab0034c40,
>     file=0x2aaaac007280, winds=0x7fff717a5450) at write-behind.c:445
> #5  0x00002b803ad7a4ff in wb_do_ops (frame=0x2aaab0034c40,
>     file=0x2aaaac007280, winds=0x7fff717a5450,
>     unwinds=<value optimized out>, other_requests=0x7fff717a5430)
>     at write-behind.c:1579
> #6  0x00002b803ad7a617 in wb_process_queue (frame=0x2aaab0034c40,
>     file=0x2aaaac007280, flush_all=0 '\0') at write-behind.c:1624
> #7  0x00002b803ad7dd81 in wb_sync_cbk (frame=0x2aaab0034c40,
>     cookie=<value optimized out>, this=<value optimized out>,
>     op_ret=19, op_errno=0, stbuf=<value optimized out>)
>     at write-behind.c:338
> #8  0x00002b803ab6a1e0 in iot_writev_cbk (frame=0x2aaab00309d0,
>     cookie=<value optimized out>, this=<value optimized out>,
>     op_ret=19, op_errno=0, stbuf=0x7fff717a5590) at io-threads.c:1186
> #9  0x00002b803a953aae in dht_writev_cbk (frame=0x63e3e0,
>     cookie=<value optimized out>, this=<value optimized out>,
>     op_ret=19, op_errno=0, stbuf=0x7fff717a5590) at dht-common.c:1797
> #10 0x00002b803a7406e9 in client_write_cbk (frame=0x648a80,
>     hdr=<value optimized out>, hdrlen=<value optimized out>,
>     iobuf=<value optimized out>) at client-protocol.c:4363
> #11 0x00002b803a72c83a in protocol_client_pollin (this=0x60ec70,
>     trans=0x61a380) at client-protocol.c:6230
> #12 0x00002b803a7370bc in notify (this=0x4,
>     event=<value optimized out>, data=0x61a380) at client-protocol.c:6274
> #13 0x00002b8039533183 in xlator_notify (xl=0x60ec70, event=2,
>     data=0x61a380) at xlator.c:820
> #14 0x00002aaaaaaaff0b in socket_event_handler (fd=<value optimized out>,
>     idx=4, data=0x61a380, poll_in=1, poll_out=0, poll_err=0)
>     at socket.c:813
> #15 0x00002b803954b2aa in event_dispatch_epoll (event_pool=0x6094f0)
>     at event.c:804
> #16 0x0000000000403f34 in main (argc=6, argv=0x7fff717a64f8)
>     at glusterfsd.c:1223
> ----------
>
> Later, GlusterFS crashed again with a different backtrace:
>
> ----------
> Core was generated by `glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /home'.
> Program terminated with signal 6, Aborted.
> #0  0x00002ae6dfcd4b45 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00002ae6dfcd4b45 in raise () from /lib64/libc.so.6
> #1  0x00002ae6dfcd60e0 in abort () from /lib64/libc.so.6
> #2  0x00002ae6dfd0cfbb in ?? () from /lib64/libc.so.6
> #3  0x00002ae6dfd1221d in ?? () from /lib64/libc.so.6
> #4  0x00002ae6dfd13f76 in free () from /lib64/libc.so.6
> #5  0x00002ae6df673efd in mem_put (pool=0x631a90, ptr=0x2aaaac0bc520)
>     at mem-pool.c:191
> #6  0x00002ae6e0c992ce in iot_dequeue_ordered (worker=0x631a20)
>     at io-threads.c:2407
> #7  0x00002ae6e0c99326 in iot_worker_ordered (arg=<value optimized out>)
>     at io-threads.c:2421
> #8  0x00002ae6dfa8e020 in start_thread () from /lib64/libpthread.so.0
> #9  0x00002ae6dfd68f8d in clone () from /lib64/libc.so.6
> #10 0x0000000000000000 in ?? ()
> ----------
>
> Hope these backtraces help to find the issue...
>
> Best regards,
> Andrey
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel