[Gluster-devel] 1.3.12 segfault
Matt McCowan
gluster at rpsmetocean.com
Thu Jan 15 05:02:45 UTC 2009
Greetings. This is my first post to this list, so please bear with me while I try to flesh out the segfault I saw yesterday ...
Call me brave, call me stupid: without enough equipment on which to test things, I have plunged glusterfs 1.3.12 straight into production on a small Opteron-based cluster. The 14 clients are either 2- or 4-way Opteron machines (44 cores all up) running amd64 Gentoo with a 2.6.20 kernel and the GlusterFS 2.7.3 fuse module.
Running the same Gentoo as the clients, the two servers are 4-way Opteron, dual-homed (GigE), with one glusterfsd per network connection, each daemon sharing out 250 GB.
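Each daemon gets its own spec file and log; the invocation of one of them is repeated in the crash report below, and the second spec/log names here are just illustrative of the pattern:

  # one glusterfsd per GigE interface, each with its own spec file and log
  glusterfsd -f /etc/glusterfs/glusterfs-server-shareda.vol -l /var/log/glusterfs/glusterfsd.log -L WARNING
  # second daemon (file names below are illustrative only; ours follow the same pattern)
  glusterfsd -f /etc/glusterfs/glusterfs-server-sharedb.vol -l /var/log/glusterfs/glusterfsdb.log -L WARNING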
Yesterday the glusterfs process on one of the 2-way clients went to 100% CPU. Attaching strace to it showed it repeatedly calling nanosleep. Since the machine needed to be back online quickly (oh for the budget of LANL!), I tried to Ctrl-C the strace, then SIGTERM, then had to SIGKILL it.
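For the record, the strace attach was nothing fancier than:

  # attach to the spinning client process; the output was an endless
  # stream of nanosleep() calls and nothing else
  strace -p $(pidof glusterfs)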
The SIGTERM must have got through to the glusterfs process, because the client log contains:
"2009-01-14 14:01:53 W [glusterfs.c:416:glusterfs_cleanup_and_exit] glusterfs: shutting down server"
There were no log entries made when it was running at 100%.
The problem on the client was first noticed when a user tried to tab-complete a directory listing on the gluster-mounted file system.
The gluster client was restarted. It was only a couple of hours later, when some of the users reported issues, that I noticed one of the glusterfsd processes had died on a server. The timestamp of the glusterfsd segfault on the server coincides with the killing of the glusterfs process on the client.
I haven't compiled gluster with debugging enabled, so following are entries from the server log, the client config, and a backtrace of the core dump (which unfortunately mirrors what's in the log).
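For what it's worth, rebuilding with symbols should just be the usual autotools dance; I assume something along these lines would do (flags are my guess):

  # rebuild with debugging symbols and no optimisation (assumed flags)
  CFLAGS="-g -O0" ./configure --prefix=/usr
  make && make install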
Side note: in an earlier 1.3.12 config we were running stripe across two glusterfsd backends. It proved to be quite unstable (specifically, directories sometimes failed to sync across the backends) compared to the unify+namespace config. Otherwise glusterfs seems all-round easier to install and use than my first cluster-filesystem attempt with PVFS.
Contents of /var/log/glusterfs/glusterfsd.log:
====================================
2009-01-14 14:01:53 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (172.17.231.162:1016)
2009-01-14 14:01:53 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (172.17.231.162:1017)
TLA Repo Revision: glusterfs--mainline--2.5--patch-797
Time : 2009-01-14 14:01:53
Signal Number : 11
glusterfsd -f /etc/glusterfs/glusterfs-server-shareda.vol -l /var/log/glusterfs/glusterfsd.log -L WARNING
volume server
type protocol/server
option auth.ip.nsbricka.allow *
option auth.ip.hans.allow *
option auth.ip.data.allow *
option bind-address 172.17.231.170
option transport-type tcp/server
subvolumes data hans nsbricka
end-volume
volume data
type performance/io-threads
option cache-size 128M
option thread-count 4
subvolumes databrick
end-volume
volume databrick
type storage/posix
option directory /var/local/shareda
end-volume
volume hans
type cluster/afr
subvolumes nsbricka nsbrickb
end-volume
volume nsbrickb
type protocol/client
option remote-subvolume nsbricka
option remote-host maelstroma9
option transport-type tcp/client
end-volume
volume nsbricka
type storage/posix
option directory /var/local/namespace
end-volume
frame : type(0) op(0)
frame : type(0) op(0)
2009-01-14 14:01:53 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (172.17.231.162:1015)
/lib/libc.so.6[0x2af3d0e0f940]
/usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so(afr_close+0x140)[0x2aaaaacd37d0]
/usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so(server_protocol_cleanup+0x1af)[0x2aaaaaef80cf]
/usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so(notify+0x6e)[0x2aaaaaef853e]
/usr/lib/libglusterfs.so.0(transport_unref+0x64)[0x2af3d0ab32b4]
/usr/lib64/glusterfs/1.3.12/transport/tcp/client.so(tcp_disconnect+0x7d)[0x2aaaaaffdcfd]
/usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so(notify+0x61)[0x2aaaaaef8531]
/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xbb)[0x2af3d0ab3c4b]
/usr/lib/libglusterfs.so.0(poll_iteration+0x78)[0x2af3d0ab3008]
[glusterfs](main+0x67c)[0x40288c]
/lib/libc.so.6(__libc_start_main+0xf4)[0x2af3d0dfd374]
[glusterfs][0x401d59]
---------
====================================
end of glusterfsd.log
/etc/glusterfs/glusterfs-client.vol:
====================================
volume brick1
  type protocol/client
  option transport-type tcp/client   # for TCP/IP transport
  option remote-host maelstroma0
  option transport-timeout 120
  option remote-subvolume data       # name of the remote volume
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client   # for TCP/IP transport
  option remote-host maelstroma0a
  option transport-timeout 120
  option remote-subvolume data       # name of the remote volume
end-volume

volume brick3
  type protocol/client
  option transport-type tcp/client   # for TCP/IP transport
  option remote-host maelstroma9
  option transport-timeout 120
  option remote-subvolume data       # name of the remote volume
end-volume

volume brick4
  type protocol/client
  option transport-type tcp/client   # for TCP/IP transport
  option remote-host maelstroma9a
  option transport-timeout 120
  option remote-subvolume data       # name of the remote volume
end-volume

volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host gluster
  option transport-timeout 120
  option remote-subvolume hans
end-volume

volume unify
  type cluster/unify
  option scheduler rr
  option rr.limits.min-free-disk 5
  option namespace ns
  subvolumes brick1 brick2 brick3 brick4
end-volume

volume iothreads
  type performance/io-threads
  #option thread-count 8
  option thread-count 4
  option cache-size 64M
  subvolumes unify
end-volume

volume readahead
  type performance/read-ahead
  option page-size 1024kb
  option page-count 10
  subvolumes iothreads
end-volume

volume iocache
  type performance/io-cache
  option cache-size 64MB   #default 32M
  option page-size 1MB     #default 128kb
  subvolumes readahead
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind off
  subvolumes iocache
end-volume
====================================
end of glusterfs-client.vol
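For completeness, the client mounts that spec in the usual way (the mount point below is made up and the log path is assumed):

  glusterfs -f /etc/glusterfs/glusterfs-client.vol -l /var/log/glusterfs/glusterfs.log /mnt/gluster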
gdb backtrace:
====================================
gdb /usr/sbin/glusterfsd /core.28935
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from /usr/lib64/libglusterfs.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libglusterfs.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/glusterfs/1.3.12/xlator/storage/posix.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/storage/posix.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/xlator/protocol/client.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/protocol/client.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/xlator/performance/io-threads.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/performance/io-threads.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/transport/tcp/client.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/transport/tcp/client.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/transport/tcp/server.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/transport/tcp/server.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/auth/ip.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/auth/ip.so
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib64/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Core was generated by `[glusterfs] '.
Program terminated with signal 11, Segmentation fault.
#0 0x00002aaaaacd37d0 in afr_close ()
from /usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so
(gdb) q
====================================
end of backtrace
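If this happens again once I have a build with symbols, I assume something like the following against the next core would give a more useful trace:

  gdb /usr/sbin/glusterfsd /core.<pid>
  (gdb) bt full                  # full backtrace with local variables
  (gdb) thread apply all bt      # backtrace for every thread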
Thanks for glusterfs
Regards
Matt McCowan
sysadmin
RPS MetOcean
Perth, Western Australia