[Gluster-devel] brick crash/hang with io-threads in 2.5 patch 240

Fri Jun 29 14:53:54 UTC 2007

Read tests passed, but the backup crashed both the brick and the client.

Here is the backtrace from the brick that crashed:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1269179504 (LWP 30452)]
inode_forget (inode=0x8064038, nlookup=0) at list.h:92
92              prev->next = next;
(gdb) bt
#0  inode_forget (inode=0x8064038, nlookup=0) at list.h:92
#1  0xb75c0d0a in posix_forget () from /usr/lib/glusterfs/1.3.0-pre5/xlator/storage/posix.so
#2  0xb75b5676 in iot_forget_wrapper () from /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so
#3  0xb7f44f4a in call_resume_wind (stub=0x8064038) at call-stub.c:2027
#4  0xb7f44fd7 in call_resume (stub=0x810bfd8) at call-stub.c:2763
#5  0xb75b97a5 in iot_worker () from /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so
#6  0xb7f153db in start_thread () from /lib/libpthread.so.0
#7  0xb7e9f26e in clone () from /lib/libc.so.6

Harris

----- Original Message -----
From: "Basavanagowda Kanur" <gowda at zresearch.com>
To: "Harris Landgarten" <harrisl at lhjonline.com>
Cc: "Anand Avati" <avati at zresearch.com>, "gluster-devel" <gluster-devel at nongnu.org>
Sent: Friday, June 29, 2007 9:36:17 AM (GMT-0500) America/New_York
Subject: Re: [Gluster-devel] brick crash/hang with io-threads in 2.5 patch 240

Harris, 
Please find the fix for the bug in patch-243. 

Thanks, 
gowda 

On 6/28/07, Harris Landgarten <harrisl at lhjonline.com> wrote:

Avati, 

I managed to get a backtrace from the server by attaching gdb to the process:

0xb7f60f38 in dict_set (this=0x8056fc8, key=0xb75d8fa3 "key", value=0x8056c90) at dict.c:124 
124 for (pair = this->members[hashval]; pair != NULL; pair = pair->hash_next) { 
(gdb) bt 
#0 0xb7f60f38 in dict_set (this=0x8056fc8, key=0xb75d8fa3 "key", value=0x8056c90) at dict.c:124 
#1 0xb75cf36b in server_getxattr_cbk () from /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so 
#2 0xb7f64d55 in default_getxattr_cbk (frame=0x8057228, cookie=0x8057740, this=0x804ffc0, op_ret=0, op_errno=13, dict=0x8056fc8) at defaults.c:1071 
#3 0xb7f6d462 in call_resume (stub=0x8056858) at call-stub.c:2469 
#4 0xb75e1770 in iot_reply () from /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so 
#5 0xb7f3d3db in start_thread () from /lib/libpthread.so.0 
#6 0xb7ec726e in clone () from /lib/libc.so.6 

I hope this helps. Have you been able to reproduce it?

Harris 

----- Original Message ----- 
From: "Anand Avati" <avati at zresearch.com>
To: "Harris Landgarten" <harrisl at lhjonline.com>
Cc: "gluster-devel" <gluster-devel at nongnu.org>
Sent: Wednesday, June 27, 2007 8:09:13 AM (GMT-0500) America/New_York 
Subject: Re: [Gluster-devel] brick crash/hang with io-threads in 2.5 patch 240 

Is there a backtrace from the server available too? It would be of great help.

thanks, 
avati 

2007/6/27, Harris Landgarten <harrisl at lhjonline.com>:

Whenever I enable io-threads on one of my bricks, I can cause a crash as follows.

In client1:

ls -lR /mnt/glusterfs

While this is running, in client2:

ls -l /mnt/glusterfs 
ls: /mnt/glusterfs/secondary: Transport endpoint is not connected 
total 4 
?--------- ? ? ? ? ? /mnt/glusterfs/backups 
?--------- ? ? ? ? ? /mnt/glusterfs/tmp 

At this point, the brick with io-threads has crashed:

2007-06-27 07:45:55 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (11), printing backtrace 
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2d) [0xb7fabd4d] 
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xbfffe420] 
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so [0xb761436b] 
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0 [0xb7fa9d55] 
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(call_resume+0x4f2) [0xb7fb2462] 
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so [0xb7626770]
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/libpthread.so.0 [0xb7f823db] 
2007-06-27 07:45:55 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/libc.so.6(clone+0x5e) [0xb7f0c26 

The brick is running on Fedora and doesn't generate a core dump. Any suggestions?

This is the spec file I used for the test:

### Export the contents of the "/mnt/export" directory.
volume posix1 
type storage/posix # POSIX FS translator 
option directory /mnt/export # Export this directory 
end-volume 

volume io-threads 
type performance/io-threads 
option thread-count 8 
subvolumes posix1 
end-volume 

### Add POSIX record locking support to the storage brick 
volume brick 
type features/posix-locks 
option mandatory on # enables mandatory locking on all files 
subvolumes io-threads 
end-volume 

### Add network serving capability to above brick. 
volume server 
type protocol/server 
option transport-type tcp/server # For TCP/IP transport 
# option transport-type ib-sdp/server # For Infiniband transport 
# option bind-address 192.168.1.10 # Default is to listen on all interfaces 
option listen-port 6996 # Default is 6996 
# option client-volume-filename /etc/glusterfs/glusterfs-client.vol
subvolumes brick 
# NOTE: Access to any volume through protocol/server is denied by 
# default. You need to explicitly grant access through "auth" option. 
option auth.ip.brick.allow * # access to "brick" volume 
end-volume 

_______________________________________________ 
Gluster-devel mailing list 
Gluster-devel at nongnu.org 
http://lists.nongnu.org/mailman/listinfo/gluster-devel 

-- 
Anand V. Avati 