[Gluster-users] glusterfs 3.3 self-heal daemon crash and can't be started

Thu Mar 14 09:38:36 UTC 2013

I have fixed this bug in our local glusterfs 3.3 repo. The root cause
is a fixed-size buffer declared in glusterfs 3.3 at
glusterfsd/src/glusterfsd-mgmt.c line 1394:
static char oldvolfile[131072];

So if the volume
file (/var/lib/glusterd/glustershd/glustershd-server.vol) is larger
than 128 KB (131072 bytes), the daemon simply crashes. This happens when
there are a lot of volumes on the server, because the self-heal daemon's
volume file then grows beyond 128 KB.
The crash is at line 1629, which copies the whole file into that buffer:
 memcpy (oldvolfile, rsp.spec, size);

This looks like a bug: the copy overflows the buffer whenever size
exceeds 131072.
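
For illustration, here is a minimal sketch of one way to avoid the
overflow: keep the previous volfile in a heap buffer sized to the
incoming spec instead of a fixed 128 KB array. This is not the patch in
our repo and not the upstream code. The names volfile_cache,
volfile_cache_store and volfile_cache_matches are hypothetical, and the
comparison helper assumes the old volfile is kept only so it can be
compared with the next one.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the last volfile seen; not a glusterfs type. */
struct volfile_cache {
        char   *data;  /* heap buffer, NULL until the first store */
        size_t  len;   /* number of valid bytes in data */
};

/* Store a copy of spec (size bytes), growing the buffer as needed.
 * Returns 0 on success, -1 on bad arguments or allocation failure.
 * On failure the previously cached contents are left untouched. */
static int
volfile_cache_store (struct volfile_cache *cache, const char *spec,
                     size_t size)
{
        char *buf = NULL;

        if (!cache || !spec || size == SIZE_MAX)
                return -1;

        /* One extra byte so the cached copy is always NUL-terminated. */
        buf = realloc (cache->data, size + 1);
        if (!buf)
                return -1;

        memcpy (buf, spec, size);
        buf[size] = '\0';

        cache->data = buf;
        cache->len  = size;
        return 0;
}

/* Return 1 if spec is byte-identical to the cached volfile, else 0. */
static int
volfile_cache_matches (const struct volfile_cache *cache, const char *spec,
                       size_t size)
{
        if (!cache || !cache->data || !spec)
                return 0;

        return cache->len == size &&
               memcmp (cache->data, spec, size) == 0;
}
```

A simpler stopgap would be to keep the static array and skip the copy
(with an error log) when size is larger than sizeof (oldvolfile), but
that still leaves large volume files unsupported.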

FYI
Thank you very much.

2013/3/14, Vijay Bellur <vbellur at redhat.com>:
> On 03/14/2013 02:08 PM, 符永涛 wrote:
>> Dear glusterfs experts,
>> Recently we encountered a self-heal daemon crash after
>> rebalancing a volume.
>> The crash stack is below:
>> +------------------------------------------------------------------------------+
>> pending frames:
>>
>> patchset: git://git.gluster.com/glusterfs.git
>> signal received: 11
>> time of crash: 2013-03-14 16:33:50
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> fdatasync 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 3.3.0
>> /lib64/libc.so.6[0x38d0a32920]
>> /lib64/libc.so.6(memcpy+0x309)[0x38d0a88da9]
>> /usr/sbin/glusterfs(mgmt_getspec_cbk+0x398)[0x40c888]
>> /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x38d1a0f4d5]
>> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x38d1a0fcd0]
>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x38d1a0aeb8]
>> /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f1d47b8f784]
>> /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f1d47b8f867]
>> /usr/lib64/libglusterfs.so.0[0x38d1e3e4a4]
>> /usr/sbin/glusterfs(main+0x58a)[0x40731a]
>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x38d0a1ecdd]
>> /usr/sbin/glusterfs[0x404289]
>> ---------
>>
>> Does anyone know how to fix it? Currently the self-heal daemon can't be
>> started.
>
> Can you please post details of your volume configuration and glustershd
> logs from the node where the crash is seen?
>
> Thanks,
> Vijay

-- 
符永涛