[Gluster-users] Segfault in gluster volume heal

Gerald Brandt gbr at majentis.com
Sat Jul 14 16:13:22 UTC 2012


Hi,

I just noticed this too while fixing a split-brain during testing. For me, any of the info arguments (healed, heal-failed, split-brain) crashes. Some notes inline below.

Gerald


----- Original Message -----
> From: "Brian Candler" <B.Candler at pobox.com>
> To: gluster-users at gluster.org
> Sent: Friday, July 13, 2012 2:01:40 PM
> Subject: [Gluster-users] Segfault in gluster volume heal
> 
> Just thought I'd report it here: this is 3.3.0 under Ubuntu 12.04.
> 
> root@dev-storage1:~# gluster volume heal safe
> Heal operation on volume safe has been successful
> root@dev-storage1:~# gluster volume heal safe full
> Heal operation on volume safe has been successful
> root@dev-storage1:~# gluster volume heal safe info healed
> Heal operation on volume safe has been successful
> 
> Brick dev-storage1:/disk/storage1/safe
> Number of entries: 0
> 
> Brick dev-storage2:/disk/storage2/safe
> Number of entries: 1
> Segmentation fault (core dumped)
> root@dev-storage1:~#
> 
> Oops. Under gdb:
> 
> root@dev-storage1:~# gdb gluster
> GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show
> copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://bugs.launchpad.net/gdb-linaro/>...
> Reading symbols from /usr/sbin/gluster...(no debugging symbols
> found)...done.
> (gdb) run volume heal safe info healed
> Starting program: /usr/sbin/gluster volume heal safe info healed
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library
> "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7ffff55ee700 (LWP 7009)]
> [New Thread 0x7ffff4ded700 (LWP 7011)]
> Heal operation on volume safe has been successful
> 
> Brick dev-storage1:/disk/storage1/safe
> Number of entries: 0
> 
> Brick dev-storage2:/disk/storage2/safe
> Number of entries: 8
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff6fcf0d0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) bt
> #0  0x00007ffff6fcf0d0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007ffff6fd0f86 in strftime_l () from
> /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00000000004206a6 in cmd_heal_volume_brick_out ()
> #3  0x0000000000420a6f in gf_cli3_1_heal_volume_cbk ()
> #4  0x00007ffff7502b85 in rpc_clnt_handle_reply () from
> /usr/lib/libgfrpc.so.0
> #5  0x00007ffff7503585 in rpc_clnt_notify () from
> /usr/lib/libgfrpc.so.0
> #6  0x00007ffff74ff577 in rpc_transport_notify () from
> /usr/lib/libgfrpc.so.0
> #7  0x00007ffff58076a4 in socket_event_poll_in ()
>    from /usr/lib/glusterfs/3.3.0/rpc-transport/socket.so
> #8  0x00007ffff58079f7 in socket_event_handler ()
>    from /usr/lib/glusterfs/3.3.0/rpc-transport/socket.so
> #9  0x00007ffff7b9dd67 in ?? () from /usr/lib/libglusterfs.so.0
> #10 0x00000000004076d6 in main ()
> (gdb) info threads
>   Id   Target Id         Frame
>   3    Thread 0x7ffff4ded700 (LWP 7011) "gluster" 0x00007ffff72e30fe
>   in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>   /lib/x86_64-linux-gnu/libpthread.so.0
>   2    Thread 0x7ffff55ee700 (LWP 7009) "gluster" 0x00007ffff72e652d
>   in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
> * 1    Thread 0x7ffff7fed700 (LWP 7006) "gluster" 0x00007ffff6fcf0d0
> in ?? ()
>    from /lib/x86_64-linux-gnu/libc.so.6
> (gdb)
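
Note that in both runs the crash comes only after a brick reporting a
nonzero entry count; the zero-entry bricks print fine.  Frames #1 and
#2 put the fault inside strftime_l() as called from
cmd_heal_volume_brick_out(), i.e. while formatting the per-entry heal
timestamps.  I haven't checked the 3.3.0 source, but a classic way to
segfault inside strftime() is handing it the result of localtime()
without a NULL check.  A minimal sketch of that pattern (an
illustrative guess, not the actual CLI code):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char timestr[256] = "";
        /* Stand-in for a garbage timestamp, e.g. a value missing
         * from the reply; glibc's localtime() returns NULL for
         * times it cannot represent in a struct tm. */
        time_t ts = (time_t) 1000000000000000000LL;
        struct tm *tm = localtime(&ts);

        /* Without this check, strftime() dereferences NULL and the
         * process dies with SIGSEGV, much like the trace above. */
        if (tm == NULL) {
            fprintf(stderr, "localtime() failed\n");
            return 1;
        }

        strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S", tm);
        printf("%s\n", timestr);
        return 0;
    }

Whether that's the actual bug here I can't say without symbols, but
it would explain why only bricks with entries to print trigger it.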
> 
> Anything else useful I can do to pin this down?
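
A backtrace with symbols for cmd_heal_volume_brick_out() would pin it
down.  If you can install matching debug symbols (a -dbg package or
the Ubuntu ddebs, if either exists for your 3.3.0 build; otherwise a
rebuild with CFLAGS="-g -O0"), then something like:

    (gdb) run volume heal safe info healed
    ...crash...
    (gdb) bt full
    (gdb) frame 2
    (gdb) info locals

should show exactly what is being passed to strftime().  Attaching a
core file (gdb's generate-core-file command will write one) to a bug
report against glusterfs would also help.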
> 
> The other 'heal' suboptions don't segfault:
> 
> root@dev-storage1:~# gluster volume heal safe info heal-failed
> Heal operation on volume safe has been successful
> 
> Brick dev-storage1:/disk/storage1/safe
> Number of entries: 0
> 
> Brick dev-storage2:/disk/storage2/safe
> Number of entries: 0
> root@dev-storage1:~# gluster volume heal safe info split-brain
> Heal operation on volume safe has been successful
> 
> Brick dev-storage1:/disk/storage1/safe
> Number of entries: 0
> 
> Brick dev-storage2:/disk/storage2/safe
> Number of entries: 0
> root@dev-storage1:~#
> 
> And for what it's worth, the replicas *have* synchronised properly.
> It's just the 'gluster volume heal ... info' output which crashes.
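
Same here.  If you want a sanity check that bypasses the CLI
entirely, you can read the AFR changelog xattrs straight off the
bricks (the file name below is just a placeholder):

    getfattr -d -m trusted.afr -e hex /disk/storage1/safe/somefile

If the trusted.afr.* values are all zeroes for a file on both bricks,
AFR considers it fully synchronised with no pending changes.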
> 
> Regards,
> 
> Brian.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> 


