[Gluster-users] Segfault in gluster volume heal

Brian Candler B.Candler at pobox.com
Fri Jul 13 19:01:40 UTC 2012


Just thought I'd report it here: this is GlusterFS 3.3.0 under Ubuntu 12.04.

root at dev-storage1:~# gluster volume heal safe 
Heal operation on volume safe has been successful
root at dev-storage1:~# gluster volume heal safe full
Heal operation on volume safe has been successful
root at dev-storage1:~# gluster volume heal safe info healed
Heal operation on volume safe has been successful

Brick dev-storage1:/disk/storage1/safe
Number of entries: 0

Brick dev-storage2:/disk/storage2/safe
Number of entries: 1
Segmentation fault (core dumped)
root at dev-storage1:~# 

Oops. Under gdb:

root at dev-storage1:~# gdb gluster
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /usr/sbin/gluster...(no debugging symbols found)...done.
(gdb) run volume heal safe info healed
Starting program: /usr/sbin/gluster volume heal safe info healed
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff55ee700 (LWP 7009)]
[New Thread 0x7ffff4ded700 (LWP 7011)]
Heal operation on volume safe has been successful

Brick dev-storage1:/disk/storage1/safe
Number of entries: 0

Brick dev-storage2:/disk/storage2/safe
Number of entries: 8

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6fcf0d0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff6fcf0d0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff6fd0f86 in strftime_l () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00000000004206a6 in cmd_heal_volume_brick_out ()
#3  0x0000000000420a6f in gf_cli3_1_heal_volume_cbk ()
#4  0x00007ffff7502b85 in rpc_clnt_handle_reply () from /usr/lib/libgfrpc.so.0
#5  0x00007ffff7503585 in rpc_clnt_notify () from /usr/lib/libgfrpc.so.0
#6  0x00007ffff74ff577 in rpc_transport_notify () from /usr/lib/libgfrpc.so.0
#7  0x00007ffff58076a4 in socket_event_poll_in ()
   from /usr/lib/glusterfs/3.3.0/rpc-transport/socket.so
#8  0x00007ffff58079f7 in socket_event_handler ()
   from /usr/lib/glusterfs/3.3.0/rpc-transport/socket.so
#9  0x00007ffff7b9dd67 in ?? () from /usr/lib/libglusterfs.so.0
#10 0x00000000004076d6 in main ()
(gdb) info threads
  Id   Target Id         Frame 
  3    Thread 0x7ffff4ded700 (LWP 7011) "gluster" 0x00007ffff72e30fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  2    Thread 0x7ffff55ee700 (LWP 7009) "gluster" 0x00007ffff72e652d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
* 1    Thread 0x7ffff7fed700 (LWP 7006) "gluster" 0x00007ffff6fcf0d0 in ?? ()
   from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 

Is there anything else useful I can do to pin this down? (The stock binary has no debugging symbols, so the backtrace above is as detailed as I can get.)

The other 'heal' suboptions don't segfault:

root at dev-storage1:~# gluster volume heal safe info heal-failed
Heal operation on volume safe has been successful

Brick dev-storage1:/disk/storage1/safe
Number of entries: 0

Brick dev-storage2:/disk/storage2/safe
Number of entries: 0
root at dev-storage1:~# gluster volume heal safe info split-brain
Heal operation on volume safe has been successful

Brick dev-storage1:/disk/storage1/safe
Number of entries: 0

Brick dev-storage2:/disk/storage2/safe
Number of entries: 0
root at dev-storage1:~# 

And for what it's worth, the replicas *have* synchronised properly. It's
only the 'gluster volume heal ... info healed' output that crashes.

Regards,

Brian.


