[Gluster-devel] [Gluster-users] glusterfs crash caused by liblvm2app.so with BD xlator

Jaden Liang jaden1q84 at gmail.com
Sat Nov 8 10:20:49 UTC 2014


Hi all,

We are testing the BD xlator to verify running KVM on Gluster. After some
simple tests, we hit a core dump of glusterfs caused by liblvm2app.so.
We hope someone here can offer some advice on this issue.

We have been debugging this for some time and found that the core dump is
triggered by a thread-safety issue. In the core file, the top frame is
_update_mda(), called with an invalid pointer that comes from
lvmcache_foreach_mda(). As we know, glusterfsd uses several io threads to
simulate async I/O, so more than one thread can run bd_statfs_cbk() at the
same time. In liblvm2app.so, _text_read() looks up an info item in a hash
table named _pvid_hash and allocates a new one if none exists. However,
no lock protects these operations! With multiple threads, liblvm2app.so
can crash through a sequence like this:

Thread A and thread B enter bd_statfs_cbk() at the same time:
1. A allocates a new info node, puts it into _pvid_hash, and calls
lvmcache_foreach_mda().
2. B looks up the info node created by A in _pvid_hash and passes it to
lvmcache_del_mdas(), which frees the info node.
3. A keeps using the info node that B has already freed.
4. Memory corruption, then a crash...
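
The sequence above can be modelled with a small standalone sketch. It does
not use liblvm2app at all: info_node, cache_slot, scan_once() and
run_serialized_scans() are hypothetical names invented for illustration, and
the single mutex only shows the general idea of serializing the
free/allocate/use steps. It is not a proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached info node (not liblvm2app's struct). */
struct info_node {
    int uses;
};

/* Hypothetical stand-in for the single _pvid_hash entry being raced on. */
static struct info_node *cache_slot;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static long completed_scans;

/* Free the old node, allocate a new one, then use it. Without a lock,
 * two threads interleaving here reproduce the A/B sequence above. */
static void scan_once_unlocked(void)
{
    free(cache_slot);                            /* "B frees the node"    */
    cache_slot = calloc(1, sizeof(*cache_slot)); /* "A allocates a node"  */
    assert(cache_slot != NULL);
    cache_slot->uses++;                          /* "A keeps using it"    */
    completed_scans++;
}

/* Serialized variant: the whole sequence runs under one mutex. */
static void scan_once(void)
{
    pthread_mutex_lock(&cache_lock);
    scan_once_unlocked();
    pthread_mutex_unlock(&cache_lock);
}

static void *worker(void *arg)
{
    int iterations = *(int *)arg;

    for (int i = 0; i < iterations; i++)
        scan_once();
    return NULL;
}

/* Run nthreads workers (1..16), each doing `iterations` scans.
 * Returns the total number of completed scans, or -1 on error. */
long run_serialized_scans(int nthreads, int iterations)
{
    pthread_t tids[16];
    int created = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;

    completed_scans = 0;
    while (created < nthreads &&
           pthread_create(&tids[created], NULL, worker, &iterations) == 0)
        created++;

    /* Join only the threads that were actually started. */
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    free(cache_slot);
    cache_slot = NULL;
    return created == nthreads ? completed_scans : -1;
}
```

In the BD xlator, the analogous step would be to hold one lock around the
lvm_vg_open() call in bd_statfs_cbk() and everything up to the matching
lvm_vg_close(). Whether that belongs in the xlator or in liblvm2app.so itself
is an open question.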

Steps to reproduce:
1. Create a BD volume with the BD xlator following the standard method, and
mount it on a glusterfs client.

2. Write a simple test script, crash_bd.sh:
#!/bin/bash
# Call df in bursts of 10 so that statfs requests reach the BD xlator.
while :; do
    i=0
    while [ "$i" -lt 10 ]; do
        df > /dev/null
        i=$((i + 1))
    done
    sleep 10
done

3. Start several instances of crash_bd.sh at the same time:
# ./crash_bd.sh &
# ./crash_bd.sh &
# ./crash_bd.sh &
# ./crash_bd.sh &

4. Wait a few minutes; df will then fail with an error like this:
df: `/mnt/bd_vol': Transport endpoint is not connected
At this point glusterfs has crashed.

Note: if we set the number of io threads to one, the BD xlator appears to
run fine!

Any advice or information on this issue is appreciated!

Core details:
Core was generated by `/usr/sbin/glusterfsd -s host-005056b50a23 --volfile-id bd.host-0050'.
Program terminated with signal 11, Segmentation fault.
#0  _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:328
353    format_text/text_label.c: No such file or directory.
(gdb) bt
#0  _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:328
#1  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>,
    baton=baton@entry=0x7f83b1d0a6d0) at cache/lvmcache.c:1880
#2  0x00007f83b59c0e5f in _text_read (l=<optimized out>, dev=0x11ee8d8, buf=<optimized out>, label=0x7f83b1d0a958)
    at format_text/text_label.c:459
#3  0x00007f83b59c27e7 in label_read (dev=0x11ee8d8, result=result@entry=0x7f83b1d0a958, scan_sector=scan_sector@entry=0)
    at label/label.c:284
#4  0x00007f83b599dd2b in lvmcache_fmt_from_vgname (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg",
    vgid=vgid@entry=0x0, revalidate_labels=revalidate_labels@entry=1) at cache/lvmcache.c:506
#5  0x00007f83b59e1ad8 in _vg_read (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0,
    warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b1d0ab48, precommitted=precommitted@entry=0)
    at metadata/metadata.c:3143
#6  0x00007f83b59e2ecc in vg_read_internal (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg",
    vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b1d0ab48) at metadata/metadata.c:3549
#7  0x00007f83b59e30cc in _vg_lock_and_read (misc_flags=0, status_flags=0, lock_flags=33, vgid=0x0,
    vg_name=0x11d2c50 "bd-vg", cmd=0x11d3c40) at metadata/metadata.c:4235
#8  vg_read (cmd=cmd@entry=0x11d3c40, vg_name=vg_name@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, flags=0)
    at metadata/metadata.c:4343
#9  0x00007f83b599753f in _lvm_vg_open (mode=0x7f83b5c8971e "r", vgname=0x11d2c50 "bd-vg", libh=0x11d3c40,
    flags=<optimized out>) at lvm_vg.c:221
#10 lvm_vg_open (libh=0x11d3c40, vgname=0x11d2c50 "bd-vg", mode=mode@entry=0x7f83b5c8971e "r", flags=flags@entry=0)
    at lvm_vg.c:238
#11 0x00007f83b5c7ee36 in bd_statfs_cbk (frame=0x7f83b95416e4, cookie=<optimized out>, this=0x119eb90, op_ret=0, op_errno=0,
    buff=0x7f83b1d0ac70, xdata=0x0) at bd.c:353
......

(gdb) f 1
#1  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>,
    baton=baton@entry=0x7f83b1d0a6d0) at cache/lvmcache.c:1899
1899    cache/lvmcache.c: No such file or directory.
(gdb) p *info
$1 = {list = {n = 0x11fa650, p = 0x11fa650}, mdas = {n = 0x11fafd0, p = 0x11fafd0}, das = {n = 0x11fb000, p = 0x11fb000},
  bas = {n = 0x11faf30, p = 0x11faf30}, vginfo = 0x11fa640, label = 0x11faed0, fmt = 0x11f8480, dev = 0x11ee8d8,
  device_size = 531870253056, status = 1}
(gdb) info threads
  Id   Target Id         Frame
  11   Thread 0x7f83b3786700 (LWP 24272) 0x00007f7bc917cdec in _dev_close (dev=0x16421d0, immediate=immediate@entry=0) at device/dev-io.c:624
  10   Thread 0x7f83bb968700 (LWP 23306) 0x00007f83ba5e40d3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
  9    Thread 0x7f83b250c700 (LWP 24276) 0x00007f83bac822d4 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  8    Thread 0x7f83b2d0d700 (LWP 24275) 0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  7    Thread 0x7f83b350e700 (LWP 24274) 0x00007f83ba5b4bdd in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
  6    Thread 0x7f83b3887700 (LWP 24271) 0x00007f83bac822d4 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  5    Thread 0x7f83b6ed7700 (LWP 23310) 0x00007f83bac858ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
  4    Thread 0x7f83b7d57700 (LWP 23309) 0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  3    Thread 0x7f83b8558700 (LWP 23308) 0x00007f83bac8264b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
  2    Thread 0x7f83b8d59700 (LWP 23307) 0x00007f83bac85d77 in do_sigwait () from /lib/x86_64-linux-gnu/libpthread.so.0
* 1    Thread 0x7f83b1d0b700 (LWP 26000) _update_mda (mda=0x11fb000, baton=0x7f83b1d0a6d0) at format_text/text_label.c:353
(gdb) thread 11
[Switching to thread 11 (Thread 0x7f83b3786700 (LWP 24272))]
#0  0x00007f83bac8578d in fsync () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f7bc917cdec in _dev_close (dev=0x16421d0, immediate=immediate@entry=0) at device/dev-io.c:624
#1  0x00007f7bc917d257 in dev_close (dev=<optimized out>) at device/dev-io.c:631
#2  0x00007f83b59c1a88 in _update_mda (mda=0x11fafd0, baton=0x7f83b37856d0) at format_text/text_label.c:361
#3  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>,
    baton=baton@entry=0x7f83b37856d0) at cache/lvmcache.c:1899
#4  0x00007f83b59c0e5f in _text_read (l=<optimized out>, dev=0x11ee8d8, buf=<optimized out>, label=0x7f83b3785958)
    at format_text/text_label.c:459
#5  0x00007f83b59c27e7 in label_read (dev=0x11ee8d8, result=result@entry=0x7f83b3785958, scan_sector=scan_sector@entry=0)
    at label/label.c:284
#6  0x00007f83b599dd2b in lvmcache_fmt_from_vgname (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg",
    vgid=vgid@entry=0x0, revalidate_labels=revalidate_labels@entry=1) at cache/lvmcache.c:506
#7  0x00007f83b59e1ad8 in _vg_read (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0,
    warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b3785b48, precommitted=precommitted@entry=0)
    at metadata/metadata.c:3143
#8  0x00007f83b59e2ecc in vg_read_internal (cmd=cmd@entry=0x11d3c40, vgname=vgname@entry=0x11d2c50 "bd-vg",
    vgid=vgid@entry=0x0, warnings=warnings@entry=1, consistent=consistent@entry=0x7f83b3785b48) at metadata/metadata.c:3549
#9  0x00007f83b59e30cc in _vg_lock_and_read (misc_flags=0, status_flags=0, lock_flags=33, vgid=0x0,
    vg_name=0x11d2c50 "bd-vg", cmd=0x11d3c40) at metadata/metadata.c:4235
#10 vg_read (cmd=cmd@entry=0x11d3c40, vg_name=vg_name@entry=0x11d2c50 "bd-vg", vgid=vgid@entry=0x0, flags=0)
    at metadata/metadata.c:4343
#11 0x00007f83b599753f in _lvm_vg_open (mode=0x7f83b5c8971e "r", vgname=0x11d2c50 "bd-vg", libh=0x11d3c40,
    flags=<optimized out>) at lvm_vg.c:221
#12 lvm_vg_open (libh=0x11d3c40, vgname=0x11d2c50 "bd-vg", mode=mode@entry=0x7f83b5c8971e "r", flags=flags@entry=0)
    at lvm_vg.c:238
#13 0x00007f83b5c7ee36 in bd_statfs_cbk (frame=0x7f83b95412dc, cookie=<optimized out>, this=0x119eb90, op_ret=0, op_errno=0,
    buff=0x7f83b3785c70, xdata=0x0) at bd.c:353
......
(gdb) f 3
#3  0x00007f83b59a0e09 in lvmcache_foreach_mda (info=info@entry=0x11faf00, fun=fun@entry=0x7f83b59c1a60 <_update_mda>,
    baton=baton@entry=0x7f83b37856d0) at cache/lvmcache.c:1899
1899    in cache/lvmcache.c
(gdb) p *info
$2 = {list = {n = 0x11fa650, p = 0x11fa650}, mdas = {n = 0x11fafd0, p = 0x11fafd0}, das = {n = 0x11fb000, p = 0x11fb000},
  bas = {n = 0x11faf30, p = 0x11faf30}, vginfo = 0x11fa640, label = 0x11faed0, fmt = 0x11f8480, dev = 0x11ee8d8,
  device_size = 531870253056, status = 1}