[Bugs] [Bug 1736848] New: Execute the "gluster peer probe invalid_hostname" thread deadlock or the glusterd process crashes

bugzilla at redhat.com bugzilla at redhat.com
Fri Aug 2 07:56:17 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1736848

            Bug ID: 1736848
           Summary: Execute the "gluster peer probe invalid_hostname"
                    thread deadlock or the glusterd process crashes
           Product: GlusterFS
           Version: 6
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: glusterd
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: xlfy555 at 163.com
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Description of problem:
After glusterd has started, running the command "gluster peer probe
invalid_hostname" produces different results on different machines. On some
machines glusterd crashes and produces a core file; on others the glusterd
process keeps running but accumulates many additional child threads.

Version-Release number of selected component (if applicable):
release-6

How reproducible:


Steps to Reproduce:
Case 1
1. glusterd
2. gluster peer probe invalid_hostname

Case 2
1. glusterd
2. gluster peer probe invalid_hostname
3. gluster peer probe invalid_hostname
4. gluster peer probe invalid_hostname (repeat a few more times)
5. ps -aux | grep glusterd
6. gdb attach <glusterd-pid>
7. info thr (many child threads are blocked in "__lll_lock_wait()")
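
Steps 5 to 7 of Case 2 can also be scripted. The following is a minimal
sketch and not part of the original reproduction: the function name
count_lock_waiters is hypothetical, and the commented usage line assumes that
gdb and pidof are installed and that exactly one glusterd process is running.
It counts occurrences of the symbol rather than lines, so it also works on
output that a mail client has wrapped.

```shell
#!/bin/sh
# Hypothetical helper: read "info threads" output on stdin and print how many
# times __lll_lock_wait appears, i.e. how many threads are blocked on a lock.
count_lock_waiters() {
    grep -o '__lll_lock_wait' | wc -l | tr -d '[:space:]'
}

# Example use against a running glusterd (assumption: a single glusterd PID):
#   gdb -p "$(pidof glusterd)" -batch -ex 'info threads' 2>/dev/null | count_lock_waiters
```

Running the count after each additional "gluster peer probe invalid_hostname"
should show whether the number of blocked threads grows with every failed
probe.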

Actual results:
Case 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `glusterd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fef4bd208ff in rpc_clnt_handle_disconnect (conn=0x7fef34007890,
clnt=0x7fef34007860) at rpc-clnt.c:832
832             if (!conn->rpc_clnt->disabled && (conn->reconnect == NULL)) {
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.166-2.el7.x86_64
elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64
libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64
libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64
libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64
libxml2-2.9.1-6.el7_2.3.x86_64 openssl-libs-1.0.1e-60.el7.x86_64
pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7.x86_64
userspace-rcu-0.7.16-1.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007fef4bd208ff in rpc_clnt_handle_disconnect (conn=0x7fef34007890,
clnt=0x7fef34007860) at rpc-clnt.c:832
#1  rpc_clnt_notify (trans=0x7fef34007be0, mydata=0x7fef34007890,
event=<optimized out>, data=<optimized out>) at rpc-clnt.c:878
#2  0x00007fef4bd1d4e3 in rpc_transport_notify (this=<optimized out>,
event=event@entry=RPC_TRANSPORT_DISCONNECT, data=<optimized out>) at
rpc-transport.c:542
#3  0x00007fef3f3634d7 in socket_connect_error_cbk (opaque=0x7fef34007190) at
socket.c:3239
#4  0x00007fef4adb6dc5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fef4a6fb73d in clone () from /usr/lib64/libc.so.6
(gdb) p conn->rpc_clnt
$1 = (struct rpc_clnt *) 0x14860
(gdb) p conn->rpc_clnt->disabled
Cannot access memory at address 0x149a0
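
The two values printed above can be cross-checked with simple arithmetic.
conn->rpc_clnt holds 0x14860, which does not resemble the other pointers in
the backtrace (all of the form 0x7fef...), and gdb fails at 0x149a0 when it
tries to read the disabled member through it. A small sketch of the
subtraction follows, assuming the failing address is exactly the address of
conn->rpc_clnt->disabled; the variable name member_offset is only
illustrative.

```shell
#!/bin/sh
# Both constants are copied from the gdb session above.
# Failing address minus the value stored in conn->rpc_clnt.
member_offset=$((0x149a0 - 0x14860))
printf 'distance from rpc_clnt pointer to failing address: %d bytes (0x%x)\n' \
    "$member_offset" "$member_offset"
```

If the assumption holds, the result is the offset of disabled within struct
rpc_clnt in this build, which would mean the crash comes from dereferencing an
invalid rpc_clnt pointer rather than a NULL one.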



Case 2
(gdb) info thr
  Id   Target Id         Frame 
  16   Thread 0x7ff384f45700 (LWP 18259) "glfs_timer" 0x00007ff38c728bdd in
nanosleep () from /usr/lib64/libpthread.so.0
  15   Thread 0x7ff384744700 (LWP 18260) "glfs_sigwait" 0x00007ff38c729101 in
sigwait () from /usr/lib64/libpthread.so.0
  14   Thread 0x7ff383f43700 (LWP 18261) "glfs_memsweep" 0x00007ff38c02d66d in
nanosleep () from /usr/lib64/libc.so.6
  13   Thread 0x7ff383742700 (LWP 18262) "glfs_sproc0" 0x00007ff38c725a82 in
pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  12   Thread 0x7ff382f41700 (LWP 18263) "glfs_sproc1" 0x00007ff38c725a82 in
pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  11   Thread 0x7ff382740700 (LWP 18264) "glusterd" 0x00007ff38c05dba3 in
select () from /usr/lib64/libc.so.6
  10   Thread 0x7ff37f2c1700 (LWP 18290) "glfs_gdhooks" 0x00007ff38c7256d5 in
pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
  9    Thread 0x7ff37eac0700 (LWP 18291) "glfs_epoll000" 0x00007ff38c066d13 in
epoll_wait () from /usr/lib64/libc.so.6
  8    Thread 0x7ff37d216700 (LWP 18306) "glfs_scleanup" 0x00007ff38c7281bd in
__lll_lock_wait () from /usr/lib64/libpthread.so.0
  7    Thread 0x7ff37ca15700 (LWP 18307) "glfs_scleanup" 0x00007ff38c060bf9 in
syscall () from /usr/lib64/libc.so.6
  6    Thread 0x7ff367fff700 (LWP 18315) "glfs_scleanup" 0x00007ff38c7281bd in
__lll_lock_wait () from /usr/lib64/libpthread.so.0
  5    Thread 0x7ff3677fe700 (LWP 18323) "glfs_scleanup" 0x00007ff38c7281bd in
__lll_lock_wait () from /usr/lib64/libpthread.so.0
  4    Thread 0x7ff366ffd700 (LWP 18331) "glfs_scleanup" 0x00007ff38c7281bd in
__lll_lock_wait () from /usr/lib64/libpthread.so.0
  3    Thread 0x7ff3667fc700 (LWP 18339) "glfs_scleanup" 0x00007ff38c7281bd in
__lll_lock_wait () from /usr/lib64/libpthread.so.0
  2    Thread 0x7ff365ffb700 (LWP 18347) "glfs_scleanup" 0x00007ff38c7281bd in
__lll_lock_wait () from /usr/lib64/libpthread.so.0
* 1    Thread 0x7ff38de22480 (LWP 18258) "glusterd" 0x00007ff38c722ef7 in
pthread_join () from /usr/lib64/libpthread.so.0

Expected results:


Additional info:
