[Bugs] [Bug 1598345] gluster get-state command is crashing glusterd process when geo-replication is configured

bugzilla at redhat.com bugzilla at redhat.com
Thu Jul 5 07:50:04 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1598345



--- Comment #1 from Sanju <srakonde at redhat.com> ---
Description of problem:
  If I try to import a Gluster cluster with geo-replication configured into
  RHGS WA, glusterd (on the master side of geo-replication) crashes
  immediately.
How reproducible:
  100%


Steps to Reproduce:
1. Prepare two Gluster clusters with 6 storage nodes per cluster
    (usm1* and usm2* clusters in my case)
2. Create one Distributed-Replicated volume on each cluster.
    (named volume_alpha_distrep_6x2 in my case, see gdeploy config [1])
3. Configure geo-replication between the volumes.
    I used the following gdeploy config:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
  # cat geo-replication.conf
    [hosts]
    usm1-gl1.example.com

    [geo-replication]
    action=create
    mastervol=usm1-gl1.example.com:volume_alpha_distrep_6x2
    slavevol=usm2-gl1.example.com:volume_alpha_distrep_6x2
   
    slavenodes=usm2-gl1.example.com,usm2-gl2.example.com,usm2-gl3.example.com,usm2-gl4.example.com,usm2-gl5.example.com,usm2-gl6.example.com
    force=yes

  # gdeploy -c geo-replication.conf 
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  

4. Install and configure RHGS WA (aka Tendrl) Server.
5. Start Import process for the first gluster cluster into RHGS WA.
6. Check glusterd state on the storage servers.
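
The bug title names `gluster get-state` as the command that crashes glusterd. Assuming the RHGS WA import reaches glusterd through that command (the steps above do not say so explicitly), the crash might also be reproducible without RHGS WA by running the command directly on the master node. A minimal sketch; the helper name `check_get_state` is made up for illustration:

```shell
# Sketch: run on the geo-replication master node (usm1-gl1.example.com in this report).
# Assumes the gluster CLI and systemd are present; check_get_state is a hypothetical helper.
check_get_state() {
    rc=0
    # The bug title names this command as the crash trigger.
    gluster get-state >/dev/null 2>&1 || rc=$?
    # "systemctl is-active" prints the unit state, for example "active" or "failed".
    state=$(systemctl is-active glusterd) || true
    echo "get-state exit=$rc glusterd=$state"
}

# Usage (on a storage node):
#   check_get_state
```

If glusterd is affected, the reported unit state would be expected to change from active to failed after the call.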

Actual results:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
  # systemctl status glusterd
    ● glusterd.service - GlusterFS, a clustered file-system server
       Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled;
vendor preset: disabled)
       Active: failed (Result: signal) since Wed 2018-05-16 04:26:06 EDT; 35min
ago
     Main PID: 12035 (code=killed, signal=ABRT)
       CGroup: /system.slice/glusterd.service
               ├─15243 /usr/sbin/glusterfsd -s usm1-gl1.example.com
--volfile-id volume_alpha_distrep_6x2.usm1-gl1.u...
               ├─18072 /usr/sbin/glusterfs -s localhost --volfile-id
gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l
/var/log/glu...
               ├─18280 /usr/bin/python
/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha...
               ├─18341 python
/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18342 python
/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18345 python
/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18346 python
/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18359 ssh -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p
22 -oControlMast...
               ├─18381 ssh -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p
22 -oControlMast...
               ├─18395 /usr/sbin/glusterfs --aux-gfid-mount --acl
--log-file=/var/log/glusterfs/geo-replication/volume_alpha_distrep_6x2/ssh%3A%2F%2F...
               └─18396 /usr/sbin/glusterfs --aux-gfid-mount --acl
--log-file=/var/log/glusterfs/geo-replication/volume_alpha_distrep_6x2/ssh%3A%2F%2F...

    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: setfsid 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: spinlock 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: epoll.h 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: xattr.h 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: st_atim.tv_nsec 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: package-string:
glusterfs 3.12.2
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: ---------
    May 16 04:26:06 usm1-gl1.example.com systemd[1]: glusterd.service: main
process exited, code=killed, status=6/ABRT
    May 16 04:26:06 usm1-gl1.example.com systemd[1]: Unit glusterd.service
entered failed state.
    May 16 04:26:06 usm1-gl1.example.com systemd[1]: glusterd.service failed.

  # file /core.*
    /core.12035: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from '/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO', real uid:
0, effective uid: 0, real gid: 0, effective gid: 0, execfn:
'/usr/sbin/glusterd', platform: 'x86_64'

  # gdb /usr/sbin/glusterfsd /core.12035 
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /usr/sbin/glusterfsd...Reading symbols from
/usr/lib/debug/usr/sbin/glusterfsd.debug...done.
    done.

    warning: core file may not match specified executable file.
    [New LWP 12039]
    [New LWP 12037]
    [New LWP 12038]
    [New LWP 12216]
    [New LWP 12217]
    [New LWP 12036]
    [New LWP 12040]
    [New LWP 12035]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level INFO'.
    Program terminated with signal 6, Aborted.
    #0  0x00007fc3d425c207 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
    56      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
    Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64 device-mapper-event-libs-1.02.146-4.el7.x86_64
device-mapper-libs-1.02.146-4.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64
elfutils-libs-0.170-4.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-19.el7.x86_64 libattr-2.4.46-13.el7.x86_64
libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-12.el7.x86_64
libsepol-2.5-8.1.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64
lvm2-libs-2.02.177-4.el7.x86_64 pcre-8.32-17.el7.x86_64
systemd-libs-219-57.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64
xz-libs-5.2.2-1.el7.x86_64
    (gdb) t a a bt

    Thread 8 (Thread 0x7fc3d60e3780 (LWP 12035)):
    #0  0x00007fc3d4a5cf47 in pthread_join (threadid=140478813824768,
thread_return=thread_return@entry=0x0) at pthread_join.c:92
    #1  0x00007fc3d5c5bb38 in event_dispatch_epoll (event_pool=0x563d5085fa30)
at event-epoll.c:746
    #2  0x0000563d5023d2a7 in main (argc=5, argv=<optimized out>) at
glusterfsd.c:2550

    Thread 7 (Thread 0x7fc3cb1f8700 (LWP 12040)):
    #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
    #1  0x00007fc3d5c38e88 in syncenv_task (proc=proc@entry=0x563d50867e50) at
syncop.c:603
    #2  0x00007fc3d5c39d50 in syncenv_processor (thdata=0x563d50867e50) at
syncop.c:695
    #3  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cb1f8700) at
pthread_create.c:308
    #4  0x00007fc3d4324b3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

    Thread 6 (Thread 0x7fc3cd1fc700 (LWP 12036)):
    #0  0x00007fc3d4a62eed in nanosleep () at
../sysdeps/unix/syscall-template.S:81
    #1  0x00007fc3d5c0b9f6 in gf_timer_proc (data=0x563d50867270) at
timer.c:174
    #2  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cd1fc700) at
pthread_create.c:308
    #3  0x00007fc3d4324b3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

    Thread 5 (Thread 0x7fc3c5cbe700 (LWP 12217)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x00007fc3d5c369b3 in __synclock_lock (lock=lock@entry=0x7fc3d5f7b838)
at syncop.c:935
    #2  0x00007fc3d5c3a066 in synclock_lock (lock=lock@entry=0x7fc3d5f7b838) at
syncop.c:961
    #3  0x00007fc3ca686a4b in glusterd_big_locked_notify (rpc=0x7fc3c0004910,
mydata=0x7fc3c0003810, event=RPC_CLNT_DISCONNECT, data=0x0,
notify_fn=0x7fc3ca690830 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:69
    #4  0x00007fc3d59c542b in rpc_clnt_handle_disconnect (conn=0x7fc3c0004940,
clnt=0x7fc3c0004910) at rpc-clnt.c:876
    #5  rpc_clnt_notify (trans=<optimized out>, mydata=0x7fc3c0004940,
event=<optimized out>, data=0x7fc3c0004b40) at rpc-clnt.c:939
    #6  0x00007fc3d59c1393 in rpc_transport_notify
(this=this@entry=0x7fc3c0004b40, event=event@entry=RPC_TRANSPORT_DISCONNECT,
data=data@entry=0x7fc3c0004b40) at rpc-transport.c:538
    #7  0x00007fc3c78d2bdf in socket_event_poll_err (idx=<optimized out>,
gen=<optimized out>, this=0x7fc3c0004b40) at socket.c:1206
    #8  socket_event_handler (fd=8, idx=<optimized out>, gen=<optimized out>,
data=0x7fc3c0004b40, poll_in=<optimized out>, poll_out=0, poll_err=0) at
socket.c:2476
    #9  0x00007fc3d5c5b504 in event_dispatch_epoll_handler
(event=0x7fc3c5cbde80, event_pool=0x563d5085fa30) at event-epoll.c:583
    #10 event_dispatch_epoll_worker (data=0x563d508bb3f0) at event-epoll.c:659
    #11 0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3c5cbe700) at
pthread_create.c:308
    #12 0x00007fc3d4324b3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

    Thread 4 (Thread 0x7fc3c64bf700 (LWP 12216)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x00007fc3ca74604b in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:529
    #2  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3c64bf700) at
pthread_create.c:308
    #3  0x00007fc3d4324b3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

    Thread 3 (Thread 0x7fc3cc1fa700 (LWP 12038)):
    #0  0x00007fc3d42eb4fd in nanosleep () at
../sysdeps/unix/syscall-template.S:81
    #1  0x00007fc3d42eb394 in __sleep (seconds=0, seconds@entry=30) at
../sysdeps/unix/sysv/linux/sleep.c:137
    #2  0x00007fc3d5c2622d in pool_sweeper (arg=<optimized out>) at
mem-pool.c:481
    #3  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cc1fa700) at
pthread_create.c:308
    #4  0x00007fc3d4324b3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

    Thread 2 (Thread 0x7fc3cc9fb700 (LWP 12037)):
    #0  0x00007fc3d4a63411 in do_sigwait (sig=0x7fc3cc9fae1c, set=<optimized
out>) at ../sysdeps/unix/sysv/linux/sigwait.c:61
    #1  __sigwait (set=set@entry=0x7fc3cc9fae20, sig=sig@entry=0x7fc3cc9fae1c)
at ../sysdeps/unix/sysv/linux/sigwait.c:99
    #2  0x0000563d5024058b in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:2137
    #3  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cc9fb700) at
pthread_create.c:308
    #4  0x00007fc3d4324b3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

    Thread 1 (Thread 0x7fc3cb9f9700 (LWP 12039)):
    #0  0x00007fc3d425c207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #1  0x00007fc3d425d8f8 in __GI_abort () at abort.c:90
    #2  0x00007fc3d429ecc7 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fc3d43b0cf8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
    #3  0x00007fc3d42a7429 in malloc_printerr (ar_ptr=0x7fc3d45ec760 <main_arena>, ptr=<optimized out>, str=0x7fc3d43b0e00 "double free or corruption (out)", action=3) at malloc.c:5025
    #4  _int_free (av=0x7fc3d45ec760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3847
    #5  0x00007fc3d5bf47dd in data_destroy (data=<optimized out>) at dict.c:227
    #6  0x00007fc3d5bf5220 in dict_destroy (this=<optimized out>) at dict.c:589
    #7  0x00007fc3d5bf54fc in dict_unref (this=<optimized out>) at dict.c:648
    #8  0x00007fc3ca6849a6 in glusterd_print_gsync_status_by_vol (volinfo=<optimized out>, fp=0x7fc3c0013300) at glusterd-handler.c:5188
    #9  glusterd_get_state (dict=0x7fc3c000ab00, req=0x7fc3bc0018e0) at glusterd-handler.c:5883
    #10 __glusterd_handle_get_state (req=req@entry=0x7fc3bc0018e0) at glusterd-handler.c:5997
    #11 0x00007fc3ca686abe in glusterd_big_locked_handler (req=0x7fc3bc0018e0, actor_fn=0x7fc3ca683180 <__glusterd_handle_get_state>) at glusterd-handler.c:82
    #12 0x00007fc3d5c368b0 in synctask_wrap () at syncop.c:375
    #13 0x00007fc3d426dfc0 in ?? () from /lib64/libc.so.6
    #14 0x0000000000000000 in ?? ()
    (gdb) 
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
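
For attaching a backtrace like the one in this report to a bug, gdb can be run in batch mode so that the output is not interrupted by interactive prompts. A minimal sketch; the helper name `collect_backtrace` and the output file name are made up for illustration, and the binary and core paths are the ones used in this report:

```shell
# Sketch: print backtraces of all threads from a core file, non-interactively.
# collect_backtrace is a hypothetical helper; gdb must be installed.
collect_backtrace() {
    binary="$1"
    core="$2"
    # -batch exits after running the -ex commands.
    # "thread apply all bt" is the long form of the "t a a bt" command used in this report.
    gdb -batch -ex "set pagination off" -ex "thread apply all bt" "$binary" "$core"
}

# Usage with the paths from this report:
#   collect_backtrace /usr/sbin/glusterfsd /core.12035 > glusterd-backtrace.txt
```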

Expected results:
  glusterd shouldn't crash
