[Bugs] [Bug 1217589] New: glusterd crashed while scheduler was creating snapshots when bit rot was enabled on the volumes

bugzilla at redhat.com bugzilla at redhat.com
Thu Apr 30 18:16:57 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1217589

            Bug ID: 1217589
           Summary: glusterd crashed while scheduler was creating
                    snapshots when bit rot was enabled on the volumes
           Product: GlusterFS
           Version: mainline
         Component: bitrot
          Assignee: bugs at gluster.org
          Reporter: senaik at redhat.com
                CC: bugs at gluster.org
      Docs Contact: bugs at gluster.org



Description of problem:
======================
Enabled bit rot on the volumes and scheduled snapshots to be created every 5
minutes on each volume. The first snapshot creation failed because glusterd crashed.


Version-Release number of selected component (if applicable):
=============================================================
gluster --version
glusterfs 3.7.0alpha0 built on Apr 28 2015 01:55:23


How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create 3 volumes: an 8x2 volume with a replica 2 hot tier attached, a
12-brick disperse volume with redundancy 4, and a 3-brick distribute volume

2. Enable USS, quota and bit rot on all volumes

3. Mount all volumes over FUSE and NFS

4. Initialise the scheduler on all nodes and enable it

5. Add 3 jobs that create snapshots of the 3 volumes every 5 minutes:
[root at rhs-arch-srv4 ~]# snap_scheduler.py list
JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME      
--------------------------------------------------------------------
J1_vol0          */5 * * * *      Snapshot Create  vol0             
J1_vol1          */5 * * * *      Snapshot Create  vol1             
J1_vol2          */5 * * * *      Snapshot Create  vol2 

6. The first scheduled snapshot creation failed:

04-30 23:30:01,156 gcron.py:67 takeSnap] DEBUG Running command 'gluster
snapshot create Scheduled-J1_vol1-vol1 vol1'
[2015-04-30 23:30:01,162 gcron.py:95 doJob] DEBUG
/var/run/gluster/shared_storage/snaps/lock_files/J1_vol2 last modified at Thu
Apr 30 23:25:08 2015
[2015-04-30 23:30:01,162 gcron.py:97 doJob] DEBUG Processing job
Scheduled-J1_vol2-vol2
[2015-04-30 23:30:01,163 gcron.py:67 takeSnap] DEBUG Running command 'gluster
snapshot create Scheduled-J1_vol2-vol2 vol2'
[2015-04-30 23:30:06,827 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot
create Scheduled-J1_vol1-vol1 vol1' returned '1'
[2015-04-30 23:30:06,830 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot
create Scheduled-J1_vol2-vol2 vol2' returned '1'
[2015-04-30 23:30:06,832 gcron.py:74 takeSnap] DEBUG Command 'gluster snapshot
create Scheduled-J1_vol0-vol0 vol0' returned '1'
[2015-04-30 23:30:06,830 gcron.py:77 takeSnap] ERROR Snapshot of vol2 failed
[2015-04-30 23:30:06,828 gcron.py:77 takeSnap] ERROR Snapshot of vol1 failed
[2015-04-30 23:30:06,833 gcron.py:77 takeSnap] ERROR Snapshot of vol0 failed
[2015-04-30 23:30:06,838 gcron.py:78 takeSnap] ERROR Command output:
[2015-04-30 23:30:06,838 gcron.py:78 takeSnap] ERROR Command output:
[2015-04-30 23:30:06,838 gcron.py:78 takeSnap] ERROR Command output:
[2015-04-30 23:30:06,839 gcron.py:79 takeSnap] ERROR snapshot create: failed:
quorum is not met

[2015-04-30 23:30:06,839 gcron.py:79 takeSnap] ERROR snapshot create: failed:
One or more bricks may be down.

[2015-04-30 23:30:06,839 gcron.py:79 takeSnap] ERROR snapshot create: failed:
quorum is not met

[2015-04-30 23:30:06,839 gcron.py:101 doJob] ERROR Job Scheduled-J1_vol2-vol2
failed
[2015-04-30 23:30:06,839 gcron.py:101 doJob] ERROR Job Scheduled-J1_vol1-vol1
failed
[2015-04-30 23:30:06,839 gcron.py:101 doJob] ERROR Job Scheduled-J1_vol0-vol0
failed
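What each scheduled job does, as far as it is visible in the gcron.py DEBUG lines above, can be sketched as follows. This is a minimal illustration and not the gcron.py source: the helper names `snapshot_command` and `run_scheduled_snapshot` are hypothetical, and the `Scheduled-<job>-<volume>` naming is inferred from the logged commands. The timestamps also show that the vol1 and vol2 commands were issued 7 ms apart (23:30:01,156 and 23:30:01,163) and all three returned within about 5 ms of each other at 23:30:06, so the three snapshot requests appear to have reached glusterd concurrently.

```python
import shlex
import subprocess
from typing import List, Tuple


def snapshot_command(job_name: str, volume: str) -> List[str]:
    """Build the command seen in the gcron.py DEBUG lines.

    Hypothetical helper: the "Scheduled-<job>-<volume>" name format is
    inferred from the log (job J1_vol1 on vol1 -> Scheduled-J1_vol1-vol1).
    """
    snap_name = f"Scheduled-{job_name}-{volume}"
    return ["gluster", "snapshot", "create", snap_name, volume]


def run_scheduled_snapshot(job_name: str, volume: str) -> Tuple[int, str]:
    """Hypothetical helper: run the snapshot command, report failures,
    and return (exit status, combined stdout and stderr)."""
    cmd = snapshot_command(job_name, volume)
    print(f"Running command '{shlex.join(cmd)}'")
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    if proc.returncode != 0:
        # Mirrors the "Snapshot of <vol> failed" / "Command output:" lines.
        print(f"Snapshot of {volume} failed")
        print(f"Command output:\n{output}")
    return proc.returncode, output
```

On a node with the gluster CLI installed, `run_scheduled_snapshot("J1_vol1", "vol1")` issues `gluster snapshot create Scheduled-J1_vol1-vol1 vol1`; starting three such calls at once approximates the concurrent load the scheduler produced at 23:30.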

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-04-30 17:45:22
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.0alpha0
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x32d3621dc6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x32d363dadf]
/lib64/libc.so.6[0x33f6c326a0]
/usr/lib64/liburcu-bp.so.1(rcu_read_unlock_bp+0x16)[0x7f07aa97ad16]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_commit+0x1c2)[0x7f07aac78392]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_initiate_snap_phases+0x748)[0x7f07aac7c1b8]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_handle_snapshot_create+0x4c0)[0x7f07aac67b50]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_handle_snapshot_fn+0x821)[0x7f07aac73901]
/usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f07aabbee5f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x32d3661d12]
/lib64/libc.so.6[0x33f6c438f0]


10.70.34.50:
===========
core.11333

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id
gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 7, Bus error.
#0  0x00007fbce0242c54 in gf_changelog_reborp_rpcsvc_notify ()
   from /usr/lib64/libgfchangelog.so.0
Missing separate debuginfos, use: debuginfo-install
glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0  0x00007fbce0242c54 in gf_changelog_reborp_rpcsvc_notify ()
   from /usr/lib64/libgfchangelog.so.0
#1  0x00000032d3a09e64 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#2  0x00000032d3a0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#3  0x00007fbce14bb632 in ?? ()
   from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#4  0x00000032d367d060 in ?? () from /usr/lib64/libglusterfs.so.0
#5  0x00000033f70079d1 in start_thread () from /lib64/libpthread.so.0
#6  0x00000033f6ce89dd in clone () from /lib64/libc.so.6

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
core.23765 - tracked by BZ 1211640

(gdb) bt
#0  0x00007f07aa97ad16 in rcu_read_unlock_bp () from /usr/lib64/liburcu-bp.so.1
#1  0x00007f07aac78392 in glusterd_mgmt_v3_commit ()
   from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#2  0x00007f07aac7c1b8 in glusterd_mgmt_v3_initiate_snap_phases ()
   from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#3  0x00007f07aac67b50 in glusterd_handle_snapshot_create ()
   from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#4  0x00007f07aac73901 in glusterd_handle_snapshot_fn ()
   from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#5  0x00007f07aabbee5f in glusterd_big_locked_handler ()
   from /usr/lib64/glusterfs/3.7.0alpha0/xlator/mgmt/glusterd.so
#6  0x00000032d3661d12 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#7  0x00000033f6c438f0 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()


10.70.36.2:
===========
core.19223

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id
gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 7, Bus error.
#0  0x00007f87d54d3c54 in gf_changelog_reborp_rpcsvc_notify ()
   from /usr/lib64/libgfchangelog.so.0
Missing separate debuginfos, use: debuginfo-install
glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0  0x00007f87d54d3c54 in gf_changelog_reborp_rpcsvc_notify ()
   from /usr/lib64/libgfchangelog.so.0
#1  0x0000003588e09e64 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#2  0x0000003588e0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#3  0x00007f87d674c632 in ?? ()
   from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#4  0x0000003588a7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#5  0x0000003a968079d1 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003a964e89dd in clone () from /lib64/libc.so

10.70.36.4:
==========
core.24094

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id
gluster/bitd -p /var/lib/glusterd'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
Missing separate debuginfos, use: debuginfo-install
glusterfs-3.7.0alpha0-0.17.gited96153.el6.x86_64
(gdb) bt
#0  0x0000003efac21734 in gf_log_flush () from /usr/lib64/libglusterfs.so.0
#1  0x0000003efac3d7ed in gf_print_trace () from /usr/lib64/libglusterfs.so.0
#2  <signal handler called>
#3  0x00007f6b6400e820 in ?? ()
#4  0x00007f6b7cb5cc0a in gf_changelog_reborp_rpcsvc_notify ()
   from /usr/lib64/libgfchangelog.so.0
#5  0x0000003efb408425 in rpcsvc_handle_disconnect ()
   from /usr/lib64/libgfrpc.so.0
#6  0x0000003efb409f60 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#7  0x0000003efb40b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#8  0x00007f6b7ddd86a1 in ?? ()
   from /usr/lib64/glusterfs/3.7.0alpha0/rpc-transport/socket.so
#9  0x0000003efac7d060 in ?? () from /usr/lib64/libglusterfs.so.0
#10 0x00000035324079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000035320e89dd in clone () from /lib64/libc.so.6
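In core.24094, frames #0 and #1 (`gf_log_flush`, `gf_print_trace`) sit above `<signal handler called>`, so they belong to the crash handler faulting while printing the trace; the original fault is in the frames below the marker, under `gf_changelog_reborp_rpcsvc_notify`. The sketch below triages `bt` text that way, so the three bitd cores and the glusterd core can be grouped by faulting function. The function name `faulting_function` and the frame regex are assumptions written against the gdb output pasted here, not a gdb or GlusterFS API.

```python
import re
from typing import Optional

# Matches frames such as
# "#0  0x00007fbce0242c54 in gf_changelog_reborp_rpcsvc_notify ()";
# the "0x... in " address prefix is optional.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")
SIGNAL_MARKER = "<signal handler called>"


def faulting_function(bt_text: str) -> Optional[str]:
    """Return the first named function in the frames that faulted.

    Frames above the last signal-handler marker belong to the crash
    handler, so the search starts below it. Unresolved "??" frames and
    continuation lines ("   from /usr/lib64/...") are skipped.
    """
    lines = bt_text.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if SIGNAL_MARKER in line:
            start = i + 1
    for line in lines[start:]:
        match = FRAME_RE.match(line.strip())
        if match and match.group(2) != "??":
            return match.group(2)
    return None
```

Applied to the backtraces above, this groups core.11333, core.19223 and core.24094 (bitd) under `gf_changelog_reborp_rpcsvc_notify`, and core.23765 (glusterd) under `rcu_read_unlock_bp`.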

Actual results:


Expected results:


Additional info:
