[Bugs] [Bug 1552228] New: Gluster brick process dies during rebalance

Tue Mar 6 18:26:09 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1552228

            Bug ID: 1552228
           Summary: Gluster brick process dies during rebalance
           Product: GlusterFS
           Version: 3.13
         Component: core
          Assignee: bugs at gluster.org
          Reporter: matt.adams at bitplatter.com
                CC: bugs at gluster.org

Description of problem:

During a rebalance, a random brick process dies.  The backtrace and behavior
seem to be identical to what was reported in bug 1536294, which should have
been fixed in the 3.13.2 release.

pending frames:
frame : type(0) op(36)
frame : type(0) op(45)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(13)
frame : type(0) op(27)
frame : type(0) op(13)
frame : type(0) op(29)
frame : type(0) op(17)
frame : type(0) op(27)
frame : type(0) op(14)
frame : type(0) op(14)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-03-06 06:34:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.13.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xaa)[0x7f02360f81ba]
/lib64/libglusterfs.so.0(gf_print_trace+0x2f7)[0x7f0236101e57]
/lib64/libc.so.6(+0x35950)[0x7f0234755950]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f0234f52e00]
/lib64/libglusterfs.so.0(dict_rename_key+0x6e)[0x7f02360f2fae]
/usr/lib64/glusterfs/3.13.2/xlator/features/selinux.so(+0x1f4d)[0x7f0224d2ff4d]
/usr/lib64/glusterfs/3.13.2/xlator/features/marker.so(+0x11db7)[0x7f0224b1adb7]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/usr/lib64/glusterfs/3.13.2/xlator/features/quota.so(+0xe067)[0x7f02244cf067]
/usr/lib64/glusterfs/3.13.2/xlator/debug/io-stats.so(+0x7376)[0x7f0224297376]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0x2ddee)[0x7f021fddfdee]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbd49)[0x7f021fdbdd49]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbde5)[0x7f021fdbdde5]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc78c)[0x7f021fdbe78c]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbe2e)[0x7f021fdbde2e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc62e)[0x7f021fdbe62e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc742)[0x7f021fdbe742]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbe0e)[0x7f021fdbde0e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc834)[0x7f021fdbe834]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0x2721e)[0x7f021fdd921e]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x9a)[0x7f0235ebba8a]
/lib64/libpthread.so.0(+0x773a)[0x7f0234f5073a]
/lib64/libc.so.6(clone+0x3f)[0x7f0234827e7f]
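
Several frames above (selinux.so, marker.so, quota.so, io-stats.so, server.so)
carry only a module offset and no symbol name. The Python sketch below is not
part of the original report. It parses backtrace lines of this shape and prints
an addr2line command for each unresolved frame. The helper names parse_frame
and unresolved_frames are hypothetical, and resolving the offsets assumes that
debug symbols matching the installed glusterfs 3.13.2 build are available on
the host.

```python
import re

# Matches lines of the form: /path/to/module(symbol+0xoff)[0xaddr]
# The symbol part may be empty when the frame could not be resolved.
FRAME_RE = re.compile(
    r"^(?P<module>[^()\s]+)"
    r"\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol") or None,  # None means no symbol name was printed
        "offset": int(m.group("offset"), 16),
        "addr": int(m.group("addr"), 16),
    }


def unresolved_frames(lines):
    """Return only the frames that have a module offset but no symbol name."""
    frames = [f for f in map(parse_frame, lines) if f is not None]
    return [f for f in frames if f["symbol"] is None]


# Three frames copied from the backtrace in this report.
sample = [
    "/lib64/libglusterfs.so.0(dict_rename_key+0x6e)[0x7f02360f2fae]",
    "/usr/lib64/glusterfs/3.13.2/xlator/features/selinux.so(+0x1f4d)[0x7f0224d2ff4d]",
    "/usr/lib64/glusterfs/3.13.2/xlator/features/marker.so(+0x11db7)[0x7f0224b1adb7]",
]

for frame in unresolved_frames(sample):
    # Print a command the reader can run by hand; nothing is executed here.
    print(f"addr2line -f -e {frame['module']} {frame['offset']:#x}")
```

Running the printed commands on the affected server may show which functions
in the selinux and marker translators lead into dict_rename_key, which could
help when comparing this crash against bug 1536294.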

Version-Release number of selected component (if applicable):
GlusterFS 3.13.2

How reproducible:

We are running Gluster 3.13.2 on a ZFS raidz filesystem with 18 raidz bricks.  

Steps to Reproduce:
1. gluster volume rebalance <volname> start

Actual results:
After ~36 hours, one of the brick processes dies and the rebalance ends up in
a failed state.

Expected results:

The rebalance should complete successfully.

Additional info:
