[Bugs] [Bug 1214169] New: glusterfsd crashed while rebalance and self-heal were in progress
bugzilla at redhat.com
Wed Apr 22 07:06:26 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1214169
Bug ID: 1214169
Summary: glusterfsd crashed while rebalance and self-heal were
in progress
Product: GlusterFS
Version: 3.7.0
Component: core
Severity: high
Assignee: bugs at gluster.org
Reporter: ssampat at redhat.com
CC: bugs at gluster.org, gluster-bugs at redhat.com
Description of problem:
-----------------------
On a 6x3 volume, some bricks were brought down while rebalance was in progress.
Because client quorum was enabled, this made the mount read-only. The bricks
were later brought back up while rebalance was still running. While checking
the self-heal info output, one brick was found to be disconnected; this was
not one of the bricks that had been brought down.
The following is seen in the brick logs -
pending frames:
frame : type(0) op(18)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-04-22 11:28:30
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7dev
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3d140221c6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3d1403de2f]
/lib64/libc.so.6[0x3d120326a0]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(posix_getxattr+0xbd3)[0x7f558e324c03]
/usr/lib64/libglusterfs.so.0(default_getxattr+0x7b)[0x3d14027bab]
/usr/lib64/libglusterfs.so.0(default_getxattr+0x7b)[0x3d14027bab]
/usr/lib64/libglusterfs.so.0(default_getxattr+0x7b)[0x3d14027bab]
/usr/lib64/glusterfs/3.7dev/xlator/features/bitrot-stub.so(br_stub_getxattr+0x1e9)[0x7f558d6923a9]
/usr/lib64/glusterfs/3.7dev/xlator/features/access-control.so(posix_acl_getxattr+0x173)[0x7f558d48b9f3]
/usr/lib64/glusterfs/3.7dev/xlator/features/locks.so(pl_getxattr+0x1bb)[0x7f558d275d8b]
/usr/lib64/libglusterfs.so.0(default_getxattr+0x7b)[0x3d14027bab]
/usr/lib64/libglusterfs.so.0(default_getxattr_resume+0x13a)[0x3d1402b38a]
/usr/lib64/libglusterfs.so.0(call_resume+0x80)[0x3d14046470]
/usr/lib64/glusterfs/3.7dev/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f558ce5a388]
/lib64/libpthread.so.0[0x3d124079d1]
/lib64/libc.so.6(clone+0x6d)[0x3d120e88fd]
---------
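Since the brick process died on SIGSEGV (signal 11) inside posix_getxattr, the
resulting core dump, if core dumps are enabled on the server, can be inspected
with gdb. A minimal sketch, assuming the core file path is hypothetical and
glusterfs debuginfo packages are installed:

```shell
# Locate the brick core file first; its path/name depends on kernel.core_pattern.
gdb /usr/sbin/glusterfsd /path/to/core

# Inside gdb:
#   bt full        - full backtrace with locals for the crashing thread
#   frame <n>      - select the posix_getxattr frame seen in the trace above
#   info locals    - inspect the variables at the faulting point
```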
The following is the volume configuration -
# gluster volume info vol
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 133fe4f3-987c-474d-9904-c28475d4812f
Status: Started
Number of Bricks: 6 x 3 = 18
Transport-type: tcp
Bricks:
Brick1: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick2: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick3: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick4: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick5: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick6: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick7: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick8: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick9: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick10: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick11: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick12: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick13: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick14: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick15: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick16: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick17: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1
Brick18: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1
Options Reconfigured:
cluster.quorum-type: auto
client.event-threads: 4
server.event-threads: 5
features.uss: enable
features.quota: on
cluster.consistent-metadata: on
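A sketch of how the reconfigured options above would typically be applied with
the standard gluster CLI (note that quota is normally enabled via the dedicated
`gluster volume quota` command rather than `volume set`):

```shell
gluster volume set vol cluster.quorum-type auto
gluster volume set vol client.event-threads 4
gluster volume set vol server.event-threads 5
gluster volume set vol features.uss enable
gluster volume quota vol enable
gluster volume set vol cluster.consistent-metadata on
```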
Note that the client was on a different version of glusterfs than the server.
Version-Release number of selected component (if applicable):
---------------------------------------------------------------
On the server - glusterfs-3.7dev-0.965.git2788ddd.el6.x86_64
On the client - glusterfs-3.7dev-0.1009.git8b987be.el6.x86_64
How reproducible:
------------------
Saw this issue once.
Steps to Reproduce:
--------------------
1. On a 6x3 volume, started a remove-brick operation on one replica set.
2. After data migration for the remove-brick operation completed, stopped the
remove-brick operation.
3. Started a rebalance operation on the volume.
4. While rebalance was in progress, killed two bricks in each of three replica
sets.
5. After a while, with rebalance still running, started the volume with force
to bring the killed bricks back up.
6. While monitoring the volume heal info output, noticed that one of the
bricks was not connected.
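The steps above correspond roughly to the following gluster CLI sequence. This
is a sketch: the volume name and brick paths are taken from the configuration
above, but which replica set was removed is not stated in the report, so the
bricks shown in the remove-brick commands are illustrative.

```shell
# 1. Start removing one replica set (illustrative choice of bricks).
gluster volume remove-brick vol \
    vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1 \
    vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1 \
    vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1 start

# 2. Once data migration completes, stop the remove-brick operation.
gluster volume remove-brick vol \
    vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1 \
    vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1 \
    vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1 stop

# 3. Start rebalance on the volume.
gluster volume rebalance vol start

# 4. Kill two bricks per replica set; brick PIDs come from `gluster volume status`.
kill -9 <brick-pid>

# 5. Restart the killed bricks while rebalance is still running.
gluster volume start vol force

# 6. Monitor heal state.
gluster volume heal vol info
```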
Actual results:
----------------
Brick process crashed.
Expected results:
------------------
Brick process is not expected to crash.
Additional info: