[Bugs] [Bug 1405308] New: [compound fops] fuse mount crashed when VM installation is in progress & one of the brick killed

bugzilla at redhat.com
Fri Dec 16 07:30:58 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1405308

            Bug ID: 1405308
           Summary: [compound fops] fuse mount crashed when VM
                    installation is in progress & one of the brick killed
           Product: GlusterFS
           Version: 3.9
         Component: replicate
          Keywords: Triaged
          Severity: high
          Assignee: kdhananj at redhat.com
          Reporter: kdhananj at redhat.com
                CC: bugs at gluster.org, knarra at redhat.com,
                    pkarampu at redhat.com, rhs-bugs at redhat.com,
                    sabose at redhat.com, sasundar at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1405299
            Blocks: 1277939 (Gluster-HC-2)



+++ This bug was initially created as a clone of Bug #1405299 +++

Description of problem:
-----------------------
The fuse mount crashed while a VM installation was in progress on a VM image file
residing on the replica 3 volume and one of the bricks was killed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------


How reproducible:
-----------------
1/1

Steps to Reproduce:
--------------------
1. Create a replica 3 volume with compound-fops and granular-entry-heal enabled
2. Optimize the volume for VM store, with shard-block-size set to 4MB
3. Fuse mount the volume on the RHEL 7.3 client/hypervisor
4. Create a sparse VM image file on the fuse-mounted volume
5. Start OS installation of RHEL 7.3 Server in the VM
6. While the VM installation is in progress, kill one of the bricks of the volume
   (an illustrative command sequence follows)
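
For reference, here is an illustrative command sequence for the above steps.
Brick paths, server names and option names are taken from the volume info
further below; the use of the "virt" option group for step 2, the mount point,
the image name/size and the choice of brick to kill are assumptions made for
this example only:

# gluster volume create rep3vol replica 3 server1:/gluster/brick1/b1 \
      server2:/gluster/brick1/b1 server3:/gluster/brick1/b1
# gluster volume set rep3vol group virt                    # assumed way of "optimizing for VM store"
# gluster volume set rep3vol features.shard on
# gluster volume set rep3vol features.shard-block-size 4MB
# gluster volume set rep3vol cluster.use-compound-fops on
# gluster volume set rep3vol cluster.granular-entry-heal enable
# gluster volume start rep3vol

On the RHEL 7.3 client/hypervisor:

# mount -t glusterfs server1:/rep3vol /mnt/rep3vol          # assumed mount point
# qemu-img create -f raw /mnt/rep3vol/vm1.img 20G           # raw images are sparse by default

While the installation is running, on one of the servers:

# gluster volume status rep3vol                             # note the PID of one brick
# kill -KILL <brick-pid>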

Actual results:
--------------
The fuse mount crashed and dumped core.

Expected results:
------------------
No process should crash.

--- Additional comment from SATHEESARAN on 2016-12-16 02:16:03 EST ---

Backtrace:
----------
Core was generated by `/usr/sbin/glusterfs --volfile-server=10.70.37.138
--volfile-id=/rep3vol /mnt/re'.
Program terminated with signal 11, Segmentation fault.
#0  afr_pre_op_writev_cbk (frame=0x7f24e25d2974, cookie=0x1,
this=0x7f24d000a7b0, op_ret=<optimized out>, op_errno=<optimized out>,
data=<optimized out>, xdata=0x0) at afr-transaction.c:1255
1255                    write_args_cbk = &args_cbk->rsp_list[1];
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-26.el7.x86_64
libcom_err-1.42.9-9.el7.x86_64 libselinux-2.5-6.el7.x86_64
pcre-8.32-15.el7_2.1.x86_64
(gdb) bt
#0  afr_pre_op_writev_cbk (frame=0x7f24e25d2974, cookie=0x1,
this=0x7f24d000a7b0, op_ret=<optimized out>, op_errno=<optimized out>,
data=<optimized out>, xdata=0x0) at afr-transaction.c:1255
#1  0x00007f24d6e91dd7 in client3_3_compound_cbk (req=<optimized out>,
iov=<optimized out>, count=<optimized out>, myframe=0x7f24e25ceea8) at
client-rpc-fops.c:3214
#2  0x00007f24e48ad785 in saved_frames_unwind
(saved_frames=saved_frames@entry=0x7f24c4001620) at rpc-clnt.c:369
#3  0x00007f24e48ad86e in saved_frames_destroy
(frames=frames@entry=0x7f24c4001620) at rpc-clnt.c:386
#4  0x00007f24e48aefd4 in rpc_clnt_connection_cleanup
(conn=conn@entry=0x7f24d007cf18) at rpc-clnt.c:556
#5  0x00007f24e48af864 in rpc_clnt_handle_disconnect (conn=0x7f24d007cf18,
clnt=0x7f24d007cec0) at rpc-clnt.c:881
#6  rpc_clnt_notify (trans=<optimized out>, mydata=0x7f24d007cf18,
event=RPC_TRANSPORT_DISCONNECT, data=0x7f24d008cc10) at rpc-clnt.c:937
#7  0x00007f24e48ab883 in rpc_transport_notify (this=this@entry=0x7f24d008cc10,
event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f24d008cc10) at
rpc-transport.c:537
#8  0x00007f24d9173302 in socket_event_poll_err (this=0x7f24d008cc10) at
socket.c:1179
#9  socket_event_handler (fd=<optimized out>, idx=4, data=0x7f24d008cc10,
poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2404
#10 0x00007f24e4b3f4f0 in event_dispatch_epoll_handler (event=0x7f24cfffee80,
event_pool=0x7f24e5b41f00) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x7f24d003f420) at event-epoll.c:674
#12 0x00007f24e3946dc5 in start_thread (arg=0x7f24cffff700) at
pthread_create.c:308
#13 0x00007f24e328b73d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113
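
For context on the crash site at afr-transaction.c:1255 in frame #0: when the
brick connection is lost, rpc_clnt_connection_cleanup() unwinds the saved
frames (frames #2-#4 above) and the compound-fop callback runs for a failed
request, apparently with no usable compound reply (note xdata=0x0), so
dereferencing the reply list without a failure check faults. The following is
a simplified, hypothetical sketch of that pattern only; it is not the actual
AFR code, and every name in it is invented for illustration:

/* Hypothetical sketch of the crashing pattern; NOT the real
 * afr_pre_op_writev_cbk(). On disconnect the callback runs with
 * op_ret = -1 and data == NULL, so touching the reply list faults. */
#include <stddef.h>

struct compound_rsp {
        int op_ret;
        int op_errno;
};

struct compound_args_cbk {
        struct compound_rsp rsp_list[2];  /* [0] pre-op xattrop, [1] writev */
};

static int
pre_op_writev_cbk_sketch (int op_ret, int op_errno, void *data)
{
        struct compound_args_cbk *args_cbk  = data;
        struct compound_rsp      *write_rsp = NULL;

        if (op_ret < 0 || !args_cbk) {
                /* Brick went down: fail the transaction here instead of
                 * touching rsp_list; a guard along these lines is
                 * presumably what the missed backport restores. */
                return -1;
        }

        write_rsp = &args_cbk->rsp_list[1];  /* crash site without the guard */
        return write_rsp->op_ret;
}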

--- Additional comment from SATHEESARAN on 2016-12-16 02:21 EST ---



--- Additional comment from SATHEESARAN on 2016-12-16 02:22:15 EST ---

Volume information
# gluster volume info rep3vol

Volume Name: rep3vol
Type: Replicate
Volume ID: 28e00021-7773-48f5-a31f-c9f8f2db0a2d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/gluster/brick1/b1
Brick2: server2:/gluster/brick1/b1
Brick3: server3:/gluster/brick1/b1
Options Reconfigured:
cluster.use-compound-fops: on
user.cifs: off
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
features.shard-block-size: 4MB
storage.owner-gid: 107
storage.owner-uid: 107
cluster.granular-entry-heal: enable
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

--- Additional comment from SATHEESARAN on 2016-12-16 02:23:37 EST ---

Krutika has done the RCA of this issue and found that patch [1] was missed in
the backport, which caused the crash.

[1] - http://review.gluster.org/#/c/15482/9

@Krutika, please provide the detailed RCA.
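
Since the crash is in the compound-fop pre-op path, a plausible interim
mitigation until the missed patch is backported (an assumption here, not
something stated in this bug) would be to switch the volume back to regular
fops:

# gluster volume set rep3vol cluster.use-compound-fops off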


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1277939
[Bug 1277939] (Gluster-HC-2) [TRACKER] Gluster Hyperconvergence - MVP
https://bugzilla.redhat.com/show_bug.cgi?id=1405299
[Bug 1405299] fuse mount crashed when VM installation is in progress & one
of the brick killed

