[Bugs] [Bug 1261715] New: [HC] Fuse mount crashes, when client-quorum is not met

bugzilla@redhat.com
Thu Sep 10 03:24:33 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1261715

            Bug ID: 1261715
           Summary: [HC] Fuse mount crashes, when client-quorum is not met
           Product: GlusterFS
           Version: 3.7.4
         Component: sharding
          Keywords: Triaged
          Severity: urgent
          Assignee: bugs@gluster.org
          Reporter: kdhananj@redhat.com
        QA Contact: bugs@gluster.org
                CC: bugs@gluster.org, kdhananj@redhat.com,
                    sasundar@redhat.com
        Depends On: 1261399
            Blocks: 1258386 (Gluster-HC-1), 1261706 (glusterfs-3.7.5)



+++ This bug was initially created as a clone of Bug #1261399 +++

Description of problem:
-----------------------
When client quorum is not met, the fuse mount crashes

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8dev-0.825.git2e40a95.el7rhgs.x86_64

How reproducible:
-----------------
Tried only once

Steps to Reproduce:
-------------------
1. Set up 3 RHEL 7.1 nodes (hypervisors) with 1 brick per node, and create a
1x3 replicate volume
2. Optimize the volume for virt-store
3. Enable sharding on the volume (leaving shard-block-size at its default of 4MB)
4. Create an application VM running on node1
5. Block all incoming/outgoing traffic between node1 and node2/node3 using
iptables rules (see the command sketch after these steps)
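
A hedged sketch of the commands behind these steps, assuming hostnames
node1..node3 and the brick path /rhs/brick1/b1 from the setup info further
down; this is not a verbatim record of what was actually run:

# on one node: create and tune the 1x3 replicate volume
gluster volume create vmstore replica 3 \
    node1:/rhs/brick1/b1 node2:/rhs/brick1/b1 node3:/rhs/brick1/b1
gluster volume set vmstore group virt          # virt-store optimizations
gluster volume set vmstore features.shard on   # shard-block-size stays at the 4MB default
gluster volume start vmstore

# on node1: cut it off from node2/node3 while the application VM keeps writing
iptables -A INPUT  -s node2 -j DROP
iptables -A OUTPUT -d node2 -j DROP
iptables -A INPUT  -s node3 -j DROP
iptables -A OUTPUT -d node3 -j DROP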

Actual results:
---------------
The fuse mount process crashed on all 3 nodes

Expected results:
-----------------
The fuse mount should not crash

--- Additional comment from SATHEESARAN on 2015-09-09 05:01:35 EDT ---

Backtrace from one of the nodes (node1):

(gdb) bt
#0  fuse_writev_cbk (frame=0x7f28943fdbf4, cookie=<optimized out>, this=0x7f289861cad0, op_ret=0, op_errno=30, stbuf=<optimized out>, postbuf=0x0, xdata=0x0) at fuse-bridge.c:2271
#1  0x00007f288e24bd64 in io_stats_writev_cbk (frame=0x7f289442b59c, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at io-stats.c:1400
#2  0x00007f28968e6bd6 in default_writev_cbk (frame=0x7f289441a224, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at defaults.c:1016
#3  0x00007f288e86ebdd in wb_writev_cbk (frame=0x7f289442c9c4, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at write-behind.c:1255
#4  0x00007f288ea80843 in shard_writev_do_cbk (frame=frame@entry=0x7f28943ee204, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=30, prebuf=prebuf@entry=0x7f288d887da4, postbuf=postbuf@entry=0x7f288d887e14, xdata=xdata@entry=0x0) at shard.c:2958
#5  0x00007f288ecd4923 in dht_writev_cbk (frame=0x7f28943f67b8, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=30, prebuf=0x7f288d887da4, postbuf=0x7f288d887e14, xdata=0x0) at dht-inode-write.c:90
#6  0x00007f288ef18e13 in afr_writev_unwind (frame=0x7f28943f4c2c, this=<optimized out>) at afr-inode-write.c:197
#7  0x00007f288ef18e7d in afr_transaction_writev_unwind (frame=0x7f28943ef224, this=0x7f288800a620) at afr-inode-write.c:214
#8  0x00007f288ef21fcb in __afr_txn_write_done (frame=0x7f28943ef224, this=<optimized out>) at afr-transaction.c:81
#9  0x00007f288ef2529e in afr_unlock_common_cbk (frame=frame@entry=0x7f28943ef224, this=this@entry=0x7f288800a620, xdata=<optimized out>, op_errno=0, op_ret=<optimized out>, cookie=<optimized out>) at afr-lk-common.c:633
#10 0x00007f288ef25337 in afr_unlock_inodelk_cbk (frame=0x7f28943ef224, cookie=<optimized out>, this=0x7f288800a620, op_ret=<optimized out>, op_errno=0, xdata=<optimized out>) at afr-lk-common.c:674
#11 0x00007f288f173b1d in client3_3_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f28943f77d8) at client-rpc-fops.c:1673
#12 0x00007f28966aea10 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f28880a6840, pollin=pollin@entry=0x7f28800292f0) at rpc-clnt.c:759
#13 0x00007f28966aeccf in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f28880a6870, event=<optimized out>, data=0x7f28800292f0) at rpc-clnt.c:900
#14 0x00007f28966aa813 in rpc_transport_notify (this=this@entry=0x7f28880b6530, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f28800292f0) at rpc-transport.c:539
#15 0x00007f289186f646 in socket_event_poll_in (this=this@entry=0x7f28880b6530) at socket.c:2231
#16 0x00007f28918722a4 in socket_event_handler (fd=fd@entry=14, idx=idx@entry=3, data=0x7f28880b6530, poll_in=1, poll_out=0, poll_err=0) at socket.c:2344
#17 0x00007f28969418ca in event_dispatch_epoll_handler (event=0x7f288d798e80, event_pool=0x7f289860ed10) at event-epoll.c:570
#18 event_dispatch_epoll_worker (data=0x7f28880445e0) at event-epoll.c:673
#19 0x00007f2895748df5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f289508f1ad in clone () from /lib64/libc.so.6

--- Additional comment from SATHEESARAN on 2015-09-09 05:06:07 EDT ---

1. Installation info: the test was done on a custom build
(glusterfs-3.8dev-0.825.git2e40a95.el7rhgs.x86_64) from mainline (3.8dev)

2. Setup info :
---------------

Hypervisor1 - rhs-client10.lab.eng.blr.redhat.com (this was the SPM)
Hypervisor2 - rhs-client15.lab.eng.blr.redhat.com
Hypervisor3 - rhs-client21.lab.eng.blr.redhat.com

3. Gluster volume info :
------------------------
[root@rhs-client10 ~]# gluster volume info

Volume Name: vmstore
Type: Replicate
Volume ID: 96695606-ff65-4f20-921a-b94d16a62c3a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client10.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick2: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick3: rhs-client21.lab.eng.blr.redhat.com:/rhs/brick1/b1
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off
user.cifs: enable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on

4. Client side mount
---------------------
rhs-client10.lab.eng.blr.redhat.com:vmstore fuse.glusterfs  1.9T  3.3G  1.8T   1% /rhev/data-center/mnt/glusterSD/rhs-client10.lab.eng.blr.redhat.com:vmstore


--- Additional comment from Krutika Dhananjay on 2015-09-09 07:52:28 EDT ---

Nice catch, sas! Just checked the core. Turns out sharding is (wrongly)
returning a non-negative return status even when there is a failure, causing
the FUSE bridge to assume the fop succeeded, dereference the iatt (which is
NULL), and crash.
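
To make that concrete, below is a minimal, self-contained C sketch of the
failure mode (not the actual GlusterFS sources; the names are simplified
stand-ins for the functions in the backtrace). It models an upper layer that,
like fuse_writev_cbk(), treats a non-negative op_ret as success and
dereferences the post-op iatt, and contrasts a callback that swallows the
shard-write failure with one that propagates op_ret = -1 plus the saved
op_errno. (op_errno 30 in the backtrace is EROFS, consistent with AFR failing
writes with "read-only file system" once client-quorum is lost.)

/* Simplified model of the crash pattern; NOT GlusterFS code. */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct iatt_model {                 /* stand-in for GlusterFS' struct iatt */
        uint64_t ia_size;
};

/* Model of fuse_writev_cbk(): a non-negative op_ret is treated as success,
 * so the post-op iatt gets dereferenced. */
static void
fuse_writev_cbk_model(int op_ret, int op_errno, struct iatt_model *postbuf)
{
        if (op_ret >= 0) {
                /* If op_ret is wrongly >= 0 while postbuf is NULL, this is
                 * the NULL dereference seen in frame #0 of the backtrace. */
                printf("write ok, size is now %llu\n",
                       (unsigned long long) postbuf->ia_size);
        } else {
                printf("write failed: op_errno=%d\n", op_errno);
        }
}

/* Buggy aggregation (the reported behaviour): the shard-write failure is
 * swallowed and a non-negative status is unwound with NULL iatts. */
static void
shard_writev_done_buggy(int op_ret, int op_errno)
{
        (void) op_ret;                       /* failure ignored */
        fuse_writev_cbk_model(0, op_errno, NULL);
}

/* Fixed aggregation: propagate -1 and the saved errno so the upper layer
 * never touches the NULL iatt. */
static void
shard_writev_done_fixed(int op_ret, int op_errno)
{
        if (op_ret < 0) {
                fuse_writev_cbk_model(-1, op_errno, NULL);
                return;
        }
        /* on success, real pre/post iatts would be passed up here */
}

int
main(int argc, char **argv)
{
        (void) argv;
        if (argc > 1)                        /* run with any argument to see the crash */
                shard_writev_done_buggy(-1, EROFS);
        else
                shard_writev_done_fixed(-1, EROFS);
        return 0;
}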


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1258386
[Bug 1258386] [TRACKER] Gluster Hyperconvergence - Phase 1
https://bugzilla.redhat.com/show_bug.cgi?id=1261399
[Bug 1261399] [HC] Fuse mount crashes, when client-quorum is not met
https://bugzilla.redhat.com/show_bug.cgi?id=1261706
[Bug 1261706] GlusterFS 3.7.5 release tracker