[Bugs] [Bug 1383597] New: Crash when drive became full

bugzilla at redhat.com
Tue Oct 11 07:54:02 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1383597

            Bug ID: 1383597
           Summary: Crash when drive became full
           Product: GlusterFS
           Version: 3.8
         Component: write-behind
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: pavel.cernohorsky at appeartv.com
                CC: bugs at gluster.org



Description of problem:
The FUSE-mounted Gluster volume became unavailable, reporting "stat: cannot stat ...:
Transport endpoint is not connected". The logs showed that something crashed (see
Additional info).

Version-Release number of selected component (if applicable):
glusterfs.x86_64                     3.8.4-1.fc24               @updates        
glusterfs-api.x86_64                 3.8.4-1.fc24               @updates        
glusterfs-cli.x86_64                 3.8.4-1.fc24               @updates        
glusterfs-client-xlators.x86_64      3.8.4-1.fc24               @updates        
glusterfs-fuse.x86_64                3.8.4-1.fc24               @updates        
glusterfs-libs.x86_64                3.8.4-1.fc24               @updates        
glusterfs-server.x86_64              3.8.4-1.fc24               @updates

How reproducible:
Quite easily; I would say it reproduces in roughly 30% of attempts.

Steps to Reproduce:
1. Set up 6 FUSE mount points from different nodes to a single Gluster volume.
2. Start very heavy read/write traffic through each of the mount points
(approx. 1050 Mbit/s of cumulative write traffic and the same amount of
cumulative read traffic across all the mount points together).
3. Let the volume slowly fill with data (a simple writer loop such as the
sketch after these steps is enough).
4. At least one of the mount points goes down the moment the volume becomes
full.
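
Illustrative only (this is not the actual traffic generator; the mount path
/mnt/ramcache and the file name filler.bin are made up): a small C writer
like the following, run against each FUSE mount while the read traffic is
going on, is enough to push the volume into ENOSPC.

/* Hypothetical repro helper: keep appending 1 MiB blocks to a file on the
 * FUSE mount until the volume reports "No space left on device". */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/mnt/ramcache/filler.bin";
    static char buf[1 << 20];   /* 1 MiB of zeroes per write() */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (;;) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == ENOSPC) {
                /* Volume is full; the bug is that at this point some
                 * clients crash instead of just returning ENOSPC. */
                fprintf(stderr, "write: %s\n", strerror(errno));
                break;
            }
            perror("write");
            break;
        }
    }

    close(fd);
    return 0;
}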

Actual results:
At least one of the 6 mount points goes down.

Expected results:
The API keeps responding correctly to POSIX calls, as it should on a volume
that has simply filled up; once there is free space again, things just keep
working, as they do on the non-crashed clients.
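
For clarity, this is the behaviour I would expect from a client's point of
view (again with an assumed mount path and probe file name): write() returns
plain ENOSPC, stat() on the mount keeps working, and a later retry succeeds
once space has been freed.

/* Sketch of the expected client-side behaviour on a full but healthy
 * volume; paths are assumptions, not the actual test setup. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *mnt  = "/mnt/ramcache";            /* assumed mount point */
    const char *file = "/mnt/ramcache/probe.bin";  /* assumed probe file  */
    char buf[4096] = {0};
    struct stat st;

    int fd = open(file, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (write(fd, buf, sizeof(buf)) < 0 && errno == ENOSPC) {
        /* Expected: a plain ENOSPC error, not a dead mount. */
        fprintf(stderr, "volume full: %s\n", strerror(errno));

        /* The mount itself must stay responsive; on the crashed clients
         * this stat() fails with "Transport endpoint is not connected". */
        if (stat(mnt, &st) < 0)
            fprintf(stderr, "stat %s: %s\n", mnt, strerror(errno));

        /* After space has been freed elsewhere, a retry should just work. */
        sleep(10);
        if (write(fd, buf, sizeof(buf)) < 0)
            fprintf(stderr, "retry failed: %s\n", strerror(errno));
    }

    close(fd);
    return 0;
}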

Additional info:
[2016-10-11 06:36:14.835862] W [MSGID: 114031]
[client-rpc-fops.c:854:client3_3_writev_cbk] 0-ramcache-client-2: remote
operation failed [No space left on device]
The message "W [MSGID: 114031] [client-rpc-fops.c:854:client3_3_writev_cbk]
0-ramcache-client-0: remote operation failed [No space left on device]"
repeated 4 times between [2016-10-11 06:36:08.285146] and [2016-10-11
06:36:13.803409]
The message "W [MSGID: 114031] [client-rpc-fops.c:854:client3_3_writev_cbk]
0-ramcache-client-2: remote operation failed [No space left on device]"
repeated 12 times between [2016-10-11 06:36:14.835862] and [2016-10-11
06:36:14.840894]
pending frames:
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(STATFS)
frame : type(1) op(LOOKUP)
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(0) op(0)
frame : type(1) op(FLUSH)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FLUSH)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2016-10-11 06:36:14
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x7e)[0x7efd2ddc31fe]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7efd2ddcc974]
/lib64/libc.so.6(+0x34ed0)[0x7efd2c428ed0]
/usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so(+0x68f7)[0x7efd25dd98f7]
/usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so(+0x6b5b)[0x7efd25dd9b5b]
/usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so(+0x6c37)[0x7efd25dd9c37]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x51ed1)[0x7efd26035ed1]
/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x16f97)[0x7efd26281f97]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7efd2db8e970]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x27c)[0x7efd2db8ecec]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7efd2db8b073]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x8ac9)[0x7efd28788ac9]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x8cb8)[0x7efd28788cb8]
/lib64/libglusterfs.so.0(+0x7a42a)[0x7efd2de1642a]
/lib64/libpthread.so.0(+0x75ba)[0x7efd2cc1e5ba]
/lib64/libc.so.6(clone+0x6d)[0x7efd2c4f77cd]
---------


