[Bugs] [Bug 1797882] New: Segmentation fault occurs while truncate file
bugzilla@redhat.com
Tue Feb 4 06:27:23 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1797882
Bug ID: 1797882
Summary: Segmentation fault occurs while truncate file
Product: GlusterFS
Version: mainline
Hardware: All
OS: All
Status: NEW
Component: write-behind
Severity: urgent
Assignee: bugs@gluster.org
Reporter: pkarampu@redhat.com
CC: bugs@gluster.org, nbalacha@redhat.com, pasik@iki.fi,
rgowdapp@redhat.com, vpolakis@gmail.com
Depends On: 1753592
Blocks: 1755678, 1755679
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1753592 +++
+++ This bug was initially created as a clone of Bug #1749369 +++
Description of problem:
A segmentation fault occurs when the bricks are nearly 100% full and, in
parallel, a truncate of a file is attempted (writes fail with "No space left on
device"). The prerequisite is that the performance xlators (read-ahead,
write-behind, etc.) are enabled, as in the following volume:
[root@ glusterfs]# gluster --glusterd-sock=/d/backends/1/glusterd/gd.sock
volume info
Volume Name: patchy
Type: Replicate
Volume ID: 5b79a54d-8207-4d38-a0e2-001adbdd11d8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.1.1:/d/backends/1/brick
Brick2: 127.1.1.2:/d/backends/2/brick
Brick3: 127.1.1.3:/d/backends/3/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
The backtrace is like the following:
#0 mem_put (ptr=0x7fa8e006ec40) at mem-pool.c:847
#1 0x00007fa8e94d3fea in FRAME_DESTROY (frame=0x7fa8d0015698) at
../../../../libglusterfs/src/stack.h:173
#2 STACK_DESTROY (stack=0x7fa8d000b238) at
../../../../libglusterfs/src/stack.h:193
#3 fuse_truncate_cbk (frame=<optimized out>, cookie=<optimized out>,
this=<optimized out>, op_ret=<optimized out>, op_errno=28, prebuf=<optimized
out>, postbuf=0x7fa8e6637530, xdata=0x0) at fuse-bridge.c:813
#4 0x00007fa8e45b5f7d in io_stats_ftruncate_cbk (frame=0x7fa8d0009508,
cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28,
prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at io-stats.c:2425
#5 0x00007fa8e47d6feb in mdc_ftruncate_cbk (frame=0x7fa8d0015e58,
cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28,
prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at md-cache.c:1477
#6 0x00007fa8ec29e668 in default_ftruncate_cbk (frame=0x7fa8d000a558,
cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28,
prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at defaults.c:1375
#7 0x00007fa8e4bfc539 in qr_ftruncate_cbk (frame=0x7fa8d000d288,
cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28,
prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at quick-read.c:890
#8 0x00007fa8e4e06c48 in ioc_ftruncate_cbk (frame=frame@entry=0x7fa8d000f1e8,
cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1,
op_errno=op_errno@entry=28, prebuf=prebuf@entry=0x7fa8d001cac0,
postbuf=postbuf@entry=0x7fa8e6637530, xdata=xdata@entry=0x0) at io-cache.c:1277
#9 0x00007fa8e5022f2a in rda_ftruncate_cbk (frame=0x7fa8d000f088,
cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28,
prebuf=0x7fa8d001cac0, postbuf=0x7fa8d001cb58, xdata=0x0) at
readdir-ahead.c:826
#10 0x00007fa8e522df58 in ra_truncate_cbk (frame=0x7fa8d0015588,
cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28,
prebuf=0x7fa8d001cac0, postbuf=0x7fa8d001cb58, xdata=0x0) at read-ahead.c:668
#11 0x00007fa8ec22bd43 in call_resume_unwind (stub=stub@entry=0x7fa8d001bf88)
at call-stub.c:2348
#12 0x00007fa8ec23a989 in call_unwind_error_keep_stub (stub=0x7fa8d001bf88,
op_ret=op_ret@entry=-1, op_errno=28) at call-stub.c:2541
#13 0x00007fa8e544482c in wb_do_winds (wb_inode=wb_inode@entry=0x7fa8e006ec40,
tasks=tasks@entry=0x7fa8e6637750) at write-behind.c:1735
#14 0x00007fa8e54448db in wb_process_queue
(wb_inode=wb_inode@entry=0x7fa8e006ec40) at write-behind.c:1778
#15 0x00007fa8e5444a5f in wb_fulfill_cbk (frame=frame@entry=0x7fa8e005e808,
cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1,
op_errno=op_errno@entry=28, prebuf=prebuf@entry=0x7fa8e00731e0,
postbuf=postbuf@entry=0x7fa8e0073278, xdata=xdata@entry=0x0) at
write-behind.c:1105
#16 0x00007fa8e56c5f60 in dht_writev_cbk (frame=frame@entry=0x7fa8e006c0a8,
cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28,
prebuf=prebuf@entry=0x7fa8e00731e0, postbuf=postbuf@entry=0x7fa8e0073278,
xdata=0x0) at dht-inode-write.c:140
#17 0x00007fa8e592f31e in afr_writev_unwind (frame=frame@entry=0x7fa8e0000b18,
this=this@entry=0x7fa8e000f820) at afr-inode-write.c:234
#18 0x00007fa8e592f4fd in afr_transaction_writev_unwind (frame=0x7fa8e005ed78,
this=0x7fa8e000f820) at afr-inode-write.c:248
#19 0x00007fa8e5941be2 in afr_transaction_done (frame=0x7fa8e005ed78,
this=0x7fa8e000f820) at afr-transaction.c:367
#20 0x00007fa8e59453d3 in afr_unlock_common_cbk
(frame=frame@entry=0x7fa8e005ed78, this=this@entry=0x7fa8e000f820,
op_ret=op_ret@entry=0, xdata=<optimized out>, op_errno=<optimized out>,
cookie=0x2) at afr-lk-common.c:243
#21 0x00007fa8e59461e8 in afr_unlock_inodelk_cbk (frame=0x7fa8e005ed78,
cookie=0x2, this=0x7fa8e000f820, op_ret=0, op_errno=<optimized out>,
xdata=<optimized out>) at afr-lk-common.c:281
#22 0x00007fa8e5c18f3f in client4_0_finodelk_cbk (req=<optimized out>,
iov=<optimized out>, count=<optimized out>, myframe=<optimized out>) at
client-rpc-fops_v2.c:1398
#23 0x00007fa8ebfe0980 in rpc_clnt_handle_reply
(clnt=clnt@entry=0x7fa8e0053800, pollin=pollin@entry=0x7fa8e006e270) at
rpc-clnt.c:755
#24 0x00007fa8ebfe0d53 in rpc_clnt_notify (trans=0x7fa8e0053b20,
mydata=0x7fa8e0053830, event=<optimized out>, data=0x7fa8e006e270) at
rpc-clnt.c:923
#25 0x00007fa8ebfdcd83 in rpc_transport_notify (this=this@entry=0x7fa8e0053b20,
event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fa8e006e270)
at rpc-transport.c:525
#26 0x00007fa8e6aae2cb in socket_event_poll_in (notify_handled=true,
this=0x7fa8e0053b20) at socket.c:2504
#27 socket_event_handler (fd=<optimized out>, idx=<optimized out>,
gen=<optimized out>, data=0x7fa8e0053b20, poll_in=<optimized out>,
poll_out=<optimized out>, poll_err=0) at socket.c:2905
#28 0x00007fa8ec2762d9 in event_dispatch_epoll_handler (event=0x7fa8e6637e90,
event_pool=0x18420b0) at event-epoll.c:591
#29 event_dispatch_epoll_worker (data=0x1865220) at event-epoll.c:668
#30 0x00007fa8eb99b182 in start_thread (arg=0x7fa8e6638700) at
pthread_create.c:312
#31 0x00007fa8eb6c830d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Program received signal SIGSEGV, Segmentation fault.
mem_put (ptr=0x7fa8e006ec40) at mem-pool.c:869
869 GF_ATOMIC_DEC(hdr->pool->active);
(gdb)
So it appears that during the stack unwind of the frames following an error
response from the brick (No space left on device), frame->local points to a
memory location that was not allocated via mem_get but via calloc:
Breakpoint 1, __gf_calloc (nmemb=nmemb@entry=1, size=size@entry=208,
type=type@entry=163, typestr=typestr@entry=0x7fa8e544b892
"gf_wb_mt_wb_inode_t") at mem-pool.c:119
119 return (void *)ptr;
$2034 = (void *) 0x7fa8e006ec40
#0 __gf_calloc (nmemb=nmemb@entry=1, size=size@entry=208, type=type@entry=163,
typestr=typestr@entry=0x7fa8e544b892 "gf_wb_mt_wb_inode_t") at mem-pool.c:119
#1 0x00007fa8e5441e7b in __wb_inode_create (this=this@entry=0x7fa8e0013470,
inode=inode@entry=0x7fa8e00603a8) at write-behind.c:667
#2 0x00007fa8e5442063 in wb_inode_create (this=0x7fa8e0013470,
inode=0x7fa8e00603a8) at write-behind.c:708
#3 0x00007fa8e5444fc2 in wb_writev (frame=frame@entry=0x7fa8e005ed78,
this=<optimized out>, fd=fd@entry=0x7fa8d000cb88,
vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0,
flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0,
xdata=xdata@entry=0x0) at write-behind.c:1851
#4 0x00007fa8e5231c78 in ra_writev (frame=frame@entry=0x7fa8e005d2d8,
this=0x7fa8e0014f50, fd=fd@entry=0x7fa8d000cb88,
vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0,
flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0,
xdata=xdata@entry=0x0) at read-ahead.c:650
#5 0x00007fa8e501f10a in rda_writev (frame=frame@entry=0x7fa8e0000b18,
this=0x7fa8e0016940, fd=fd@entry=0x7fa8d000cb88,
vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, off=off@entry=0,
flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0,
xdata=xdata@entry=0x0) at readdir-ahead.c:696
#6 0x00007fa8e4e0b2e6 in ioc_writev (frame=frame@entry=0x7fa8e006c0a8,
this=0x7fa8e00188d0, fd=fd@entry=0x7fa8d000cb88,
vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0,
flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0,
xdata=xdata@entry=0x0) at io-cache.c:1233
#7 0x00007fa8e4bfabce in qr_writev (frame=frame@entry=0x7fa8e005e808,
this=0x7fa8e001a370, fd=fd@entry=0x7fa8d000cb88, iov=iov@entry=0x7fa8d000d0b0,
count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769,
iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at quick-read.c:845
#8 0x00007fa8ec2c2240 in default_writev_resume (frame=0x7fa8d000f088,
this=0x7fa8e001bde0, fd=0x7fa8d000cb88, vector=0x7fa8d000d0b0, count=1, off=0,
flags=32769, iobref=0x7fa8d00160f0, xdata=0x0) at defaults.c:2274
#9 0x00007fa8ec23a66a in call_resume_wind (stub=<optimized out>) at
call-stub.c:2040
and contains the inode entry. The magic number is present in the header, which
fools mem_put into treating the block as pool memory, and the core dump occurs
while trying to dereference the pool pointer.
The destroyed frame is always the one for ra_truncate_cbk, wound from
ra_ftruncate, e.g.:
(gdb) p *frame
$9 = {root = 0x7fa1a0000ec8, parent = 0x7fa1a000c098, frames = {next =
0x7fa1a000a338, prev = 0x7fa1a000a338}, local = 0x0, this = 0x7fa1ac013470, ret
= 0x7fa1b2baedf0 <ra_truncate_cbk>, ref_count = 0, lock = {spinlock = 0, mutex
= {
__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind =
-1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size
= '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>,
__align = 0}}, cookie = 0x7fa1a000a328, complete = true, op =
GF_FOP_FTRUNCATE, begin = {tv_sec = 164325, tv_nsec = 696128864}, end = {tv_sec
= 164325, tv_nsec = 698002711},
wind_from = 0x7fa1b2bb86ff <__FUNCTION__.14980> "ra_ftruncate", wind_to =
0x7fa1b2bb8470 "(this->children->xlator)->fops->ftruncate", unwind_from =
0x7fa1b9c48e50 <__FUNCTION__.18620> "call_resume_unwind",
unwind_to = 0x7fa1b2bb7fd7 "ra_truncate_cbk"}
(gdb) q
and the inode pointer is copied to frame->local in wb_ftruncate:
2237 wb_ftruncate(call_frame_t *frame, xlator_t *this, fd_t *fd, off_t offset,
2238 dict_t *xdata)
2239 {
2240 wb_inode_t *wb_inode = NULL;
2241 call_stub_t *stub = NULL;
2242 int32_t op_errno = 0;
2243
2244 wb_inode = wb_inode_create(this, fd->inode);
2245 if (!wb_inode) {
2246 op_errno = ENOMEM;
2247 goto unwind;
2248 }
2249
2250 frame->local = wb_inode;
2251
2252 stub = fop_ftruncate_stub(frame, wb_ftruncate_helper, fd, offset,
xdata);
as seen in gdb:
Breakpoint 1, wb_ftruncate (frame=frame@entry=0x7f605c00fc18,
this=0x7f6060013470, fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
xdata=xdata@entry=0x0) at write-behind.c:2239
2239 {
(gdb) p frame
$1 = (call_frame_t *) 0x7f605c00fc18
(gdb) p frame->local
$2 = (void *) 0x0
(gdb) watch frame->local
Watchpoint 2: frame->local
(gdb) c
Continuing.
Watchpoint 2: frame->local
Old value = (void *) 0x0
New value = (void *) 0x7f606006c540
wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=<optimized out>,
fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
xdata=xdata@entry=0x0) at write-behind.c:2252
2252 stub = fop_ftruncate_stub(frame, wb_ftruncate_helper, fd, offset,
xdata);
(gdb) bt
#0 wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=<optimized out>,
fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
xdata=xdata@entry=0x0) at write-behind.c:2252
#1 0x00007f6066c7dfc6 in ra_ftruncate (frame=frame@entry=0x7f605c00a308,
this=0x7f6060014f50, fd=fd@entry=0x7f605c00cbd8,
offset=offset@entry=1, xdata=xdata@entry=0x0) at read-ahead.c:903
#2 0x00007f6066a69627 in rda_ftruncate (frame=frame@entry=0x7f605c00c148,
this=0x7f6060016940, fd=fd@entry=0x7f605c00cbd8,
offset=offset@entry=1, xdata=xdata@entry=0x0) at readdir-ahead.c:835
#3 0x00007f6066856b78 in ioc_ftruncate (frame=frame@entry=0x7f605c00c9b8,
this=0x7f60600188d0, fd=fd@entry=0x7f605c00cbd8,
offset=offset@entry=1, xdata=xdata@entry=0x0) at io-cache.c:1327
#4 0x00007f60666461d1 in qr_ftruncate (frame=frame@entry=0x7f605c00fe98,
this=0x7f606001a370, fd=fd@entry=0x7f605c00cbd8,
offset=offset@entry=1, xdata=xdata@entry=0x0) at quick-read.c:904
#5 0x00007f606dd0986a in default_ftruncate_resume (frame=0x7f605c0016d8,
this=0x7f606001bde0, fd=0x7f605c00cbd8, offset=1, xdata=0x0)
at defaults.c:2023
How reproducible:
always
Steps to Reproduce:
- set up a test cluster:
Volume Name: patchy
Type: Replicate
Volume ID: 5b79a54d-8207-4d38-a0e2-001adbdd11d8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.1.1:/d/backends/1/brick
Brick2: 127.1.1.2:/d/backends/2/brick
Brick3: 127.1.1.3:/d/backends/3/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
- fill the volume with dummy files, e.g.:
for i in {1..10}; do dd if=/home/epolakis/prom.img
of=/mnt/glusterfs/0/smallfile_IB__N$i bs=4096 count=10000 ; done
- use a test program that truncates a file, like:
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STR_LEN 1000

int main(void) {
    char *m_string;
    int fd;
    int ret;
    char fn[] = "/mnt/glusterfs/0/smallfile_N1";
    struct stat st;

    if ((m_string = malloc(STR_LEN)) == NULL)
        perror("malloc() error");
    else if ((fd = open(fn, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR)) < 0)
        perror("open() error");
    else {
        memset(m_string, '0', STR_LEN);
        if ((ret = write(fd, m_string, STR_LEN)) == -1)
            perror("write() error");
        else {
            if (ftruncate(fd, 1) != 0)
                perror("ftruncate() error");
            else {
                fstat(fd, &st);
                printf("the file has %ld bytes\n", (long) st.st_size);
            }
        }
        close(fd);
    }
    free(m_string);
    return 0;
}
Actual results:
Expected results:
Additional info:
If my understanding is correct, the calloc'd inode context should not be
released until the forget.
--- Additional comment from evangelos on 2019-09-10 10:18:30 UTC ---
Would something like the below be enough?
diff --git a/libglusterfs/src/mem-pool.c b/libglusterfs/src/mem-pool.c
index cff9632..fad6a2c 100644
--- a/libglusterfs/src/mem-pool.c
+++ b/libglusterfs/src/mem-pool.c
@@ -863,6 +863,13 @@ mem_put(void *ptr)
         /* Not one of ours; don't touch it. */
         return;
     }
+
+    if (!hdr->pool_list) {
+        gf_msg_callingfn("mem-pool", GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,
+                         "invalid argument hdr->pool_list NULL");
+        return;
+    }
+
     pool_list = hdr->pool_list;
     pt_pool = &pool_list->pools[hdr->power_of_two - POOL_SMALLEST];
This allocation (as far as I understood, but please correct me) is for the
inode, which shouldn't be released unless the kernel sends the forget
(rm/unlink, etc.).
--- Additional comment from Worker Ant on 2019-09-19 11:34:42 UTC ---
REVIEW: https://review.gluster.org/23445 (Segmentation fault occurs during
truncate) posted (#2) for review on master by Vangelis Polakis
--- Additional comment from Nithya Balachandran on 2019-09-24 05:37:47 UTC ---
Hi,
What version of gluster are you using?
--- Additional comment from evangelos on 2019-09-24 07:54:23 UTC ---
I have seen this issue in 5.6.
--- Additional comment from Nithya Balachandran on 2019-09-25 04:30:33 UTC ---
Thanks. We will take a look and get back to you.
--- Additional comment from Nithya Balachandran on 2019-09-25 13:50:34 UTC ---
Hi,
Thank you for an excellent test. Would you mind if I added it to the regression
framework?
I'm afraid the patch provided will not fix the issue here. In most xlators, the
local structure is allocated from a mempool which is why the mem_put will
usually work.
However, the wb xlator does not have a defined local struct of its own.
Instead, for the truncate and ftruncate fops, it saves the wb_inode (which is
allocated using GF_CALLOC) in frame->local. In the normal code path, this is
cleared in the wb_truncate_cbk and wb_ftruncate_cbk functions and frame->local
is set to NULL before unwinding the fop.
The test code, however, does a write followed by a conflicting truncate. As the
write fails with ENOSPC, __wb_pick_winds determines that there is a conflict,
updates the request op_ret and op_errno for the ftruncate request. The
wb_ftruncate_cbk is not called and the req is instead unwound in wb_do_winds ->
call_unwind_error_keep_stub code path. This does not clear frame->local which
causes the mem_put to crash.
--- Additional comment from Nithya Balachandran on 2019-09-25 14:27:30 UTC ---
(In reply to evangelos from comment #3)
> I have seen in this issue in 5.6
The issue also exists in the master branch.
--- Additional comment from Worker Ant on 2019-09-25 14:29:24 UTC ---
REVIEW: https://review.gluster.org/23485 (perf/write-behind: Clear frame->local
on conflict error) posted (#1) for review on master by N Balachandran
--- Additional comment from Worker Ant on 2019-09-26 02:28:52 UTC ---
REVIEW: https://review.gluster.org/23485 (perf/write-behind: Clear frame->local
on conflict error) merged (#1) on master by N Balachandran
--- Additional comment from evangelos on 2019-09-26 22:06:01 UTC ---
Thanks a lot for the explanation/complete solution! Yes, of course you can add
it to the regression framework.
One comment though: wouldn't it be good to also add a check on the ptr in
mem_put before dereferencing? I mean, in case something similar happens in the
future, an error would appear instead of a core dump.
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1753592
[Bug 1753592] Segmentation fault occurs while truncate file
https://bugzilla.redhat.com/show_bug.cgi?id=1755678
[Bug 1755678] Segmentation fault occurs while truncate file
https://bugzilla.redhat.com/show_bug.cgi?id=1755679
[Bug 1755679] Segmentation fault occurs while truncate file