[Bugs] [Bug 1246121] New: Disperse volume : client glusterfs crashed while running IO

bugzilla at redhat.com bugzilla at redhat.com
Thu Jul 23 13:32:51 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1246121

            Bug ID: 1246121
           Summary: Disperse volume : client glusterfs crashed while
                    running IO
           Product: GlusterFS
           Version: 3.7.2
         Component: disperse
          Assignee: bugs at gluster.org
          Reporter: xhernandez at datalab.es
                CC: annair at redhat.com, asrivast at redhat.com,
                    bugs at gluster.org, byarlaga at redhat.com,
                    gluster-bugs at redhat.com, nsathyan at redhat.com,
                    pkarampu at redhat.com, rhs-bugs at redhat.com
        Depends On: 1242423, 1243187
            Blocks: 1223636



+++ This bug was initially created as a clone of Bug #1243187 +++

Description of problem:
======================

The crash was seen while running IO on a FUSE mount (file creation with dd
using varying block sizes, directory creation, and Linux untars).

Backtrace:
=========

(gdb) bt
#0  ec_unlock_timer_del (link=0x7fe43c231c2c) at ec-common.c:1696
#1  0x00007fe451f28713 in gf_timer_proc (ctx=0x7fe452d71010) at timer.c:194
#2  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe45095b96d in clone () from /lib64/libc.so.6
(gdb)  t a a  bt

Thread 9 (Thread 0x7fe437fff700 (LWP 15718)):
#0  0x00007fe450953a57 in writev () from /lib64/libc.so.6
#1  0x00007fe4492251ff in send_fuse_iov (this=<value optimized out>,
finh=<value optimized out>, iov_out=0x7fe437ffe120, count=2) at
fuse-bridge.c:191
#2  0x00007fe44922541a in send_fuse_data (this=<value optimized out>,
finh=<value optimized out>, data=<value optimized out>, size=<value optimized
out>)
    at fuse-bridge.c:230
#3  0x00007fe449235bda in fuse_writev_cbk (frame=0x7fe44f9c0958, cookie=<value
optimized out>, this=0x7fe452d9e600, op_ret=65536, op_errno=0, 
    stbuf=0x7fe437ffe500, postbuf=0x7fe437ffe500, xdata=0x0) at
fuse-bridge.c:2269
#4  0x00007fe43eb2a3ae in io_stats_writev_cbk (frame=0x7fe44fb0b380,
cookie=<value optimized out>, this=<value optimized out>, op_ret=65536,
op_errno=0, 
    prebuf=0x7fe437ffe500, postbuf=0x7fe437ffe500, xdata=0x0) at
io-stats.c:1404
#5  0x00007fe43ef4ddb0 in mdc_writev_cbk (frame=0x7fe44fb57fb0, cookie=<value
optimized out>, this=<value optimized out>, op_ret=65536, 
    op_errno=<value optimized out>, prebuf=0x7fe437ffe500,
postbuf=0x7fe437ffe500, xdata=0x0) at md-cache.c:1524
#6  0x00007fe451f1706c in default_writev_cbk (frame=0x7fe44fb28dd8,
cookie=<value optimized out>, this=<value optimized out>, op_ret=65536,
op_errno=0, 
    prebuf=<value optimized out>, postbuf=0x7fe437ffe500, xdata=0x0) at
defaults.c:1021
#7  0x00007fe451f1706c in default_writev_cbk (frame=0x7fe44fb3a0a4,
cookie=<value optimized out>, this=<value optimized out>, op_ret=65536,
op_errno=0, 
    prebuf=<value optimized out>, postbuf=0x7fe437ffe500, xdata=0x0) at
defaults.c:1021
#8  0x00007fe43f56e974 in ioc_writev_cbk (frame=0x7fe44fb2da70, cookie=<value
optimized out>, this=<value optimized out>, op_ret=65536, op_errno=0, 
    prebuf=0x7fe437ffe500, postbuf=0x7fe437ffe500, xdata=0x0) at
io-cache.c:1244
#9  0x00007fe43f982bd9 in ra_writev_cbk (frame=0x7fe44fb35360, cookie=<value
optimized out>, this=<value optimized out>, op_ret=65536, op_errno=0, 
    prebuf=0x7fe437ffe500, postbuf=0x7fe437ffe500, xdata=0x0) at
read-ahead.c:661
#10 0x00007fe43fb8f30c in wb_do_unwinds (wb_inode=<value optimized out>,
lies=0x7fe437ffe5d0) at write-behind.c:924
#11 0x00007fe43fb91b9c in wb_process_queue (wb_inode=0x7fe4258cb4a0) at
write-behind.c:1213
#12 0x00007fe43fb924f8 in wb_writev (frame=0x7fe44fadae2c, this=<value
optimized out>, fd=0x7fe441da9cd0, vector=0x7fe3eb17e6f0, count=1,
offset=6160384, 
    flags=32769, iobref=0x7fe401611f30, xdata=0x0) at write-behind.c:1333
#13 0x00007fe43f98288a in ra_writev (frame=0x7fe44fb35360, this=0x7fe439c61a20,
fd=0x7fe441da9cd0, vector=0x7fe3eb17e6f0, count=1, offset=6160384, 
    flags=32769, iobref=0x7fe401611f30, xdata=0x0) at read-ahead.c:689
#14 0x00007fe451f0f960 in default_writev (frame=0x7fe44fb35360,
this=0x7fe439c62780, fd=0x7fe441da9cd0, vector=0x7fe3eb17e6f0, count=<value
optimized out>, 
    off=<value optimized out>, flags=32769, iobref=0x7fe401611f30, xdata=0x0)
at defaults.c:1836
#15 0x00007fe43f56e6c6 in ioc_writev (frame=0x7fe44fb2da70,
this=0x7fe439c63610, fd=0x7fe441da9cd0, vector=0x7fe3eb17e6f0, count=1,
offset=6160384, 
    flags=32769, iobref=0x7fe401611f30, xdata=0x0) at io-cache.c:1285
#16 0x00007fe43f362040 in qr_writev (frame=0x7fe44fb3a0a4, this=0x7fe439c64360,
fd=0x7fe441da9cd0, iov=0x7fe3eb17e6f0, count=1, offset=6160384, flags=32769, 
    iobref=0x7fe401611f30, xdata=0x0) at quick-read.c:636
#17 0x00007fe451f13a23 in default_writev_resume (frame=0x7fe44fb28dd8,
this=0x7fe439c65120, fd=0x7fe441da9cd0, vector=0x7fe3eb17e6f0, count=1,
off=6160384, 
    flags=32769, iobref=0x7fe401611f30, xdata=0x0) at defaults.c:1395
#18 0x00007fe451f2d3aa in call_resume_wind (stub=<value optimized out>) at
call-stub.c:2124
#19 0x00007fe451f31640 in call_resume (stub=0x7fe44f5a5c24) at call-stub.c:2576
#20 0x00007fe43f158760 in open_and_resume (this=0x7fe439c65120,
fd=0x7fe441da9cd0, stub=0x7fe44f5a5c24) at open-behind.c:242
#21 0x00007fe43f159dfb in ob_writev (frame=0x7fe44fb28dd8, this=0x7fe439c65120,
fd=0x7fe441da9cd0, iov=<value optimized out>, count=<value optimized out>, 
    offset=<value optimized out>, flags=32769, iobref=0x7fe401611f30,
xdata=0x0) at open-behind.c:414
#22 0x00007fe43ef4ab3b in mdc_writev (frame=0x7fe44fb57fb0,
this=0x7fe439c65ee0, fd=0x7fe441da9cd0, vector=0x7fe405cc40c0, count=1,
offset=6160384, 
    flags=32769, iobref=0x7fe401611f30, xdata=0x0) at md-cache.c:1542
#23 0x00007fe451f0f960 in default_writev (frame=0x7fe44fb57fb0,
this=0x7fe439c68030, fd=0x7fe441da9cd0, vector=0x7fe405cc40c0, count=<value
optimized out>, 
    off=<value optimized out>, flags=32769, iobref=0x7fe401611f30, xdata=0x0)
at defaults.c:1836
#24 0x00007fe43eb227a7 in io_stats_writev (frame=0x7fe44fb0b380,
this=0x7fe439c69180, fd=0x7fe441da9cd0, vector=0x7fe405cc40c0, count=1,
offset=6160384, 
    flags=32769, iobref=0x7fe401611f30, xdata=0x0) at io-stats.c:2164
#25 0x00007fe451f0f960 in default_writev (frame=0x7fe44fb0b380,
this=0x7fe439c6a180, fd=0x7fe441da9cd0, vector=0x7fe405cc40c0, count=<value
optimized out>, 
    off=<value optimized out>, flags=32769, iobref=0x7fe401611f30, xdata=0x0)
at defaults.c:1836
#26 0x00007fe43e90904e in meta_writev (frame=0x7fe44fb0b380,
this=0x7fe439c6a180, fd=0x7fe441da9cd0, iov=0x7fe405cc40c0, count=<value
optimized out>, 
    offset=<value optimized out>, flags=32769, iobref=0x7fe401611f30,
xdata=0x0) at meta.c:147
#27 0x00007fe4492313eb in fuse_write_resume (state=<value optimized out>) at
fuse-bridge.c:2309
#28 0x00007fe4492245f5 in fuse_resolve_done (state=<value optimized out>) at
fuse-resolve.c:644
#29 fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:671
#30 0x00007fe449224326 in fuse_resolve (state=0x7fe405cc39d0) at
fuse-resolve.c:635
#31 0x00007fe44922463e in fuse_resolve_all (state=<value optimized out>) at
fuse-resolve.c:667
#32 0x00007fe4492246a3 in fuse_resolve_continue (state=0x7fe405cc39d0) at
fuse-resolve.c:687
#33 0x00007fe4492242ce in fuse_resolve_fd (state=0x7fe405cc39d0) at
fuse-resolve.c:547
#34 fuse_resolve (state=0x7fe405cc39d0) at fuse-resolve.c:624
#35 0x00007fe44922461e in fuse_resolve_all (state=<value optimized out>) at
fuse-resolve.c:660
#36 0x00007fe449224668 in fuse_resolve_and_resume (state=0x7fe405cc39d0,
fn=0x7fe449231170 <fuse_write_resume>) at fuse-resolve.c:699
#37 0x00007fe449239240 in fuse_thread_proc (data=0x7fe452d9e600) at
fuse-bridge.c:4903
#38 0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#39 0x00007fe45095b96d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fe44881a700 (LWP 15709)):
#0  0x00007fe450ff9535 in sigwait () from /lib64/libpthread.so.0
#1  0x00007fe4523d002b in glusterfs_sigwaiter ()
#2  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe45095b96d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7fe4523b3740 (LWP 15707)):
#0  0x00007fe450ff22ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fe451f6a53d in event_dispatch_epoll (event_pool=0x7fe452d8fc90) at
event-epoll.c:762
#2  0x00007fe4523d1ef1 in main ()

Thread 6 (Thread 0x7fe4375fe700 (LWP 15719)):
#0  0x00007fe450ff87dd in read () from /lib64/libpthread.so.0
#1  0x00007fe44922c9b3 in read (data=<value optimized out>) at
/usr/include/bits/unistd.h:45
#2  notify_kernel_loop (data=<value optimized out>) at fuse-bridge.c:3824
#3  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fe45095b96d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fe43e903700 (LWP 15715)):
#0  0x00007fe45095bf63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007fe451f6a8c1 in event_dispatch_epoll_worker (data=0x7fe440028e40) at
event-epoll.c:668
#2  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe45095b96d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fe447418700 (LWP 15711)):
#0  0x00007fe450ff5a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007fe451f4dd6b in syncenv_task (proc=0x7fe452db1a50) at syncop.c:607
#2  0x00007fe451f52b80 in syncenv_processor (thdata=0x7fe452db1a50) at
syncop.c:699
#3  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fe45095b96d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fe447e19700 (LWP 15710)):
#0  0x00007fe450ff5a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007fe451f4dd6b in syncenv_task (proc=0x7fe452db1690) at syncop.c:607
#2  0x00007fe451f52b80 in syncenv_processor (thdata=0x7fe452db1690) at
syncop.c:699
#3  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fe45095b96d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fe444fdf700 (LWP 15714)):
#0  0x00007fe45095bf63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007fe451f6a8c1 in event_dispatch_epoll_worker (data=0x7fe452de4c00) at
event-epoll.c:668
#2  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe45095b96d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fe44921b700 (LWP 15708)):
#0  ec_unlock_timer_del (link=0x7fe43c231c2c) at ec-common.c:1696
#1  0x00007fe451f28713 in gf_timer_proc (ctx=0x7fe452d71010) at timer.c:194
#2  0x00007fe450ff1a51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe45095b96d in clone () from /lib64/libc.so.6
(gdb)

--- Additional comment from Anand Avati on 2015-07-15 03:27:30 CEST ---

REVIEW: http://review.gluster.org/6459 (timer: fix race between
gf_timer_call_cancel() and gf_timer_proc()) posted (#2) for review on master by
Pranith Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-07-15 03:27:33 CEST ---

REVIEW: http://review.gluster.org/11670 (cluster/ec: Handle race between
unlock-timer, new lock) posted (#1) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-07-16 04:14:59 CEST ---

REVIEW: http://review.gluster.org/6459 (timer: fix race between
gf_timer_call_cancel() and gf_timer_proc()) posted (#4) for review on master by
Pranith Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2015-07-23 15:31:00 CEST ---

COMMIT: http://review.gluster.org/11670 committed in master by Xavier Hernandez
(xhernandez at datalab.es) 
------
commit 332cb6a6901c68f0c79070d7103cc7a728ea6d26
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Wed Jul 15 06:16:54 2015 +0530

    cluster/ec: Handle race between unlock-timer, new lock

    Problem:
    A new lock can arrive while the unlock timer is on its way to unlock.
    This led to a crash in the timer thread: the thread executing the new
    lock can free timer_link->fop, after which the timer thread tries to
    access structures that have already been freed.

    Fix:
    If the timer event has fired, set lock->release to true and wait for
    the unlock to complete.

    Thanks to Xavi and Bhaskar for helping in confirming that this race is the
RC.
    Thanks to Kritika for pointing out and explaining how Avati's patch can be
used
    to fix this bug.

    Change-Id: I45fa5470bbc1f03b5f3d133e26d1e0ab24303378
    BUG: 1243187
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/11670
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Xavier Hernandez <xhernandez at datalab.es>
    Tested-by: NetBSD Build System <jenkins at build.gluster.org>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1223636
[Bug 1223636] 3.1 QE Tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1242423
[Bug 1242423] Disperse volume : client glusterfs crashed while running IO
https://bugzilla.redhat.com/show_bug.cgi?id=1243187
[Bug 1243187] Disperse volume : client glusterfs crashed while running IO
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

