[Gluster-devel] Possible problem introduced by http://review.gluster.org/15573
Xavier Hernandez
xhernandez at datalab.es
Fri Oct 21 07:27:16 UTC 2016
Looking at the code, I think that the added fd_unref() should only be
called if the fop preparation fails. Otherwise the callback already
unreferences the fd.
Code flow:
* glfs_fsync_async_common() takes an fd ref and calls STACK_WIND, passing
that fd.
* Just after that, a ref is released (this is the fd_unref() added by the
patch).
* When glfs_io_async_cbk() is called, another ref is released.
Note that if fop preparation fails, fd_unref() is called once, but on
success it is called twice for the single ref that was taken.
Xavi
On 21/10/16 09:03, Xavier Hernandez wrote:
> Hi,
>
> I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
> consistently see a crash each time an attempt to connect to the volume
> is made.
>
> The backtrace of the crash shows this:
>
> #0 pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
> #1 0x00007fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
> #2 0x00007fe53482ba18 in glfs_io_async_cbk (op_ret=<optimized out>,
> op_errno=0, frame=<optimized out>, cookie=0x7fe526c67040,
> iovec=iovec@entry=0x0, count=count@entry=0)
> at glfs-fops.c:839
> #3 0x00007fe53482beed in glfs_fsync_async_cbk (frame=<optimized out>,
> cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
> op_errno=<optimized out>,
> prebuf=<optimized out>, postbuf=0x7fe5217fe890, xdata=0x0) at
> glfs-fops.c:1382
> #4 0x00007fe520be2eb7 in ?? () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
> #5 0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
> #6 0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
> #7 0x00007fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
> cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
> prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
> at dht-inode-read.c:873
> #8 0x00007fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
> <dht_fsync_cbk>, iov=0x7fe526c61040, count=8, myframe=0x7fe52ceef130) at
> client-rpc-fops.c:975
> #9 0x00007fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
> clnt@entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
> #10 0x00007fe53432056c in rpc_clnt_notify (trans=<optimized out>,
> mydata=0x7fe526fafaf0, event=<optimized out>, data=0x7fe526c3a1c0) at
> rpc-clnt.c:962
> #11 0x00007fe53431c8a3 in rpc_transport_notify (this=<optimized out>,
> event=<optimized out>, data=<optimized out>) at rpc-transport.c:541
> #12 0x00007fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
> socket.c:2267
> #13 0x00007fe5283eaf37 in socket_event_handler (fd=<optimized out>,
> idx=5, data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at
> socket.c:2397
> #14 0x00007fe5345ab3f6 in event_dispatch_epoll_handler
> (event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
> #15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at event-epoll.c:674
> #16 0x00007fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
> pthread_create.c:309
> #17 0x00007fe53214962d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> The fd being unreferenced contains this:
>
> (gdb) print *fd
> $6 = {
>   pid = 97649,
>   flags = 2,
>   refcount = 0,
>   inode_list = {
>     next = 0x7fe523f7206c,
>     prev = 0x7fe523f7206c
>   },
>   inode = 0x0,
>   lock = {
>     spinlock = 1,
>     mutex = {
>       __data = {
>         __lock = 1,
>         __count = 0,
>         __owner = 0,
>         __nusers = 0,
>         __kind = 0,
>         __spins = 0,
>         __elision = 0,
>         __list = {
>           __prev = 0x0,
>           __next = 0x0
>         }
>       },
>       __size = "\001", '\000' <repeats 38 times>,
>       __align = 1
>     }
>   },
>   _ctx = 0x7fe52ec31c40,
>   xl_count = 11,
>   lk_ctx = 0x7fe526c126a0,
>   anonymous = _gf_false
> }
>
> fd->inode is NULL, which explains the crash (frame #0 is the spinlock
> acquisition inside fd_unref()). We also see that fd->refcount is
> already 0, so I'm wondering whether this could be an extra fd_unref()
> introduced by that patch.
>
> The crash seems to happen immediately after a graph switch.
>
> Xavi
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel