[Gluster-devel] Possible problem introduced by http://review.gluster.org/15573

Niels de Vos ndevos at redhat.com
Fri Oct 21 08:03:26 UTC 2016


On Fri, Oct 21, 2016 at 09:03:30AM +0200, Xavier Hernandez wrote:
> Hi,
> 
> I've just tried Gluster 3.8.5 with Proxmox using gfapi and I consistently
> see a crash each time an attempt to connect to the volume is made.

Thanks, that likely is the same bug as
https://bugzilla.redhat.com/1379241 . 

Satheesaran, could you revert commit 7a50690 from the build that you
were testing, and see if that causes the problem to go away again? Let
me know of you want me to provide RPMs for testing.

Niels

> 
> The backtrace of the crash shows this:
> 
> #0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
> #1  0x00007fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
> #2  0x00007fe53482ba18 in glfs_io_async_cbk (op_ret=<optimized out>,
> op_errno=0, frame=<optimized out>, cookie=0x7fe526c67040,
> iovec=iovec at entry=0x0, count=count at entry=0)
>     at glfs-fops.c:839
> #3  0x00007fe53482beed in glfs_fsync_async_cbk (frame=<optimized out>,
> cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
> op_errno=<optimized out>,
>     prebuf=<optimized out>, postbuf=0x7fe5217fe890, xdata=0x0) at
> glfs-fops.c:1382
> #4  0x00007fe520be2eb7 in ?? () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
> #5  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
> #6  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
> #7  0x00007fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
> cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
> prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
>     at dht-inode-read.c:873
> #8  0x00007fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
> <dht_fsync_cbk>, iov=0x7fe526c61040, count=8, myframe=0x7fe52ceef130) at
> client-rpc-fops.c:975
> #9  0x00007fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
> clnt at entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
> #10 0x00007fe53432056c in rpc_clnt_notify (trans=<optimized out>,
> mydata=0x7fe526fafaf0, event=<optimized out>, data=0x7fe526c3a1c0) at
> rpc-clnt.c:962
> #11 0x00007fe53431c8a3 in rpc_transport_notify (this=<optimized out>,
> event=<optimized out>, data=<optimized out>) at rpc-transport.c:541
> #12 0x00007fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
> socket.c:2267
> #13 0x00007fe5283eaf37 in socket_event_handler (fd=<optimized out>, idx=5,
> data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
> #14 0x00007fe5345ab3f6 in event_dispatch_epoll_handler
> (event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
> #15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at event-epoll.c:674
> #16 0x00007fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
> pthread_create.c:309
> #17 0x00007fe53214962d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> 
> The fd being unreferenced contains this:
> 
> (gdb) print *fd
> $6 = {
>   pid = 97649,
>   flags = 2,
>   refcount = 0,
>   inode_list = {
>     next = 0x7fe523f7206c,
>     prev = 0x7fe523f7206c
>   },
>   inode = 0x0,
>   lock = {
>     spinlock = 1,
>     mutex = {
>       __data = {
>         __lock = 1,
>         __count = 0,
>         __owner = 0,
>         __nusers = 0,
>         __kind = 0,
>         __spins = 0,
>         __elision = 0,
>         __list = {
>           __prev = 0x0,
>           __next = 0x0
>         }
>       },
>       __size = "\001", '\000' <repeats 38 times>,
>       __align = 1
>     }
>   },
>   _ctx = 0x7fe52ec31c40,
>   xl_count = 11,
>   lk_ctx = 0x7fe526c126a0,
>   anonymous = _gf_false
> }
> 
> fd->inode is NULL, explaining the cause of the crash. We also see that
> fd->refcount is already 0. So I'm wondering if this couldn't be an extra
> fd_unref() introduced by that patch.
> 
> The crash seems to happen immediately after a graph switch.
> 
> Xavi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20161021/84836715/attachment.sig>


More information about the Gluster-devel mailing list