[Gluster-devel] Possible problem introduced by http://review.gluster.org/15573
Xavier Hernandez
xhernandez at datalab.es
Mon Oct 24 07:09:17 UTC 2016
Hi Soumya,
On 21/10/16 16:15, Soumya Koduri wrote:
>
>
> On 10/21/2016 06:35 PM, Soumya Koduri wrote:
>> Hi Xavi,
>>
>> On 10/21/2016 12:57 PM, Xavier Hernandez wrote:
>>> Looking at the code, I think that the added fd_unref() should only be
>>> called if the fop preparation fails. Otherwise the callback already
>>> unreferences the fd.
>>>
>>> Code flow:
>>>
>>> * glfs_fsync_async_common() takes an fd ref and calls STACK_WIND passing
>>> that fd.
>>> * Just after that a ref is released.
>>> * When glfs_io_async_cbk() is called another ref is released.
>>>
>>> Note that if fop preparation fails, a single fd_unref() is called, but
>>> on success fd_unref() is called twice.
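
(To make the accounting explicit: with the patch applied, the success path
ends up doing roughly this -- not the literal code, just the ref movements,
assuming the glfd itself holds the only other reference on the fd:)

  fd = glfs_resolve_fd (glfd->fs, subvol, glfd);  /* glfd's 1 ref -> 2      */
  /* STACK_WIND (...) issued, passing fd */
  fd_unref (fd);                                  /* 2 -> 1                 */

  /* later, in glfs_io_async_cbk(): */
  fd_unref (glfd->fd);                            /* 1 -> 0: the glfd's own
                                                   * ref is gone, so the fd
                                                   * can be destroyed while
                                                   * the glfd still points
                                                   * to it                  */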
>>
>> Sorry for the inconvenience caused. I think patch #15573 itself hasn't
>> caused the problem, but it has highlighted another ref leak in the code.
>>
>> From the code I see that glfs_io_async_cbk() does fd_unref() on glfd->fd,
>> but not on the fd that was passed in STACK_WIND_COOKIE() of the fop.
>>
>> If I take any fop, for example:
>>
>> glfs_fsync_common() {
>>
>>         fd = glfs_resolve_fd (glfd->fs, subvol, glfd);
>>
>> }
>>
>> Here in glfs_resolve_fd ()
>>
>> fd_t *
>> __glfs_resolve_fd (struct glfs *fs, xlator_t *subvol, struct glfs_fd *glfd)
>> {
>>         fd_t *fd = NULL;
>>
>>         if (glfd->fd->inode->table->xl == subvol)
>>                 return fd_ref (glfd->fd);
>>
>>>>>>>>>> Here we can see that we take an extra ref, in addition to the
>> ref already held on glfd->fd. That means the caller of this function
>> needs to fd_unref(fd) irrespective of any subsequent fd_unref (glfd->fd).
>>
>>         fd = __glfs_migrate_fd (fs, subvol, glfd);
>>         if (!fd)
>>                 return NULL;
>>
>>         if (subvol == fs->active_subvol) {
>>                 fd_unref (glfd->fd);
>>                 glfd->fd = fd_ref (fd);
>>         }
>>
>>>>>>>>>> I think the issue is here, during graph_switch(). You have
>> mentioned as well that the crash happens after a graph switch. Maybe here
>> we are missing an extra ref to be taken on fd, in addition to the one on
>> glfd->fd. I need to look through __glfs_migrate_fd() to confirm that, but
>> these are my initial thoughts.
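
For reference, this is how I read the ref accounting in that fragment (the
note on the migrate path is only a guess on my side, I haven't checked
__glfs_migrate_fd() yet):

  if (glfd->fd->inode->table->xl == subvol)
          return fd_ref (glfd->fd);     /* fast path: returns an extra ref
                                         * that the caller owns and must
                                         * drop later                       */

  fd = __glfs_migrate_fd (fs, subvol, glfd);
  /* ... */
  if (subvol == fs->active_subvol) {
          fd_unref (glfd->fd);          /* the old fd loses the glfd's ref  */
          glfd->fd = fd_ref (fd);       /* the new fd gains the glfd's ref  */
  }
  /* If the fd returned on this path does not carry an extra ref for the
   * caller (unlike the fast path above), the caller-side unref releases a
   * ref it doesn't own, which would fit a crash right after a graph
   * switch. */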
>
> Looking into this, I think we should fix glfs_io_async_cbk() not to
> fd_unref(glfd->fd). glfd->fd should stay active throughout the lifetime
> of the glfd (i.e., until it is closed). Thoughts?
I don't know the gfapi internals in depth, but at first sight I think this
would be the right thing to do. Assuming that the glfd keeps a reference
to the fd until it's destroyed, and that a glfd reference is taken for the
lifetime of each request that needs it, the fd_unref() in
glfs_io_async_cbk() seems unnecessary. I think it was there just to
release the fd acquired in glfs_resolve_fd(), but it's better to keep that
release where it is now, in the caller right after the fop is wound.
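
Just to make sure we are talking about the same change, this is how I
understand the refs would move afterwards (only a sketch, not the literal
glfs-fops.c code):

  /* caller, e.g. glfs_fsync_async_common() */
  fd = glfs_resolve_fd (glfd->fs, subvol, glfd);  /* +1, ref from resolve   */
  /* ... wind the fop passing fd ... */
  fd_unref (fd);        /* -1, releases only the ref taken by the resolve;
                         * on a preparation failure this stays the single
                         * unref that balances it                           */

  /* glfs_io_async_cbk(): no fd_unref (glfd->fd) any more -- that ref
   * belongs to the glfd and is only dropped when the glfd is closed.      */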
Another question is whether we really need to take an additional reference
in glfs_resolve_fd() at all.
Can an fd returned by this function live longer than the associated glfd
in some circumstances?
> Also, could you please check if it is the second/subsequent fsync_async()
> call which results in the crash?
I'll try to test it as soon as possible, but this is on a server that we
need to put in production very soon and we have decided to go with fuse
for now. We'll have a lot of work to do this week. Once I have some free
time I'll build a test environment to check it, probably next week.
Xavi
>
> Thanks,
> Soumya
>
>>
>> Please let me know your comments.
>>
>> Thanks,
>> Soumya
>>
>>
>>>
>>> Xavi
>>>
>>> On 21/10/16 09:03, Xavier Hernandez wrote:
>>>> Hi,
>>>>
>>>> I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
>>>> consistently see a crash each time an attempt to connect to the volume
>>>> is made.
>>>>
>>>> The backtrace of the crash shows this:
>>>>
>>>> #0 pthread_spin_lock () at
>>>> ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
>>>> #1 0x00007fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
>>>> #2 0x00007fe53482ba18 in glfs_io_async_cbk (op_ret=<optimized out>,
>>>> op_errno=0, frame=<optimized out>, cookie=0x7fe526c67040,
>>>> iovec=iovec@entry=0x0, count=count@entry=0)
>>>> at glfs-fops.c:839
>>>> #3 0x00007fe53482beed in glfs_fsync_async_cbk (frame=<optimized out>,
>>>> cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
>>>> op_errno=<optimized out>,
>>>> prebuf=<optimized out>, postbuf=0x7fe5217fe890, xdata=0x0) at
>>>> glfs-fops.c:1382
>>>> #4 0x00007fe520be2eb7 in ?? () from
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
>>>> #5 0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
>>>> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
>>>> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
>>>> #6 0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
>>>> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
>>>> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
>>>> #7 0x00007fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
>>>> cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
>>>> prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
>>>> at dht-inode-read.c:873
>>>> #8 0x00007fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
>>>> <dht_fsync_cbk>, iov=0x7fe526c61040, count=8,
>>>> myframe=0x7fe52ceef130) at
>>>> client-rpc-fops.c:975
>>>> #9 0x00007fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
>>>> clnt@entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
>>>> #10 0x00007fe53432056c in rpc_clnt_notify (trans=<optimized out>,
>>>> mydata=0x7fe526fafaf0, event=<optimized out>, data=0x7fe526c3a1c0) at
>>>> rpc-clnt.c:962
>>>> #11 0x00007fe53431c8a3 in rpc_transport_notify (this=<optimized out>,
>>>> event=<optimized out>, data=<optimized out>) at rpc-transport.c:541
>>>> #12 0x00007fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
>>>> socket.c:2267
>>>> #13 0x00007fe5283eaf37 in socket_event_handler (fd=<optimized out>,
>>>> idx=5, data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at
>>>> socket.c:2397
>>>> #14 0x00007fe5345ab3f6 in event_dispatch_epoll_handler
>>>> (event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
>>>> #15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at
>>>> event-epoll.c:674
>>>> #16 0x00007fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
>>>> pthread_create.c:309
>>>> #17 0x00007fe53214962d in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>>
>>>> The fd being unreferenced contains this:
>>>>
>>>> (gdb) print *fd
>>>> $6 = {
>>>> pid = 97649,
>>>> flags = 2,
>>>> refcount = 0,
>>>> inode_list = {
>>>> next = 0x7fe523f7206c,
>>>> prev = 0x7fe523f7206c
>>>> },
>>>> inode = 0x0,
>>>> lock = {
>>>> spinlock = 1,
>>>> mutex = {
>>>> __data = {
>>>> __lock = 1,
>>>> __count = 0,
>>>> __owner = 0,
>>>> __nusers = 0,
>>>> __kind = 0,
>>>> __spins = 0,
>>>> __elision = 0,
>>>> __list = {
>>>> __prev = 0x0,
>>>> __next = 0x0
>>>> }
>>>> },
>>>> __size = "\001", '\000' <repeats 38 times>,
>>>> __align = 1
>>>> }
>>>> },
>>>> _ctx = 0x7fe52ec31c40,
>>>> xl_count = 11,
>>>> lk_ctx = 0x7fe526c126a0,
>>>> anonymous = _gf_false
>>>> }
>>>>
>>>> fd->inode is NULL, which explains the crash. We also see that
>>>> fd->refcount is already 0. So I'm wondering whether this could be an
>>>> extra fd_unref() introduced by that patch.
>>>>
>>>> The crash seems to happen immediately after a graph switch.
>>>>
>>>> Xavi