[Gluster-devel] Possible problem introduced by http://review.gluster.org/15573

Fri Nov 4 07:36:45 UTC 2016

On Mon, Oct 24, 2016 at 12:39 PM, Xavier Hernandez <xhernandez at datalab.es>
wrote:

> Hi Soumya,
>
>
> On 21/10/16 16:15, Soumya Koduri wrote:
>
>>
>>
>> On 10/21/2016 06:35 PM, Soumya Koduri wrote:
>>
>>> Hi Xavi,
>>>
>>> On 10/21/2016 12:57 PM, Xavier Hernandez wrote:
>>>
>>>> Looking at the code, I think that the added fd_unref() should only be
>>>> called if the fop preparation fails. Otherwise the callback already
>>>> unreferences the fd.
>>>>
>>>> Code flow:
>>>>
>>>> * glfs_fsync_async_common() takes an fd ref and calls STACK_WIND passing
>>>> that fd.
>>>> * Just after that a ref is released.
>>>> * When glfs_io_async_cbk() is called another ref is released.
>>>>
>>>> Note that if fop preparation fails, a single fd_unref() is called, but
>>>> on success two fd_unref() are called.
>>>>
>>>
>>> Sorry for the inconvenience caused. I think its patch#15573 hasn't
>>> caused the problem but has highlighted another ref leak in the code.
>>>
>>> From the code I see that glfs_io_async_cbk() does fd_unref (glfd->fd)
>>> but not the fd passed in STACK_WIND_COOKIE() of the fop.
>>>
>>> If I take any fop, for eg.,
>>> glfs_fsync_common() {
>>>
>>>        fd = glfs_resolve_fd (glfd->fs, subvol, glfd);
>>>
>>>
>>> }
>>>
>>> Here in glfs_resolve_fd ()
>>>
>>> fd_t *
>>> __glfs_resolve_fd (struct glfs *fs, xlator_t *subvol, struct glfs_fd
>>> *glfd)
>>> {
>>>         fd_t *fd = NULL;
>>>
>>>         if (glfd->fd->inode->table->xl == subvol)
>>>                 return fd_ref (glfd->fd);
>>>
>>> Here we can see that  we are taking extra ref additional to the
>>>>>>>>>>>
>>>>>>>>>> ref already taken for glfd->fd. That means the caller of this
>>> function
>>> needs to fd_unref(fd) irrespective of subsequent fd_unref (glfd->fd).
>>>
>>>         fd = __glfs_migrate_fd (fs, subvol, glfd);
>>>         if (!fd)
>>>                 return NULL;
>>>
>>>
>>>         if (subvol == fs->active_subvol) {
>>>                 fd_unref (glfd->fd);
>>>                 glfd->fd = fd_ref (fd);
>>>         }
>>>
>>> I think the issue is here during graph_switch(). You have
>>>>>>>>>>>
>>>>>>>>>> mentioned as well that the crash happens post graph_switch. Maybe
>>> here
>>> we are missing an extra ref to be taken for fd additional to glfd->fd. I
>>> need to look through __glfs_migrate_fd() to confirm that. But these are
>>> my initial thoughts.
>>>
>>
>> Looking into this, I think we should fix glfs_io_async_cbk() not to
>> fd_unref(glfd->fd). glfd->fd should be active though out the lifetime of
>> glfd (i.e, until it is closed). Thoughts?
>>
>
> I don't know gfapi internals in deep, but at first sight I think this
> would be the right think to do. Assuming that glfd will keep a reference to
> the fd until it's destroyed, and that a glfd reference is taken during the
> lifetime of each request that needs it, the fd_unref() in
> glfd_io_async_cbk() seems unnecessary. I think it was there just to release
> the fd acquired if glfs_resolve_fd(), but it's better to place it where
> it's now.
>
> Another question is if we really need to take an additional reference in
> glfs_resolve_fd() ?
>

This answers the first question too, we don't need the additional ref in
glfs_resolve_fd() now that we have ref accounting in glfd. This confusion
is because earlier there was no ref accounting for glfd and only refs were
on fd_t.
No, we don't need this additional reference.

>
> Can an fd returned by this function live more time than the associated
> glfd in some circumstances ?
>
> Also could you please check if it is the second/subsequent fsync_async()
>> call which results in crash?
>>
>
> I'll try to test it as soon as possible, but this is on a server that we
> need to put in production very soon and we have decided to go with fuse for
> now. We'll have a lot of work to do this week. Once I have some free time
> I'll build a test environment to check it, probably next week.
>
>
I have not been able to test this out completely. Theoretically, I don't
see any possibility where fd can outlive a glfd that is pointing to it.
I have sent a patch[1] that
a. fixes the crash
b. handles the unref in failure cases
c. but still has the duplicate refs with some explanation in commit message.
d. adds a simple test for the same.

I request that we take this patch in and make a release soon as it is
affecting many community users and do the cleanup in another patch.

[1] http://review.gluster.org/#/c/15768/

Thanks,
Raghavendra Talur

Xavi
>
>
>
>> Thanks,
>> Soumya
>>
>>
>>> Please let me know your comments.
>>>
>>> Thanks,
>>> Soumya
>>>
>>>
>>>
>>>> Xavi
>>>>
>>>> On 21/10/16 09:03, Xavier Hernandez wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
>>>>> consistently see a crash each time an attempt to connect to the volume
>>>>> is made.
>>>>>
>>>>> The backtrace of the crash shows this:
>>>>>
>>>>> #0  pthread_spin_lock () at
>>>>> ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
>>>>> #1  0x00007fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
>>>>> #2  0x00007fe53482ba18 in glfs_io_async_cbk (op_ret=<optimized out>,
>>>>> op_errno=0, frame=<optimized out>, cookie=0x7fe526c67040,
>>>>> iovec=iovec at entry=0x0, count=count at entry=0)
>>>>>     at glfs-fops.c:839
>>>>> #3  0x00007fe53482beed in glfs_fsync_async_cbk (frame=<optimized out>,
>>>>> cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
>>>>> op_errno=<optimized out>,
>>>>>     prebuf=<optimized out>, postbuf=0x7fe5217fe890, xdata=0x0) at
>>>>> glfs-fops.c:1382
>>>>> #4  0x00007fe520be2eb7 in ?? () from
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
>>>>> #5  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
>>>>> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
>>>>> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
>>>>> #6  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
>>>>> cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
>>>>> postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
>>>>> #7  0x00007fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
>>>>> cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
>>>>> prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
>>>>>     at dht-inode-read.c:873
>>>>> #8  0x00007fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
>>>>> <dht_fsync_cbk>, iov=0x7fe526c61040, count=8,
>>>>> myframe=0x7fe52ceef130) at
>>>>> client-rpc-fops.c:975
>>>>> #9  0x00007fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
>>>>> clnt at entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
>>>>> #10 0x00007fe53432056c in rpc_clnt_notify (trans=<optimized out>,
>>>>> mydata=0x7fe526fafaf0, event=<optimized out>, data=0x7fe526c3a1c0) at
>>>>> rpc-clnt.c:962
>>>>> #11 0x00007fe53431c8a3 in rpc_transport_notify (this=<optimized out>,
>>>>> event=<optimized out>, data=<optimized out>) at rpc-transport.c:541
>>>>> #12 0x00007fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
>>>>> socket.c:2267
>>>>> #13 0x00007fe5283eaf37 in socket_event_handler (fd=<optimized out>,
>>>>> idx=5, data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at
>>>>> socket.c:2397
>>>>> #14 0x00007fe5345ab3f6 in event_dispatch_epoll_handler
>>>>> (event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
>>>>> #15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at
>>>>> event-epoll.c:674
>>>>> #16 0x00007fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
>>>>> pthread_create.c:309
>>>>> #17 0x00007fe53214962d in clone () at
>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>>>
>>>>> The fd being unreferenced contains this:
>>>>>
>>>>> (gdb) print *fd
>>>>> $6 = {
>>>>>   pid = 97649,
>>>>>   flags = 2,
>>>>>   refcount = 0,
>>>>>   inode_list = {
>>>>>     next = 0x7fe523f7206c,
>>>>>     prev = 0x7fe523f7206c
>>>>>   },
>>>>>   inode = 0x0,
>>>>>   lock = {
>>>>>     spinlock = 1,
>>>>>     mutex = {
>>>>>       __data = {
>>>>>         __lock = 1,
>>>>>         __count = 0,
>>>>>         __owner = 0,
>>>>>         __nusers = 0,
>>>>>         __kind = 0,
>>>>>         __spins = 0,
>>>>>         __elision = 0,
>>>>>         __list = {
>>>>>           __prev = 0x0,
>>>>>           __next = 0x0
>>>>>         }
>>>>>       },
>>>>>       __size = "\001", '\000' <repeats 38 times>,
>>>>>       __align = 1
>>>>>     }
>>>>>   },
>>>>>   _ctx = 0x7fe52ec31c40,
>>>>>   xl_count = 11,
>>>>>   lk_ctx = 0x7fe526c126a0,
>>>>>   anonymous = _gf_false
>>>>> }
>>>>>
>>>>> fd->inode is NULL, explaining the cause of the crash. We also see that
>>>>> fd->refcount is already 0. So I'm wondering if this couldn't be an
>>>>> extra
>>>>> fd_unref() introduced by that patch.
>>>>>
>>>>> The crash seems to happen immediately after a graph switch.
>>>>>
>>>>> Xavi
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20161104/ba9f2585/attachment-0001.html>