[Gluster-devel] Possible problem introduced by http://review.gluster.org/15573

Atin Mukherjee amukherj at redhat.com
Fri Oct 21 13:12:28 UTC 2016


On Fri, Oct 21, 2016 at 6:36 PM, Soumya Koduri <skoduri at redhat.com> wrote:

>
>
> On 10/21/2016 02:03 PM, Xavier Hernandez wrote:
>
>> Hi Niels,
>>
>> On 21/10/16 10:03, Niels de Vos wrote:
>>
>>> On Fri, Oct 21, 2016 at 09:03:30AM +0200, Xavier Hernandez wrote:
>>>
>>>> Hi,
>>>>
>>>> I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
>>>> consistently
>>>> see a crash each time an attempt to connect to the volume is made.
>>>>
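For context, the "connect to the volume" step with gfapi is roughly the
following initialization sequence. This is a minimal sketch; the server and
volume names are placeholders and error handling is trimmed:

#include <stdio.h>
#include <glusterfs/api/glfs.h>

int
main (void)
{
        /* placeholder volume/server names -- adjust for your setup */
        glfs_t *fs = glfs_new ("myvolume");
        if (!fs)
                return 1;

        glfs_set_volfile_server (fs, "tcp", "server1", 24007);
        glfs_set_logging (fs, "/dev/stderr", 7);

        /* fetches the volfile and builds the client-side graph; the
         * crash reported here shows up around this connection phase */
        if (glfs_init (fs) != 0) {
                perror ("glfs_init");
                glfs_fini (fs);
                return 1;
        }

        glfs_fini (fs);
        return 0;
}

(build with: gcc -o connect connect.c -lgfapi)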
>>>
>>> Thanks, that likely is the same bug as
>>> https://bugzilla.redhat.com/1379241 .
>>>
>>
>> I'm not sure it's the same problem. In my case the crash happens always
>> and immediately. When creating an image, the file is created but its size
>> is 0. The stack trace is also quite different.
>>
>
> Right. The issue reported in bug 1379241 looks like the one we hit with
> client-io-threads enabled (already discussed on gluster-devel). Disabling
> that option may prevent the crash you are seeing.
>
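For anyone who wants to try that workaround, the option can be toggled per
volume (the volume name below is a placeholder):

    gluster volume set <VOLNAME> performance.client-io-threads off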

Pranith has sent a fix for the same issue: http://review.gluster.org/#/c/15620/


>
> Thanks,
> Soumya
>
>
>
>> Xavi
>>
>>
>>> Satheesaran, could you revert commit 7a50690 from the build that you
>>> were testing, and see if that makes the problem go away again? Let
>>> me know if you want me to provide RPMs for testing.
>>>
>>> Niels
>>>
>>>
>>>> The backtrace of the crash shows this:
>>>>
>>>> #0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
>>>> #1  0x00007fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
>>>> #2  0x00007fe53482ba18 in glfs_io_async_cbk (op_ret=<optimized out>, op_errno=0, frame=<optimized out>, cookie=0x7fe526c67040, iovec=iovec@entry=0x0, count=count@entry=0) at glfs-fops.c:839
>>>> #3  0x00007fe53482beed in glfs_fsync_async_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, prebuf=<optimized out>, postbuf=0x7fe5217fe890, xdata=0x0) at glfs-fops.c:1382
>>>> #4  0x00007fe520be2eb7 in ?? () from /usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
>>>> #5  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac, cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1, postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
>>>> #6  0x00007fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204, cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1, postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
>>>> #7  0x00007fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8, cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0, prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0) at dht-inode-read.c:873
>>>> #8  0x00007fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030 <dht_fsync_cbk>, iov=0x7fe526c61040, count=8, myframe=0x7fe52ceef130) at client-rpc-fops.c:975
>>>> #9  0x00007fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18, clnt@entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
>>>> #10 0x00007fe53432056c in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fe526fafaf0, event=<optimized out>, data=0x7fe526c3a1c0) at rpc-clnt.c:962
>>>> #11 0x00007fe53431c8a3 in rpc_transport_notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at rpc-transport.c:541
>>>> #12 0x00007fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at socket.c:2267
>>>> #13 0x00007fe5283eaf37 in socket_event_handler (fd=<optimized out>, idx=5, data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
>>>> #14 0x00007fe5345ab3f6 in event_dispatch_epoll_handler (event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
>>>> #15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at event-epoll.c:674
>>>> #16 0x00007fe5324140a4 in start_thread (arg=0x7fe5217ff700) at pthread_create.c:309
>>>> #17 0x00007fe53214962d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>>
>>>> The fd being unreferenced contains this:
>>>>
>>>> (gdb) print *fd
>>>> $6 = {
>>>>   pid = 97649,
>>>>   flags = 2,
>>>>   refcount = 0,
>>>>   inode_list = {
>>>>     next = 0x7fe523f7206c,
>>>>     prev = 0x7fe523f7206c
>>>>   },
>>>>   inode = 0x0,
>>>>   lock = {
>>>>     spinlock = 1,
>>>>     mutex = {
>>>>       __data = {
>>>>         __lock = 1,
>>>>         __count = 0,
>>>>         __owner = 0,
>>>>         __nusers = 0,
>>>>         __kind = 0,
>>>>         __spins = 0,
>>>>         __elision = 0,
>>>>         __list = {
>>>>           __prev = 0x0,
>>>>           __next = 0x0
>>>>         }
>>>>       },
>>>>       __size = "\001", '\000' <repeats 38 times>,
>>>>       __align = 1
>>>>     }
>>>>   },
>>>>   _ctx = 0x7fe52ec31c40,
>>>>   xl_count = 11,
>>>>   lk_ctx = 0x7fe526c126a0,
>>>>   anonymous = _gf_false
>>>> }
>>>>
>>>> fd->inode is NULL, which explains the crash. We also see that
>>>> fd->refcount is already 0, so I'm wondering whether this could be an
>>>> extra fd_unref() introduced by that patch.
>>>>
>>>> The crash seems to happen immediately after a graph switch.
>>>>
>>>> Xavi
>>>>
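To illustrate Xavi's hypothesis above: fd_unref() takes the *inode's*
spinlock before dropping the reference, so an extra unref on an fd that has
already been released runs against exactly the state dumped above
(refcount == 0, inode == NULL) and faults inside pthread_spin_lock(). A
deliberately simplified, self-contained sketch of that failure mode (not
the actual fd.c source):

#include <pthread.h>
#include <stdio.h>

typedef struct inode {
        pthread_spinlock_t lock;
} inode_t;

typedef struct fd {
        int      refcount;
        inode_t *inode;
} fd_t;

static void
fd_destroy (fd_t *fd)
{
        /* the real fd_destroy() releases contexts and drops the inode
         * reference; afterwards fd->inode is NULL */
        fd->inode = NULL;
}

static void
fd_unref (fd_t *fd)
{
        int refcount;

        /* like fd.c, lock through the inode pointer first; if a previous
         * unref already destroyed this fd, fd->inode is NULL and this is
         * the NULL dereference seen in frames #0/#1 of the backtrace */
        pthread_spin_lock (&fd->inode->lock);
        refcount = --fd->refcount;
        pthread_spin_unlock (&fd->inode->lock);

        if (refcount == 0)
                fd_destroy (fd);
}

int
main (void)
{
        inode_t inode;
        fd_t    fd = { .refcount = 1, .inode = &inode };

        pthread_spin_init (&inode.lock, PTHREAD_PROCESS_PRIVATE);

        fd_unref (&fd);   /* legitimate release: refcount 1 -> 0, destroy */
        printf ("refcount=%d inode=%p\n", fd.refcount, (void *) fd.inode);

        fd_unref (&fd);   /* the suspected extra unref: segfaults here */
        return 0;
}

Compiled with "gcc -pthread", the second fd_unref() segfaults inside
pthread_spin_lock(), matching frames #0/#1 of the reported backtrace.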



-- 

~ Atin (atinm)

