[Gluster-devel] Assertion failed: lru_inode_ctx->block_num > 0

qingwei wei tchengwee at gmail.com
Wed Jan 4 11:24:45 UTC 2017


Hi Krutika,

Happy new year to you!

Regarding this issue, do you have any updates?

Cw

On Fri, Dec 23, 2016 at 1:05 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
> Perfect. That's what I needed to know. Thanks! :)
>
> -Krutika
>
> On Fri, Dec 23, 2016 at 7:15 AM, qingwei wei <tchengwee at gmail.com> wrote:
>>
>> Hi Krutika,
>>
>> Is this the information you need?
>>
>> [root at localhost /]# gdb /usr/local/sbin/glusterfs core.14019
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/local/sbin/glusterfsd...done.
>>
>> warning: core file may not match specified executable file.
>> [New LWP 14025]
>> [New LWP 14019]
>> [New LWP 14020]
>> [New LWP 14029]
>> [New LWP 14021]
>> [New LWP 14022]
>> [New LWP 14023]
>> [New LWP 14024]
>> [New LWP 14028]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `/usr/local/sbin/glusterfs
>> --volfile-server=10.217.242.32 --volfile-id=/testSF1'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  list_del_init (old=0x7f5d6c003d10) at
>> ../../../../libglusterfs/src/list.h:87
>> 87        old->prev->next = old->next;
>> (gdb) select-frame 3
>> (gdb) print local->fop
>> $1 = GF_FOP_WRITE
>> (gdb)
>>
>> Hopefully this is useful for your investigation.
>>
>> Thanks.
>>
>> Cw
>>
>>
>> On Wed, Dec 21, 2016 at 8:13 PM, Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>> > Just one more information I need from you.
>> > Assuming you have the coredump, could you attach it to gdb and print
>> > local->fop and tell me what fop it was when the crash happened?
>> > You'll need to switch to frame 3 in gdb to get the value of this
>> > variable.
>> >
>> > -Krutika
>> >
>> > On Wed, Dec 21, 2016 at 5:35 PM, Krutika Dhananjay <kdhananj at redhat.com>
>> > wrote:
>> >>
>> >> Thanks for this. The information seems sufficient at the moment.
>> >> Will get back to you on this if/when I find something.
>> >>
>> >> -Krutika
>> >>
>> >> On Mon, Dec 19, 2016 at 1:44 PM, qingwei wei <tchengwee at gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi Krutika,
>> >>>
>> >>> Sorry for the delay, as I have been busy with other work. Attached is
>> >>> the tar.gz file with the client and server logs, the gfid information
>> >>> in the shard folder (please look at the test.0.0 file, as the log was
>> >>> captured while I ran fio on this file), and also the print statement I
>> >>> put inside the code. FYI, I did two runs this time and only the second
>> >>> run gave me the problem. Hope this information helps.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Cw
>> >>>
>> >>> On Thu, Dec 15, 2016 at 8:02 PM, Krutika Dhananjay
>> >>> <kdhananj at redhat.com>
>> >>> wrote:
>> >>> > Good that you asked. I'll try, but be warned this will involve me
>> >>> > coming back to you with a lot more questions. :)
>> >>> >
>> >>> > I've been trying this for the past two days (not to mention the fio
>> >>> > run takes really long) and so far there has been no crash/assert
>> >>> > failure.
>> >>> >
>> >>> > If you already have the core:
>> >>> > in frame 1,
>> >>> > 0. print block_num
>> >>> > 1. get lru_inode_ctx->stat.ia_gfid
>> >>> > 2. convert it to hex
>> >>> > 3. find the file in your backend that corresponds to this gfid and
>> >>> >    share its path in your response
>> >>> > 4. print priv->inode_count
>> >>> > 5. and of course lru_inode_ctx->block_num :)
>> >>> > 6. Also attach the complete brick and client logs.
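[Editor's note: for steps 1-3 above, a minimal sketch of mapping the gfid to its backend path. The gfid and brick root below are hypothetical placeholders; on a brick, each inode has a hard link under .glusterfs keyed by the first two byte-pairs of the gfid.]

```shell
# Hypothetical values -- substitute the gfid printed from
# lru_inode_ctx->stat.ia_gfid and a brick root from 'gluster volume info'.
GFID="864c9ea1-3a7e-4d41-899b-f30604a7584e"
BRICK="/mnt/sdb_mssd/testSF"
# The backend hard link for this gfid lives here; ls -l it on the brick:
echo "${BRICK}/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}"
```

The path printed is what step 3 asks to share back.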
>> >>> >
>> >>> > -Krutika
>> >>> >
>> >>> >
>> >>> > On Thu, Dec 15, 2016 at 3:18 PM, qingwei wei <tchengwee at gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi Krutika,
>> >>> >>
>> >>> >> Do you need any more information? Do let me know, as I can try it
>> >>> >> on my test system. Thanks.
>> >>> >>
>> >>> >> Cw
>> >>> >>
>> >>> >> On Tue, Dec 13, 2016 at 12:17 AM, qingwei wei <tchengwee at gmail.com>
>> >>> >> wrote:
>> >>> >> > Hi Krutika,
>> >>> >> >
>> >>> >> > You mean the FIO command?
>> >>> >> >
>> >>> >> > Below is how I do the sequential write. In this example I am using
>> >>> >> > a 400GB file; for the SHARD_MAX_INODES=16 build, I use a 300MB file.
>> >>> >> >
>> >>> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
>> >>> >> >     -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1
>> >>> >> >     -bs 256k -numjobs 1 -iodepth 2 -name test -rw write
>> >>> >> >
>> >>> >> > And after FIO completes the above workload, I do the random write:
>> >>> >> >
>> >>> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
>> >>> >> >     -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1
>> >>> >> >     -bs 8k -numjobs 1 -iodepth 2 -name test -rw randwrite
>> >>> >> >
>> >>> >> > The error (sometimes a segmentation fault) only happens during the
>> >>> >> > random write.
>> >>> >> >
>> >>> >> > The gluster volume is a 3-replica volume with sharding enabled and a
>> >>> >> > 16MB shard block size.
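[Editor's note: a quick sketch of why a 300MB file suffices for the SHARD_MAX_INODES=16 build, assuming the 16MB shard-block-size above and assuming block 0 lives in the base file rather than under /.shard, which matches the shard numbering (.7038, .16284) seen in the logs.]

```python
SHARD_BLOCK_MB = 16                      # features.shard-block-size
FILE_MB = 300                            # fio file used with the 16-inode build
blocks = -(-FILE_MB // SHARD_BLOCK_MB)   # ceiling division: 19 blocks in total
shard_files = blocks - 1                 # block 0 is the base file itself
print(blocks, shard_files)               # 18 shard inodes exceed the 16-entry LRU list
```

So the eviction (else) branch is exercised even with this small file.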
>> >>> >> >
>> >>> >> > Thanks.
>> >>> >> >
>> >>> >> > Cw
>> >>> >> >
>> >>> >> > On Tue, Dec 13, 2016 at 12:00 AM, Krutika Dhananjay
>> >>> >> > <kdhananj at redhat.com> wrote:
>> >>> >> >> I tried but couldn't recreate this issue (even with
>> >>> >> >> SHARD_MAX_INODES being 16).
>> >>> >> >> Could you share the exact command you used?
>> >>> >> >>
>> >>> >> >> -Krutika
>> >>> >> >>
>> >>> >> >> On Mon, Dec 12, 2016 at 12:15 PM, qingwei wei
>> >>> >> >> <tchengwee at gmail.com>
>> >>> >> >> wrote:
>> >>> >> >>>
>> >>> >> >>> Hi Krutika,
>> >>> >> >>>
>> >>> >> >>> Thanks. Looking forward to your reply.
>> >>> >> >>>
>> >>> >> >>> Cw
>> >>> >> >>>
>> >>> >> >>> On Mon, Dec 12, 2016 at 2:27 PM, Krutika Dhananjay
>> >>> >> >>> <kdhananj at redhat.com>
>> >>> >> >>> wrote:
>> >>> >> >>> > Hi,
>> >>> >> >>> >
>> >>> >> >>> > First of all, apologies for the late reply. Couldn't find time
>> >>> >> >>> > to look into this until now.
>> >>> >> >>> >
>> >>> >> >>> > Changing the SHARD_MAX_INODES value from 16384 to 16 is a cool
>> >>> >> >>> > trick! Let me try that as well and get back to you in some time.
>> >>> >> >>> >
>> >>> >> >>> > -Krutika
>> >>> >> >>> >
>> >>> >> >>> > On Thu, Dec 8, 2016 at 11:07 AM, qingwei wei
>> >>> >> >>> > <tchengwee at gmail.com>
>> >>> >> >>> > wrote:
>> >>> >> >>> >>
>> >>> >> >>> >> Hi,
>> >>> >> >>> >>
>> >>> >> >>> >> With help from my colleague, we made some changes to the code
>> >>> >> >>> >> to reduce SHARD_MAX_INODES (from 16384 to 16) and also added
>> >>> >> >>> >> printing of blk_num inside __shard_update_shards_inode_list.
>> >>> >> >>> >> We then ran fio to first do a sequential write of a 300MB
>> >>> >> >>> >> file. After this run completed, we used fio to generate random
>> >>> >> >>> >> writes (8k). During this random-write run, we found a
>> >>> >> >>> >> situation where blk_num is a negative number, which triggers
>> >>> >> >>> >> the following assertion:
>> >>> >> >>> >>
>> >>> >> >>> >> GF_ASSERT (lru_inode_ctx->block_num > 0);
>> >>> >> >>> >>
>> >>> >> >>> >> [2016-12-08 03:16:34.217582] E
>> >>> >> >>> >> [shard.c:468:__shard_update_shards_inode_list]
>> >>> >> >>> >> (-->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
>> >>> >> >>> >> [0x7f7300930b6d]
>> >>> >> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xce)
>> >>> >> >>> >> [0x7f7300930b1e]
>> >>> >> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x36b)
>> >>> >> >>> >> [0x7f730092bf5b] ) 0-: Assertion failed: lru_inode_ctx->block_num > 0
>> >>> >> >>> >>
>> >>> >> >>> >> Also, there is a segmentation fault shortly after this
>> >>> >> >>> >> assertion, and after that fio exits with an error.
>> >>> >> >>> >>
>> >>> >> >>> >> frame : type(0) op(0)
>> >>> >> >>> >> patchset: git://git.gluster.com/glusterfs.git
>> >>> >> >>> >> signal received: 11
>> >>> >> >>> >> time of crash:
>> >>> >> >>> >> 2016-12-08 03:16:34
>> >>> >> >>> >> configuration details:
>> >>> >> >>> >> argp 1
>> >>> >> >>> >> backtrace 1
>> >>> >> >>> >> dlfcn 1
>> >>> >> >>> >> libpthread 1
>> >>> >> >>> >> llistxattr 1
>> >>> >> >>> >> setfsid 1
>> >>> >> >>> >> spinlock 1
>> >>> >> >>> >> epoll.h 1
>> >>> >> >>> >> xattr.h 1
>> >>> >> >>> >> st_atim.tv_nsec 1
>> >>> >> >>> >> package-string: glusterfs 3.7.17
>> >>> >> >>> >>
>> >>> >> >>> >> /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f730e900332]
>> >>> >> >>> >> /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x2d5)[0x7f730e9250b5]
>> >>> >> >>> >> /lib64/libc.so.6(+0x35670)[0x7f730d1f1670]
>> >>> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x1d4)[0x7f730092bdc4]
>> >>> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xce)[0x7f7300930b1e]
>> >>> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)[0x7f7300930b6d]
>> >>> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/cluster/distribute.so(dht_lookup_cbk+0x380)[0x7f7300b8e240]
>> >>> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/protocol/client.so(client3_3_lookup_cbk+0x769)[0x7f7300df4989]
>> >>> >> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f730e6ce010]
>> >>> >> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x1df)[0x7f730e6ce2ef]
>> >>> >> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f730e6ca483]
>> >>> >> >>> >> /usr/local/lib/glusterfs/3.7.17/rpc-transport/socket.so(+0x6344)[0x7f73034dc344]
>> >>> >> >>> >> /usr/local/lib/glusterfs/3.7.17/rpc-transport/socket.so(+0x8f44)[0x7f73034def44]
>> >>> >> >>> >> /usr/local/lib/libglusterfs.so.0(+0x925aa)[0x7f730e96c5aa]
>> >>> >> >>> >> /lib64/libpthread.so.0(+0x7dc5)[0x7f730d96ddc5]
>> >>> >> >>> >>
>> >>> >> >>> >> Core dump:
>> >>> >> >>> >>
>> >>> >> >>> >> Using host libthread_db library "/lib64/libthread_db.so.1".
>> >>> >> >>> >> Core was generated by `/usr/local/sbin/glusterfs
>> >>> >> >>> >> --volfile-server=10.217.242.32 --volfile-id=/testSF1'.
>> >>> >> >>> >> Program terminated with signal 11, Segmentation fault.
>> >>> >> >>> >> #0  list_del_init (old=0x7f72f4003de0) at
>> >>> >> >>> >> ../../../../libglusterfs/src/list.h:87
>> >>> >> >>> >> 87        old->prev->next = old->next;
>> >>> >> >>> >>
>> >>> >> >>> >> bt
>> >>> >> >>> >>
>> >>> >> >>> >> #0  list_del_init (old=0x7f72f4003de0) at ../../../../libglusterfs/src/list.h:87
>> >>> >> >>> >> #1  __shard_update_shards_inode_list (linked_inode=linked_inode@entry=0x7f72fa7a6e48, this=this@entry=0x7f72fc0090c0, base_inode=0x7f72fa7a5108, block_num=block_num@entry=10) at shard.c:469
>> >>> >> >>> >> #2  0x00007f7300930b1e in shard_link_block_inode (local=local@entry=0x7f730ec4ed00, block_num=10, inode=<optimized out>, buf=buf@entry=0x7f730180c990) at shard.c:1559
>> >>> >> >>> >> #3  0x00007f7300930b6d in shard_common_lookup_shards_cbk (frame=0x7f730c611204, cookie=<optimized out>, this=0x7f72fc0090c0, op_ret=0, op_errno=<optimized out>, inode=<optimized out>, buf=0x7f730180c990, xdata=0x7f730c029cdc, postparent=0x7f730180ca00) at shard.c:1596
>> >>> >> >>> >> #4  0x00007f7300b8e240 in dht_lookup_cbk (frame=0x7f730c61dc40, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=22, inode=0x7f72fa7a6e48, stbuf=0x7f730180c990, xattr=0x7f730c029cdc, postparent=0x7f730180ca00) at dht-common.c:2362
>> >>> >> >>> >> #5  0x00007f7300df4989 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f730c616ab4) at client-rpc-fops.c:2988
>> >>> >> >>> >> #6  0x00007f730e6ce010 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f72fc04c040, pollin=pollin@entry=0x7f72fc079560) at rpc-clnt.c:796
>> >>> >> >>> >> #7  0x00007f730e6ce2ef in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f72fc04c070, event=<optimized out>, data=0x7f72fc079560) at rpc-clnt.c:967
>> >>> >> >>> >> #8  0x00007f730e6ca483 in rpc_transport_notify (this=this@entry=0x7f72fc05bd30, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f72fc079560) at rpc-transport.c:546
>> >>> >> >>> >> #9  0x00007f73034dc344 in socket_event_poll_in (this=this@entry=0x7f72fc05bd30) at socket.c:2250
>> >>> >> >>> >> #10 0x00007f73034def44 in socket_event_handler (fd=fd@entry=10, idx=idx@entry=2, data=0x7f72fc05bd30, poll_in=1, poll_out=0, poll_err=0) at socket.c:2363
>> >>> >> >>> >> #11 0x00007f730e96c5aa in event_dispatch_epoll_handler (event=0x7f730180ced0, event_pool=0xf42ee0) at event-epoll.c:575
>> >>> >> >>> >> #12 event_dispatch_epoll_worker (data=0xf8d650) at event-epoll.c:678
>> >>> >> >>> >> #13 0x00007f730d96ddc5 in start_thread () from /lib64/libpthread.so.0
>> >>> >> >>> >> #14 0x00007f730d2b2ced in clone () from /lib64/libc.so.6
>> >>> >> >>> >>
>> >>> >> >>> >> It seems like there is some situation where the structure is
>> >>> >> >>> >> not initialized properly? Appreciate it if anyone can advise.
>> >>> >> >>> >> Thanks.
>> >>> >> >>> >>
>> >>> >> >>> >> Cw
>> >>> >> >>> >>
>> >>> >> >>> >>
>> >>> >> >>> >>
>> >>> >> >>> >>
>> >>> >> >>> >> On Wed, Dec 7, 2016 at 9:42 AM, qingwei wei
>> >>> >> >>> >> <tchengwee at gmail.com>
>> >>> >> >>> >> wrote:
>> >>> >> >>> >> > Hi,
>> >>> >> >>> >> >
>> >>> >> >>> >> > I did another test and this time FIO fails with
>> >>> >> >>> >> >
>> >>> >> >>> >> > fio: io_u error on file /mnt/testSF-HDD1/test: Invalid argument: write offset=114423242752, buflen=8192
>> >>> >> >>> >> > fio: pid=10052, err=22/file:io_u.c:1582, func=io_u error, error=Invalid argument
>> >>> >> >>> >> >
>> >>> >> >>> >> > test: (groupid=0, jobs=1): err=22 (file:io_u.c:1582, func=io_u error, error=Invalid argument): pid=10052: Tue Dec  6 15:18:47 2016
>> >>> >> >>> >> >
>> >>> >> >>> >> >
>> >>> >> >>> >> > Below is the client log:
>> >>> >> >>> >> >
>> >>> >> >>> >> > [2016-12-06 05:19:31.261289] I
>> >>> >> >>> >> > [fuse-bridge.c:5171:fuse_graph_setup]
>> >>> >> >>> >> > 0-fuse: switched to graph 0
>> >>> >> >>> >> > [2016-12-06 05:19:31.261355] I [MSGID: 114035]
>> >>> >> >>> >> > [client-handshake.c:193:client_set_lk_version_cbk]
>> >>> >> >>> >> > 0-testSF-HDD-client-5: Server lk version = 1
>> >>> >> >>> >> > [2016-12-06 05:19:31.261404] I
>> >>> >> >>> >> > [fuse-bridge.c:4083:fuse_init]
>> >>> >> >>> >> > 0-glusterfs-fuse: FUSE inited with protocol versions:
>> >>> >> >>> >> > glusterfs
>> >>> >> >>> >> > 7.22
>> >>> >> >>> >> > kernel 7.22
>> >>> >> >>> >> > [2016-12-06 05:19:31.262901] I [MSGID: 108031]
>> >>> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> >>> >> > 0-testSF-HDD-replicate-0:
>> >>> >> >>> >> > selecting local read_child testSF-HDD-client-1
>> >>> >> >>> >> > [2016-12-06 05:19:31.262930] I [MSGID: 108031]
>> >>> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> >>> >> > 0-testSF-HDD-replicate-0:
>> >>> >> >>> >> > selecting local read_child testSF-HDD-client-0
>> >>> >> >>> >> > [2016-12-06 05:19:31.262948] I [MSGID: 108031]
>> >>> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> >>> >> > 0-testSF-HDD-replicate-0:
>> >>> >> >>> >> > selecting local read_child testSF-HDD-client-2
>> >>> >> >>> >> > [2016-12-06 05:19:31.269592] I [MSGID: 108031]
>> >>> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> >>> >> > 0-testSF-HDD-replicate-1:
>> >>> >> >>> >> > selecting local read_child testSF-HDD-client-3
>> >>> >> >>> >> > [2016-12-06 05:19:31.269795] I [MSGID: 108031]
>> >>> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> >>> >> > 0-testSF-HDD-replicate-1:
>> >>> >> >>> >> > selecting local read_child testSF-HDD-client-4
>> >>> >> >>> >> > [2016-12-06 05:19:31.277763] I [MSGID: 108031]
>> >>> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> >>> >> > 0-testSF-HDD-replicate-1:
>> >>> >> >>> >> > selecting local read_child testSF-HDD-client-5
>> >>> >> >>> >> > [2016-12-06 06:58:05.399244] W [MSGID: 101159]
>> >>> >> >>> >> > [inode.c:1219:__inode_unlink] 0-inode:
>> >>> >> >>> >> > be318638-e8a0-4c6d-977d-7a937aa84806/864c9ea1-3a7e-4d41-899b-f30604a7584e.16284:
>> >>> >> >>> >> > dentry not found in 63af10b7-9dac-4a53-aab1-3cc17fff3255
>> >>> >> >>> >> > [2016-12-06 15:17:43.311400] E
>> >>> >> >>> >> > [shard.c:460:__shard_update_shards_inode_list]
>> >>> >> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
>> >>> >> >>> >> > [0x7f5575680fdd]
>> >>> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> >>> >> > [0x7f5575680f6f]
>> >>> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x22e)
>> >>> >> >>> >> > [0x7f557567c1ce] ) 0-: Assertion failed: lru_inode_ctx->block_num > 0
>> >>> >> >>> >> > [2016-12-06 15:17:43.311472] W [inode.c:1232:inode_unlink]
>> >>> >> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> >>> >> > [0x7f5575680f6f]
>> >>> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x14a)
>> >>> >> >>> >> > [0x7f557567c0ea]
>> >>> >> >>> >> > -->/lib64/libglusterfs.so.0(inode_unlink+0x9c)
>> >>> >> >>> >> > [0x7f558386ba0c] ) 0-testSF-HDD-shard: inode not found
>> >>> >> >>> >> > [2016-12-06 15:17:43.333456] W [inode.c:1133:inode_forget]
>> >>> >> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> >>> >> > [0x7f5575680f6f]
>> >>> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x154)
>> >>> >> >>> >> > [0x7f557567c0f4]
>> >>> >> >>> >> > -->/lib64/libglusterfs.so.0(inode_forget+0x90)
>> >>> >> >>> >> > [0x7f558386b800] ) 0-testSF-HDD-shard: inode not found
>> >>> >> >>> >> > [2016-12-06 15:18:47.129794] W [fuse-bridge.c:2311:fuse_writev_cbk]
>> >>> >> >>> >> > 0-glusterfs-fuse: 12555429: WRITE => -1
>> >>> >> >>> >> > gfid=864c9ea1-3a7e-4d41-899b-f30604a7584e fd=0x7f557016ae6c
>> >>> >> >>> >> > (Invalid argument)
>> >>> >> >>> >> >
>> >>> >> >>> >> > Below is the code; it goes to the else block when
>> >>> >> >>> >> > inode_count is greater than SHARD_MAX_INODES, which is
>> >>> >> >>> >> > 16384. My dataset of 400GB with a 16MB shard size has enough
>> >>> >> >>> >> > shard files (400GB/16MB) to reach that limit. When I do the
>> >>> >> >>> >> > test with a smaller dataset, there is no such error.
>> >>> >> >>> >> >
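[Editor's note: a quick sketch of the 400GB/16MB arithmetic, assuming the volume's 16MB shard-block-size -- the file maps to far more shard blocks than SHARD_MAX_INODES, so the else branch is guaranteed to run.]

```python
SHARD_BLOCK = 16 * 2**20          # features.shard-block-size: 16MB
FILE_SIZE = 400 * 2**30           # fio -filesize 400g
SHARD_MAX_INODES = 16384
blocks = FILE_SIZE // SHARD_BLOCK
print(blocks, blocks > SHARD_MAX_INODES)  # 25600 True
```

So roughly 25600 shard blocks compete for 16384 LRU slots, forcing evictions.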
>> >>> >> >>> >> > shard.c
>> >>> >> >>> >> >
>> >>> >> >>> >> >         if (priv->inode_count + 1 <= SHARD_MAX_INODES) {
>> >>> >> >>> >> >                 /* If this inode was linked here for the first time
>> >>> >> >>> >> >                  * (indicated by empty list), and if there is still
>> >>> >> >>> >> >                  * space in the priv list, add this ctx to the tail
>> >>> >> >>> >> >                  * of the list.
>> >>> >> >>> >> >                  */
>> >>> >> >>> >> >                 gf_uuid_copy (ctx->base_gfid, base_inode->gfid);
>> >>> >> >>> >> >                 ctx->block_num = block_num;
>> >>> >> >>> >> >                 list_add_tail (&ctx->ilist, &priv->ilist_head);
>> >>> >> >>> >> >                 priv->inode_count++;
>> >>> >> >>> >> >         } else {
>> >>> >> >>> >> >                 /* If on the other hand there is no available slot
>> >>> >> >>> >> >                  * for this inode in the list, delete the lru inode
>> >>> >> >>> >> >                  * from the head of the list, unlink it. And in its
>> >>> >> >>> >> >                  * place add this new inode into the list.
>> >>> >> >>> >> >                  */
>> >>> >> >>> >> >                 lru_inode_ctx = list_first_entry (&priv->ilist_head,
>> >>> >> >>> >> >                                                   shard_inode_ctx_t,
>> >>> >> >>> >> >                                                   ilist);
>> >>> >> >>> >> >                 /* added message for debugging */
>> >>> >> >>> >> >                 gf_msg (THIS->name, GF_LOG_WARNING, 0,
>> >>> >> >>> >> >                         SHARD_MSG_INVALID_FOP,
>> >>> >> >>> >> >                         "block number = %d",
>> >>> >> >>> >> >                         lru_inode_ctx->block_num);
>> >>> >> >>> >> >
>> >>> >> >>> >> >                 GF_ASSERT (lru_inode_ctx->block_num > 0);
>> >>> >> >>> >> >
>> >>> >> >>> >> > Hopefully I can get some advice from you guys on this.
>> >>> >> >>> >> > Thanks.
>> >>> >> >>> >> >
>> >>> >> >>> >> > Cw
>> >>> >> >>> >> >
>> >>> >> >>> >> > On Tue, Dec 6, 2016 at 9:07 AM, qingwei wei
>> >>> >> >>> >> > <tchengwee at gmail.com>
>> >>> >> >>> >> > wrote:
>> >>> >> >>> >> >> Hi,
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> This is the repost of my email in the gluster-user
>> >>> >> >>> >> >> mailing
>> >>> >> >>> >> >> list.
>> >>> >> >>> >> >> Appreciate if anyone has any idea on the issue i have
>> >>> >> >>> >> >> now.
>> >>> >> >>> >> >> Thanks.
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> I encountered this when i do the FIO random write on the
>> >>> >> >>> >> >> fuse
>> >>> >> >>> >> >> mount
>> >>> >> >>> >> >> gluster volume. After this assertion happen, the client
>> >>> >> >>> >> >> log
>> >>> >> >>> >> >> is
>> >>> >> >>> >> >> filled
>> >>> >> >>> >> >> with pending frames messages and FIO just show zero IO in
>> >>> >> >>> >> >> the
>> >>> >> >>> >> >> progress
>> >>> >> >>> >> >> status. As i leave this test to run overnight, the client
>> >>> >> >>> >> >> log
>> >>> >> >>> >> >> file
>> >>> >> >>> >> >> fill up with those pending frame messages and hit 28GB
>> >>> >> >>> >> >> for
>> >>> >> >>> >> >> around 12
>> >>> >> >>> >> >> hours.
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> The client log:
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> [2016-12-04 15:48:35.274208] W [MSGID: 109072]
>> >>> >> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got non-linkfile
>> >>> >> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.7038,
>> >>> >> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000
>> >>> >> >>> >> >> [2016-12-04 15:48:35.277208] W [MSGID: 109072]
>> >>> >> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got non-linkfile
>> >>> >> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.8957,
>> >>> >> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000
>> >>> >> >>> >> >> [2016-12-04 15:48:35.277588] W [MSGID: 109072]
>> >>> >> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got non-linkfile
>> >>> >> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.11912,
>> >>> >> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000
>> >>> >> >>> >> >> [2016-12-04 15:48:35.312751] E
>> >>> >> >>> >> >> [shard.c:460:__shard_update_shards_inode_list]
>> >>> >> >>> >> >> (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
>> >>> >> >>> >> >> [0x7f86cc42efdd]
>> >>> >> >>> >> >> -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> >>> >> >> [0x7f86cc42ef6f]
>> >>> >> >>> >> >> -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x22e)
>> >>> >> >>> >> >> [0x7f86cc42a1ce] ) 0-: Assertion failed: lru_inode_ctx->block_num > 0
>> >>> >> >>> >> >> pending frames:
>> >>> >> >>> >> >> frame : type(0) op(0)
>> >>> >> >>> >> >> frame : type(0) op(0)
>> >>> >> >>> >> >> frame : type(0) op(0)
>> >>> >> >>> >> >> frame : type(0) op(0)
>> >>> >> >>> >> >> frame : type(0) op(0)
>> >>> >> >>> >> >> frame : type(0) op(0)
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> Gluster info (I am testing this on one server, with each
>> >>> >> >>> >> >> disk representing one brick; the gluster volume is then
>> >>> >> >>> >> >> mounted locally via fuse):
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> Volume Name: testSF
>> >>> >> >>> >> >> Type: Distributed-Replicate
>> >>> >> >>> >> >> Volume ID: 3f205363-5029-40d7-b1b5-216f9639b454
>> >>> >> >>> >> >> Status: Started
>> >>> >> >>> >> >> Number of Bricks: 2 x 3 = 6
>> >>> >> >>> >> >> Transport-type: tcp
>> >>> >> >>> >> >> Bricks:
>> >>> >> >>> >> >> Brick1: 192.168.123.4:/mnt/sdb_mssd/testSF
>> >>> >> >>> >> >> Brick2: 192.168.123.4:/mnt/sdc_mssd/testSF
>> >>> >> >>> >> >> Brick3: 192.168.123.4:/mnt/sdd_mssd/testSF
>> >>> >> >>> >> >> Brick4: 192.168.123.4:/mnt/sde_mssd/testSF
>> >>> >> >>> >> >> Brick5: 192.168.123.4:/mnt/sdf_mssd/testSF
>> >>> >> >>> >> >> Brick6: 192.168.123.4:/mnt/sdg_mssd/testSF
>> >>> >> >>> >> >> Options Reconfigured:
>> >>> >> >>> >> >> features.shard-block-size: 16MB
>> >>> >> >>> >> >> features.shard: on
>> >>> >> >>> >> >> performance.readdir-ahead: on
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> Gluster version: 3.7.17
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> The actual disk usage (about 91% full):
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> /dev/sdb1                235G  202G   22G  91% /mnt/sdb_mssd
>> >>> >> >>> >> >> /dev/sdc1                235G  202G   22G  91% /mnt/sdc_mssd
>> >>> >> >>> >> >> /dev/sdd1                235G  202G   22G  91% /mnt/sdd_mssd
>> >>> >> >>> >> >> /dev/sde1                235G  200G   23G  90% /mnt/sde_mssd
>> >>> >> >>> >> >> /dev/sdf1                235G  200G   23G  90% /mnt/sdf_mssd
>> >>> >> >>> >> >> /dev/sdg1                235G  200G   23G  90% /mnt/sdg_mssd
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> Has anyone encountered this issue before?
>> >>> >> >>> >> >>
>> >>> >> >>> >> >> Cw
>> >>> >> >>> >> _______________________________________________
>> >>> >> >>> >> Gluster-devel mailing list
>> >>> >> >>> >> Gluster-devel at gluster.org
>> >>> >> >>> >> http://www.gluster.org/mailman/listinfo/gluster-devel
>> >>> >> >>> >
>> >>> >> >>> >
>> >>> >> >>
>> >>> >> >>
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>
>

