[Gluster-devel] Assertion failed: lru_inode_ctx->block_num > 0

qingwei wei tchengwee at gmail.com
Tue Dec 6 01:07:30 UTC 2016


Hi,

This is a repost of my email to the gluster-users mailing list.
I would appreciate it if anyone has any idea about the issue I am seeing. Thanks.

I encountered this while running an FIO random-write workload on a
FUSE-mounted Gluster volume. After this assertion fires, the client log
fills with "pending frames" messages and FIO shows zero IO in its
progress status. Leaving the test to run overnight, the client log file
kept filling with those pending-frames messages and reached 28GB in
around 12 hours.
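
For reference, the workload was an FIO random-write job along these
lines (the block size, depth, file size and runtime shown here are
indicative reconstructions rather than the exact values from the
failing run; /mnt/testSF stands in for the FUSE mount point):

  fio --name=randwrite --directory=/mnt/testSF --ioengine=libaio \
      --direct=1 --rw=randwrite --bs=4k --iodepth=16 --numjobs=4 \
      --size=4g --time_based --runtime=3600 --group_reporting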

The client log:

[2016-12-04 15:48:35.274208] W [MSGID: 109072]
[dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got
non-linkfile testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.7038,
gfid = 00000000-0000-0000-0000-000000000000
[2016-12-04 15:48:35.277208] W [MSGID: 109072]
[dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got
non-linkfile testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.8957,
gfid = 00000000-0000-0000-0000-000000000000
[2016-12-04 15:48:35.277588] W [MSGID: 109072]
[dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got
non-linkfile testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.11912,
gfid = 00000000-0000-0000-0000-000000000000
[2016-12-04 15:48:35.312751] E
[shard.c:460:__shard_update_shards_inode_list]
(-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
[0x7f86cc42efdd]
-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
[0x7f86cc42ef6f]
-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x22e)
[0x7f86cc42a1ce] ) 0-: Assertion failed: lru_inode_ctx->block_num > 0
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

Gluster volume info (I am testing this on a single server, with each
disk acting as one brick; the volume is then mounted locally via FUSE.
A rough sketch of the setup commands is included after the version
line below.)

Volume Name: testSF
Type: Distributed-Replicate
Volume ID: 3f205363-5029-40d7-b1b5-216f9639b454
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 192.168.123.4:/mnt/sdb_mssd/testSF
Brick2: 192.168.123.4:/mnt/sdc_mssd/testSF
Brick3: 192.168.123.4:/mnt/sdd_mssd/testSF
Brick4: 192.168.123.4:/mnt/sde_mssd/testSF
Brick5: 192.168.123.4:/mnt/sdf_mssd/testSF
Brick6: 192.168.123.4:/mnt/sdg_mssd/testSF
Options Reconfigured:
features.shard-block-size: 16MB
features.shard: on
performance.readdir-ahead: on

Gluster version: 3.7.17
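
For completeness, the volume was created, configured and mounted
roughly as follows (these commands are reconstructed from the volume
info above, not copied from my shell history; /mnt/testSF is just an
example mount point, and "force" is needed because all six bricks live
on the same host):

  gluster volume create testSF replica 3 transport tcp \
      192.168.123.4:/mnt/sdb_mssd/testSF 192.168.123.4:/mnt/sdc_mssd/testSF \
      192.168.123.4:/mnt/sdd_mssd/testSF 192.168.123.4:/mnt/sde_mssd/testSF \
      192.168.123.4:/mnt/sdf_mssd/testSF 192.168.123.4:/mnt/sdg_mssd/testSF force
  gluster volume set testSF features.shard on
  gluster volume set testSF features.shard-block-size 16MB
  gluster volume set testSF performance.readdir-ahead on
  gluster volume start testSF
  mount -t glusterfs 192.168.123.4:/testSF /mnt/testSF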

The actual disk usage (the bricks are about 90-91% full):

/dev/sdb1                235G  202G   22G  91% /mnt/sdb_mssd
/dev/sdc1                235G  202G   22G  91% /mnt/sdc_mssd
/dev/sdd1                235G  202G   22G  91% /mnt/sdd_mssd
/dev/sde1                235G  200G   23G  90% /mnt/sde_mssd
/dev/sdf1                235G  200G   23G  90% /mnt/sdf_mssd
/dev/sdg1                235G  200G   23G  90% /mnt/sdg_mssd

Has anyone encountered this issue before?

Cw

