[Bugs] [Bug 1743573] fuse client hung when issued a lookup "ls" on an ec volume

bugzilla at redhat.com bugzilla at redhat.com
Tue Aug 20 08:58:46 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1743573



--- Comment #1 from Pranith Kumar K <pkarampu at redhat.com> ---
(gdb) p $4->locks[0]
$5 = {lock = 0x7f3da4abc1d8, fop = 0x7f3d74317e18, owner_list = {next =
0x7f3d74317ed0, prev = 0x7f3d74317ed0}, wait_list = {next = 0x7f3da4abc208,
prev = 0x7f3da4abc208}, update = {false, false}, dirty = { false, false},
optimistic_changelog = false, base = 0x0, size = 0, waiting_flags = 0, fl_start
= 0, fl_end = 9223372036854775807}
(gdb) p $4->locks[0].lock
$6 = (ec_lock_t *) 0x7f3da4abc1d8
(gdb) p *$4->locks[0].lock
$7 = {ctx = 0x7f3db7cbff70, timer = 0x0, owners = {next = 0x7f3da4abc1e8, prev
= 0x7f3da4abc1e8}, waiting = {next = 0x7f3da4abc1f8, prev = 0x7f3da4abc1f8},
frozen = {next = 0x7f3d74317ee0, prev = 0x7f3d74317ee0}, mask = 0, good_mask =
18446744073709551615, healing = 0, refs_owners = 0, refs_pending = 0,
waiting_flags = 0, acquired = false, unlock_now = false, release = true, query
= true, fd = 0x0, loc = {path = 0x7f3d75084a40
"/IOs/kernel/rhs-client45.lab.eng.blr.redhat.com/dir.2/linux-5.2.7/Documentation/devicetree/bindings/rtc",
name = 0x7f3d75084aa4 "rtc", inode = 0x7f3d98014768, parent = 0x7f3d99faad38,
gfid = "\310\a\376|-\205K\v\215\000\b\363>\241\021i", pargfid =
"\345\330}\212\242{Nr\233\064\373\030MD\361", <incomplete sequence \360>},
{type = ENTRYLK_WRLCK, flock = { l_type = 1, l_whence = 0, l_start = 0, l_len =
0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}}}
(gdb) p &$4->locks[0].lock->owners
$8 = (struct list_head *) 0x7f3da4abc1e8
(gdb) p &$4->locks[0].lock->waiting
$9 = (struct list_head *) 0x7f3da4abc1f8
(gdb) p &$4->locks[0].lock->frozen
$10 = (struct list_head *) 0x7f3da4abc208

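This suggests that the fop is stuck in the frozen list. lock->owners ($8) and
lock->waiting ($9) are empty (each head points to itself), and the fop's own
owner_list also points to itself. Its wait_list, however, points to
0x7f3da4abc208, which is &lock->frozen ($10), and lock->frozen points back into
the fop (0x7f3d74317ee0). A fop can only end up in the frozen list if
lock->release is true, and the dump shows release = true with acquired = false.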
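The pointer reasoning can be made concrete with a minimal C sketch of a kernel-style circular list. This is an assumption: the types below are simplified stand-ins, not GlusterFS's own list.h or the real ec_lock_t and lock-link layouts. An empty list head, or an unlinked node, points at itself. When a node is the only entry on a list, it and its head point at each other, which is the pattern the dump shows for the fop's wait_list and lock->frozen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified circular doubly linked list in the list_head style
 * (illustrative; not copied from GlusterFS headers). */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* An empty head, or an unlinked node, points at itself. */
static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Link `node` at the tail of the list headed by `head`. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* True when `node` is the one and only entry on `head`: the head and the
 * node point at each other in both directions. */
static bool list_only_entry(const struct list_head *head,
                            const struct list_head *node)
{
    return head->next == node && head->prev == node &&
           node->next == head && node->prev == head;
}

/* Hypothetical, heavily reduced stand-ins for the lock and the per-fop
 * lock link; field names follow the gdb dump. */
struct lock_sketch {
    struct list_head owners, waiting, frozen;
};

struct lock_link_sketch {
    struct list_head owner_list, wait_list;
};
```

If you initialise every head, then link a lock link's wait_list onto an empty frozen head, you reproduce the dump. owners, waiting and owner_list still point at themselves, and list_only_entry(&lock.frozen, &link.wait_list) is true.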



    Problem:
    Mount-1                                Mount-2
    1) Tries to acquire lock on 'dir1'     1) Tries to acquire lock on 'dir1'
    2) Lock is granted on brick-0          2) Lock gets EAGAIN on brick-0 and
                                              leads to a blocking lock on brick-0
    3) Gets a lock-contention              3) Doesn't matter what happens on
       notification, marks lock->release      mount-2 from here on.
       to true.
    4) A new fop comes on 'dir1'; it is
       put in the frozen list because
       lock->release is set to true.
    5) Lock acquisition from step 2 fails
       because 3 bricks went down in the
       4+2 setup (only 3 of 6 bricks are
       up; at least 4 are needed), so the
       lock is never acquired.
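The mount-1 side of this sequence can be modelled as a tiny state machine. This is a hypothetical sketch: lock_state, on_contention, on_new_fop and on_acquire_result are illustrative names, not GlusterFS functions. It also assumes that a 4+2 volume needs at least 4 bricks up for the lock to succeed. Replaying steps 3 to 5 ends in the state the dump shows: acquired is false, release is true, and one fop sits on the frozen queue.

```c
#include <stdbool.h>

/* Hypothetical model of the mount-1 lock in the scenario above; none of
 * these names exist in GlusterFS. */
struct lock_state {
    bool acquired; /* lock obtained on enough bricks */
    bool release;  /* set by a lock-contention notification */
    int waiting;   /* fops queued while acquisition is in progress */
    int frozen;    /* fops queued until the lock is released */
};

/* Step 3: the contention notification marks the lock for release,
 * even though the acquisition started in step 2 has not finished. */
static void on_contention(struct lock_state *l)
{
    l->release = true;
}

/* Step 4: a new fop on 'dir1'. With release set it is frozen;
 * otherwise (simplified) it is counted as waiting. */
static void on_new_fop(struct lock_state *l)
{
    if (l->release)
        l->frozen++;
    else
        l->waiting++;
}

/* Step 5: acquisition completes. Assumed quorum: the lock succeeds only
 * if at least `data_bricks` bricks (4 in a 4+2 volume) are up. As the
 * report describes, nothing here moves fops off the frozen queue, and
 * the lock is not retried. */
static void on_acquire_result(struct lock_state *l, int bricks_up,
                              int data_bricks)
{
    l->acquired = bricks_up >= data_bricks;
}
```

For example, start from a zeroed lock_state and call on_contention, then on_new_fop, then on_acquire_result(&l, 3, 4). The result is l.frozen == 1 with l.acquired false. Because on_acquire_result never touches frozen, that fop has nowhere to go.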

    The fop on mount-1 that was put in the frozen list hangs, because no code
    path moves it from the frozen list to any other list and the lock is never
    retried.
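A hang like this might be spotted in a core file or a live gdb session by checking the lock fields for the combination seen above. The predicate below is a hypothetical heuristic derived from this report, not a check that exists in GlusterFS. lock_snapshot is an illustrative struct, not the real ec_lock_t.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the lock fields printed in the dump; this is
 * not the real ec_lock_t layout. */
struct lock_snapshot {
    bool acquired;
    bool release;
    unsigned refs_owners;
    unsigned refs_pending;
    bool owners_empty;  /* owners head points at itself */
    bool waiting_empty; /* waiting head points at itself */
    bool frozen_empty;  /* frozen head points at itself */
};

/* Heuristic based on this report: fops are frozen behind a pending
 * release, but the lock was never acquired and has no owners, pending
 * references or waiters. Per the report, nothing is then left to move
 * the frozen fops. */
static bool frozen_fops_stranded(const struct lock_snapshot *s)
{
    return !s->frozen_empty && s->release && !s->acquired &&
           s->refs_owners == 0 && s->refs_pending == 0 &&
           s->owners_empty && s->waiting_empty;
}
```

Filled in with the values from $7 (acquired = false, release = true, refs_owners = 0, refs_pending = 0, owners and waiting empty, frozen non-empty), the predicate returns true.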
