[Bugs] [Bug 1707195] VM stuck in a shutdown because of a pending fuse request

bugzilla at redhat.com bugzilla at redhat.com
Tue May 7 02:48:45 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1707195

Raghavendra G <rgowdapp at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED



--- Comment #2 from Raghavendra G <rgowdapp at redhat.com> ---
I do see a write request hung in write-behind. Details of the write request
from the statedump:

[xlator.performance.write-behind.wb_inode]
path=/e5dd645f-88bb-491c-9145-38fa229cbc4d/images/8e84c1ed-48ba-4b82-9882-c96e6f260bab/29bba0a1-6c7b-4358-9ef2-f8080405778d
inode=0x7f6e40060888
gfid=6348d15d-7b17-4993-9da9-3f588c2ad5a8
window_conf=1048576
window_current=0
transit-size=0
dontsync=0

[.WRITE]
unique=5518502
refcount=1
wound=no
generation-number=0
req->op_ret=131072
req->op_errno=0
sync-attempts=0
sync-in-progress=no
size=131072
offset=4184756224
lied=0
append=0
fulfilled=0
go=0

I'll go through this and will try to come up with an RCA.

--- Additional comment from Raghavendra G on 2019-04-29 07:21:50 UTC ---

There is a race in the way O_DIRECT writes are handled. Assume two overlapping
write requests w1 and w2.

* w1 is issued and sits in the wb_inode->wip queue because the response from
the bricks is still pending. Also, wb_request_unref in wb_do_winds has not yet
been invoked:

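        list_for_each_entry_safe (req, tmp, tasks, winds) {
                list_del_init (&req->winds);

                if (req->op_ret == -1) {
                        /* request already failed: unwind the error */
                        call_unwind_error_keep_stub (req->stub, req->op_ret,
                                                     req->op_errno);
                } else {
                        /* wind the request down to the bricks */
                        call_resume_keep_stub (req->stub);
                }

                /* drop this loop's reference; the request leaves wip
                   only when its last reference is dropped */
                wb_request_unref (req);
        }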

* w2 is issued and wb_process_queue is invoked. w2 is not picked up for winding
because w1 is still in wb_inode->wip. w2 is added to the todo list and
wb_writev for w2 returns.

* The response to w1 is received and wb_writev_cbk invokes wb_request_unref.
Assume wb_request_unref in wb_do_winds (see point 1) has still not been
invoked. Since w1 holds one more reference, the wb_request_unref in
wb_writev_cbk of w1 doesn't remove w1 from wip.
* wb_process_queue is invoked as part of wb_writev_cbk of w1, but it fails to
wind w2 because w1 is still in wip.
* wb_request_unref is invoked on w1 as part of wb_do_winds. This drops the last
reference, so w1 is removed from all queues, including wip.
* After this point, wb_process_queue is not invoked again until the application
issues a new request, so w2 stays hung until that next request arrives.

This bug is similar to bz 1626780 and bz 1379655. Although the issues are
similar, the fixes for those two bzs won't fix the current bug, so this bug is
not a duplicate. It requires a new fix, and I'll post a patch to gerrit
shortly.


