[Bugs] [Bug 1736341] New: potential deadlock while processing callbacks in gfapi

bugzilla at redhat.com bugzilla at redhat.com
Thu Aug 1 17:00:51 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1736341

            Bug ID: 1736341
           Summary: potential deadlock while processing callbacks in gfapi
           Product: GlusterFS
           Version: 6
          Hardware: All
                OS: All
            Status: NEW
         Component: libgfapi
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: skoduri at redhat.com
        QA Contact: bugs at gluster.org
                CC: atumball at redhat.com, bugs at gluster.org, pasik at iki.fi
        Depends On: 1733166
            Blocks: 1733520
  Target Milestone: ---
    Classification: Community



+++ This bug was initially created as a clone of Bug #1733166 +++

Description of problem:

While running parallel I/O involving many files on an nfs-ganesha mount, we hit
the deadlock below in the nfs-ganesha process.


epoll thread:
....glfs_cbk_upcall_data->upcall_syncop_args_init->glfs_h_poll_cache_invalidation->glfs_h_find_handle->priv_glfs_active_subvol->glfs_lock
(waiting on lock)

I/O thread:

...glfs_h_stat->glfs_resolve_inode->__glfs_resolve_inode (at this point we
acquired glfs_lock) -> ...->glfs_refresh_inode_safe->syncop_lookup

To summarize:
The I/O thread, which has acquired glfs_lock, is waiting for the epoll threads
to receive the response, whereas the epoll threads are waiting for the I/O
thread to release the lock.

A similar issue was identified earlier (bug 1693575).

There could be other issues at different layers depending on how client xlators
choose to process these callbacks.

The correct way to avoid or fix these issues is to redesign the upcall model so
that it uses different sockets for callback communication instead of the same
epoll threads. A GitHub issue has been raised for that:
https://github.com/gluster/glusterfs/issues/697

Since that may take a while, this BZ is raised to provide a workaround fix in
the gfapi layer for now.
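
The cycle described above can be reproduced with a toy model. The sketch below
is not gfapi code: run_lookup, the handoff flag, the "wire" queue and the
timeout are all hypothetical names invented for illustration. It only shows why
an epoll thread that processes an upcall inline can never deliver the reply the
lock holder is waiting for, and how handing the upcall to another thread breaks
the cycle. Whether the posted patches take exactly this approach is not stated
in this report.

```python
import queue
import threading


def run_lookup(handoff, timeout=0.5):
    """Simulate one lookup; return True if it completes, False if it stalls."""
    glfs_lock = threading.Lock()     # stands in for gfapi's glfs_lock
    lock_held = threading.Event()    # set once the I/O thread owns the lock
    reply = threading.Event()        # set when the epoll thread delivers the reply
    done = threading.Event()         # set when the lookup finishes in time
    wire = queue.Queue()             # events arriving on the shared connection

    def process_upcall():
        # Like the cache-invalidation path in the report, this needs glfs_lock.
        with glfs_lock:
            pass

    def epoll_thread():
        while True:
            event = wire.get()
            if event == "stop":
                return
            if event == "upcall":
                if handoff:
                    # Workaround sketch: let another thread wait for the lock.
                    threading.Thread(target=process_upcall, daemon=True).start()
                else:
                    # Inline processing blocks here, so the reply queued
                    # behind this upcall is not delivered in time.
                    process_upcall()
            elif event == "reply":
                reply.set()

    def io_thread():
        with glfs_lock:
            lock_held.set()
            # Holds the lock while waiting for the reply, as in the report.
            # The timeout exists only so the simulation can terminate.
            if reply.wait(timeout):
                done.set()

    epoll = threading.Thread(target=epoll_thread, daemon=True)
    io = threading.Thread(target=io_thread, daemon=True)
    epoll.start()
    io.start()
    lock_held.wait()
    wire.put("upcall")   # the upcall arrives ahead of the lookup reply
    wire.put("reply")
    io.join()
    wire.put("stop")
    epoll.join()
    return done.is_set()


if __name__ == "__main__":
    print("inline upcall completes:", run_lookup(handoff=False))
    print("handed-off upcall completes:", run_lookup(handoff=True))
```

In the real process there is no timeout on the I/O side, so the inline case
would hang instead of merely reporting a stalled lookup.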

--- Additional comment from Worker Ant on 2019-07-25 10:09:58 UTC ---

REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing
upcall) posted (#1) for review on release-6 by soumya k

--- Additional comment from Worker Ant on 2019-07-25 10:16:57 UTC ---

REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing
upcall) posted (#1) for review on master by soumya k


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1733166
[Bug 1733166] potential deadlock while processing callbacks in gfapi
https://bugzilla.redhat.com/show_bug.cgi?id=1733520
[Bug 1733520] potential deadlock while processing callbacks in gfapi