[Bugs] [Bug 1334164] Worker dies with [Errno 5] Input/output error upon creation of entries at slave

bugzilla at redhat.com bugzilla at redhat.com
Mon May 9 05:01:44 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1334164



--- Comment #1 from Raghavendra G <rgowdapp at redhat.com> ---
Pasting the discussion we had on mail below:

> Me, Aravinda and Kotresh had a discussion on this.
>
> RCA: non-linked inodes being used in dht-selfheal codepath. Issue got
> uncovered in geo-rep setup because of http://review.gluster.org/12539, as we
> now heal in discover codepath too.
>
> Proposed solution: Link inode in dht-selfheal and use the linked inode for
> all selfheal functionality. One question I ran into with this solution is
> about lookup not being complete for xlators between dht and xlator managing
> itable (fuse-bridge here). So, if there is a non-lookup fop it might use an
> inode which might be invalid for these xlators. However, there are two
> aspects to this question, when dht is trying to link inode object - inode1 -
> before it attempts healing.
>
> 1. A valid inode is already linked with gfid in itable by the time dht does
> inode_link. In this case dht gets this inode (instead of the one it got
> during lookup) and somebody else has completed the lookup already, the above
> said problem wont be there.
> 2. There is no valid inode linked with gfid in itable. In this case, it might
> so happen that a racing lookup (with a different inode object - say inode2)
> might complete and post _that_ there can be non-lookup fops on inode1 on
> which lookup is still in progress. So, the inode1 is still invalid for
> xlators b/w dht and fuse-bridge. We can solve this problem easily if dht
> always unwinds lookup call with "linked_inode" instead of the inode object
> passed down as part of request. So, the racing lookup here which got inode2
> as argument in request path will unwind with inode1. This means dht should
> do inode_find for the associated gfid if healing is not required and unwind
> lookup with result of inode_find if it finds a valid inode object. Note that
> this algo works if all translators use inode passed in the lookup_cbk,
> instead of using the one stored in their locals. This can be enforced for
> atleast xlators b/w dht and fuse-bridge.

This is the "ugly" aspect of this design - introducing restrictions on how
other xlators should behave. So, I am suggesting a variant of the "clean"
solution proposed below.

>
> To summarize the algorithm:
> 1. If dht decides a directory needs self-heal,
>     a. it links the inode.
>     b. use the return value of inode_link for healing.
> 2. When no healing is required, dht does an inode_find (gfid, itable) and
> unwind with the result (instead of inode passed to it as argument).
>
> I am yet to think through whether this algo has any caveats with NFS. This
> bug has a fuse-mount.
>
> There is one "clean" solution of separating lookup in to two fops - one does
> plain lookup and the other does healing if required. But, that's a huge
> change and going to take time.

A variant of this solution can be implemented without requiring too much of
code-change.

1. Introduce methods in itable that can be used by xlators to guide
inode/dentry management. In this case, lets have a method - say
itable_set_need_lookup. These methods _should_ be implemented by xlators doing
resolution/itable-management (like fuse-bridge, gfapi, protocol/server, nfs
etc). Note that the xlator doing itable management and the xlator associated
with itable (table->xl) need not be same. So, this helps for other xlators like
dht to have a say in how inodes are managed.

2. dht, when it decides that a heal has to be done
    a. links the inode into itable
    b. instructs the resolver code that a "resolution" (essentially a lookup in
this case, as we have not broken lookup yet) has to be done before resuming any
non-lookup fops on this inode.

3. The xlator managing inodes (fuse-bridge in this specific case) sends a
lookup on the inode if need lookup is set. This behavior is very much similar
to fuse_readdirp_cbk setting need-lookup and related behavior.

There are some implementation details - like dht treating revalidates
differently from fresh lookup etc. But, these can be worked out while coding.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list