[Gluster-devel] spurious error in self-heald.t

Krutika Dhananjay kdhananj at redhat.com
Mon Dec 1 12:19:18 UTC 2014


Let me copy-paste the code from DHT for easy reference here: 

<code> 

/* From gf_defrag_fix_layout() in dht-rebalance.c: use the inode
 * returned by inode_link(), then drop the ref on the now-redundant
 * newly-allocated inode. */
linked_inode = inode_link (entry_loc.inode, loc->inode, entry->d_name, &entry->d_stat); 
inode = entry_loc.inode; 
entry_loc.inode = linked_inode; 
inode_unref (inode); 
</code> 

-Krutika 

----- Original Message -----

> From: "Krutika Dhananjay" <kdhananj at redhat.com>
> To: "Emmanuel Dreyfus" <manu at netbsd.org>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Monday, December 1, 2014 5:46:43 PM
> Subject: Re: [Gluster-devel] spurious error in self-heald.t

> Hi,

> So here is what seems to be happening:

> The self-heal daemon is implemented such that each shd on a given node
> contains one healer thread for every brick that is local to it.
> Since in our regression tests all bricks are on the same node, the lone
> self-heal daemon contains 2 healer threads.

> It so happens that at one point both healer threads perform readdirp()
> simultaneously on a given directory (the one coming from syncop_readdirp()
> in afr_shd_full_sweep()).
> The respective client xlators get the responses. Both of them do an
> inode_find() to see if the shared inode table already contains an entry for
> this gfid (refer to unserialize_rsp_direntp() in client-helpers.c).
> Neither of them finds the corresponding inode in the inode table, so each
> calls inode_new() and allocates a new in-memory inode (I1 and I2
> respectively, say) which at that point is not filled in completely (as a
> result, ia_gfid, ia_type etc. are all-zeroes).

> Now both clients unwind the calls to their parent, AFR, which performs
> inode_link() from calls to gf_link_inodes_from_dirent().
> Healer thread-1 races ahead, successfully links its inode (I1) in the inode
> table and releases the mutexes.
> Now healer thread-2 enters __inode_link(), where a call to inode_find()
> hands it the in-memory inode object (I1) that was just linked by thread-1,
> and the function returns I1 as the @link_inode.

> What AFR should have done after that call is set entry->inode to point to
> link_inode. But because it does not perform this action, entry->inode
> continues to hold I2, which was never initialised.
> The subsequent syncop_opendir() is then called using entry->inode (whose
> gfid is all-zeroes), causing the process to crash in the client xlator at
> the point where it performs this NULL check.

> I think a proper fix would be something similar to what DHT does, for
> instance in gf_defrag_fix_layout(): the three C statements following the
> inode_link() call in dht-rebalance.c.
> Basically, AFR must set entry->inode to point to the linked inode
> object.

> Hope that helped.

> -Krutika

> ----- Original Message -----

> > From: "Emmanuel Dreyfus" <manu at netbsd.org>
> > To: "Krutika Dhananjay" <kdhananj at redhat.com>
> > Cc: "Emmanuel Dreyfus" <manu at netbsd.org>, "Gluster Devel"
> > <gluster-devel at gluster.org>
> > Sent: Monday, December 1, 2014 2:44:25 PM
> > Subject: Re: [Gluster-devel] spurious error in self-heald.t

> > On Mon, Dec 01, 2014 at 04:00:46AM -0500, Krutika Dhananjay wrote:
> > > Was able to recreate it. Thanks for the report. Will look
> > > into why this could possibly happen.

> > I posted a workaround for the symptom:
> > http://review.gluster.com/9216

> > Would it be admissible as an interim measure? At least it
> > spares a crash.

> > --
> > Emmanuel Dreyfus
> > manu at netbsd.org

> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel

