<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Feb 1, 2018 at 2:48 PM, Shyam Ranganathan <span dir="ltr">&lt;<a href="mailto:srangana@redhat.com" target="_blank">srangana@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On 02/01/2018 08:25 AM, Xavi Hernandez wrote:<br>

&gt; After having tried several things, it seems that it will be complex to<br>

&gt; solve these races. All attempts to fix them have caused failures in<br>

&gt; other connections. Since I&#39;ve other work to do and it doesn&#39;t seem to be<br>

&gt; causing serious failures in production, for now I&#39;ll leave this. I&#39;ll<br>

&gt; revisit this when I have more time.<br>

<br>

</span>Xavi, convert the findings into a bug, and post the details there, so<br>

that it may be followed up? (if not already done)<br></blockquote><div><br></div><div>I&#39;ve just created this bug: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1541032">https://bugzilla.redhat.com/show_bug.cgi?id=1541032</a></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-im gmail-HOEnZb"><br>

&gt;<br>

&gt; Xavi<br>

&gt;<br>

&gt; On Mon, Jan 29, 2018 at 11:07 PM, Xavi Hernandez &lt;<a href="mailto:jahernan@redhat.com">jahernan@redhat.com</a><br>

</span><div class="gmail-HOEnZb"><div class="gmail-h5">&gt; &lt;mailto:<a href="mailto:jahernan@redhat.com">jahernan@redhat.com</a>&gt;&gt; wrote:<br>

&gt;<br>

&gt;     Hi all,<br>

&gt;<br>

&gt;     I&#39;ve identified a race in the RPC layer that caused some spurious<br>

&gt;     disconnections and CHILD_DOWN notifications.<br>

&gt;<br>

&gt;     The problem happens when protocol/client reconfigures a connection<br>

&gt;     to move from glusterd to glusterfsd. This is done by calling<br>

&gt;     rpc_clnt_reconfig() followed by rpc_transport_disconnect().<br>

&gt;<br>

&gt;     This seems fine because client_rpc_notify() will call<br>

&gt;     rpc_clnt_cleanup_and_start() when the disconnect notification is<br>

&gt;     received. However, there&#39;s a problem.<br>

&gt;<br>

&gt;     Suppose that the disconnection notification has been executed and we<br>

&gt;     are just about to call rpc_clnt_cleanup_and_start(). If at this<br>

&gt;     point the reconnection timer is fired, rpc_clnt_reconnect() will be<br>

&gt;     processed. This will cause the socket to be reconnected and a<br>

&gt;     connection notification will be processed. Then a handshake request<br>

&gt;     will be sent to the server.<br>

&gt;<br>

&gt;     However, when rpc_clnt_cleanup_and_start() continues, all sent XIDs<br>

&gt;     are deleted. When we receive the answer from the handshake, we are<br>

&gt;     unable to map the XID, making the request fail. So the handshake<br>

&gt;     fails and the client is considered down, sending a CHILD_DOWN<br>

&gt;     notification to upper xlators.<br>
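<br>
The ordering problem described above can be replayed in a small single-threaded model. This is only a sketch, not GlusterFS code: conn_model, model_submit(), model_cleanup(), model_reply() and handshake_survives() are hypothetical names, and the pending-XID array is an assumed stand-in for whatever bookkeeping the RPC layer really keeps.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for the per-connection list of in-flight XIDs. */
struct conn_model {
    uint32_t pending[MAX_PENDING]; /* XIDs still waiting for a reply */
    size_t   count;
    uint32_t next_xid;
};

void model_init(struct conn_model *c)
{
    c->count = 0;
    c->next_xid = 1;
}

/* Send a request: remember its XID so the reply can be matched later. */
uint32_t model_submit(struct conn_model *c)
{
    assert(c->count < MAX_PENDING);
    uint32_t xid = c->next_xid++;
    c->pending[c->count++] = xid;
    return xid;
}

/* Models the cleanup step: every in-flight XID is forgotten. */
void model_cleanup(struct conn_model *c)
{
    c->count = 0;
}

/* A reply arrives: it is accepted only if its XID is still known. */
bool model_reply(struct conn_model *c, uint32_t xid)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->pending[i] == xid) {
            c->pending[i] = c->pending[--c->count];
            return true;
        }
    }
    return false;
}

/* Replays one of the two orderings and reports whether the handshake
 * reply could still be matched to its request. */
bool handshake_survives(bool reconnect_before_cleanup)
{
    struct conn_model c;
    uint32_t xid;

    model_init(&c);
    if (reconnect_before_cleanup) {
        xid = model_submit(&c); /* timer fired: reconnect and handshake */
        model_cleanup(&c);      /* delayed cleanup drops the handshake XID */
    } else {
        model_cleanup(&c);      /* cleanup finishes first */
        xid = model_submit(&c); /* handshake is sent afterwards */
    }
    return model_reply(&c, xid);
}
```

In this model, handshake_survives(false) matches the reply, while handshake_survives(true) does not, which corresponds to the failed handshake and the CHILD_DOWN notification in the description.<br>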

&gt;<br>

&gt;     As a result, some tests start processing operations while a brick<br>

&gt;     is unexpectedly down, causing spurious test failures.<br>

&gt;<br>

&gt;     To solve the problem I&#39;ve forced the rpc_clnt_reconfig() function to<br>

&gt;     disable the RPC connection using code similar to rpc_clnt_disable().<br>

&gt;     This prevents the background rpc_clnt_reconnect() timer from being<br>

&gt;     executed, avoiding the problem.<br>
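<br>
The idea of disabling the connection before reconfiguring it can be illustrated with a toy model of the timer guard. It is a sketch under assumptions, not the actual patch: reconnect_model, rm_reconfig(), rm_timer_fire() and reconnects_during_reconfig() are hypothetical names, and the real code would also need locking, which is omitted here.<br>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reconnect state of one connection. */
struct reconnect_model {
    bool disabled;    /* set while the connection is being reconfigured */
    bool timer_armed; /* a background reconnect timer is pending */
    int  reconnects;  /* number of reconnect attempts actually made */
};

void rm_init(struct reconnect_model *m)
{
    m->disabled = false;
    m->timer_armed = false;
    m->reconnects = 0;
}

/* Timer callback: reconnect only if the connection is not disabled. */
void rm_timer_fire(struct reconnect_model *m)
{
    if (!m->timer_armed)
        return;
    m->timer_armed = false;
    if (m->disabled)
        return;
    m->reconnects++;
}

/* Reconfigure step: optionally disable the connection and cancel the
 * pending timer first, as the proposed change does. */
void rm_reconfig(struct reconnect_model *m, bool disable_first)
{
    if (disable_first) {
        m->disabled = true;
        m->timer_armed = false;
    }
}

/* Counts the reconnect attempts that slip in when the timer fires
 * right after the reconfigure step. */
int reconnects_during_reconfig(bool disable_first)
{
    struct reconnect_model m;

    rm_init(&m);
    m.timer_armed = true;           /* a reconnect timer is already pending */
    rm_reconfig(&m, disable_first);
    rm_timer_fire(&m);              /* the timer fires during the window */
    return m.reconnects;
}
```

Without the guard the model performs one stray reconnect during the window; with it, none. The model says nothing about the gfapi side effects mentioned below, which would need the real code paths to reproduce.<br>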

&gt;<br>

&gt;     This seems to work fine for many tests, but it seems to be causing<br>

&gt;     some issue in gfapi based tests. I&#39;m still investigating this.<br>

&gt;<br>

&gt;     Xavi<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

</div></div><div class="gmail-HOEnZb"><div class="gmail-h5">

</div></div></blockquote></div><br></div></div>