<div dir="ltr">Summing up the various discussions I had on this:<div><br></div><div style="">1. The current ping framework should measure only the responsiveness of the network and the rpc layer. This means poller threads shouldn&#39;t wind the individual fops at all, as doing so can delay the reading of ping requests. Instead, they can queue the requests to a common work queue, and other threads should pick the requests up from there.</div><div style="">2. We also need another tool to measure the responsiveness of the entire brick xlator stack. This tool can have a slightly larger timeout than the ping timeout, as responses will naturally be delayed. Whether this tool should measure the responsiveness of the backend fs is an open question, as we already have a posix health checker that measures it and sends a CHILD_DOWN when the backend fs is not responsive. There are also open questions about which data structures the various xlators access as part of this fop (inode, fd, mem-pools etc.), since accessing different data structures will result in different latencies.</div><div style="">3. Currently, ping packets are not sent by a client when there is no I/O from it. As per the discussions above, the client should measure responsiveness even when there is no traffic to/from it. Maybe the interval at which ping packets are sent can be increased.<br></div><div style="">4. We&#39;ve fixed some lock contention issues on the brick stack caused by high latency on the backend fs. However, this is ongoing work, as contention can be found in various codepaths (mem-pool etc.).</div><div style=""><br></div><div style="">We&#39;ll shortly send a fix for 1. The other items will be picked up as bandwidth permits. 
Contributions are welcome :).</div><div style=""><br></div><div style="">regards,</div><div style="">Raghavendra.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 25, 2017 at 11:01 AM, Joe Julian <span dir="ltr">&lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Yes, the earlier a fault is detected the better.<br><br><div class="gmail_quote"><span class="">On January 24, 2017 9:21:27 PM PST, Jeff Darcy &lt;<a href="mailto:jdarcy@redhat.com" target="_blank">jdarcy@redhat.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<pre class="m_3246746483282522367k9mail"><span class=""><blockquote class="gmail_quote" style="margin:0pt 0pt 1ex 0.8ex;border-left:1px solid #729fcf;padding-left:1ex"> If there are no responses to be received and no requests being<br> sent to a brick, why would a client be interested in the health of<br> server/brick?<br></blockquote><br>The client (code) might not, but the user might want to find out and fix<br>the fault before the brick gets busy again.<br><hr><br></span><span class="">Gluster-devel mailing list<br><a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br><a href="http://lists.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-devel</a><br></span></pre></blockquote></div><span class="HOEnZb"><font color="#888888"><br>

-- <br>

Sent from my Android device with K-9 Mail. Please excuse my brevity.</font></span></div><br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Raghavendra G<br></div>

</div>