<div dir="ltr"><div>Could you run the following command on one of the nodes where you are observing high CPU usage and attach the resulting file to this thread? From it we can find out which threads/processes are causing the high usage. Let it run for about 10 minutes while the CPU is at ~100%.<br></div><div><br></div><div><span style="font-family:&quot;courier new&quot;,courier,monospace">top -bHd 5 &gt; /tmp/top.${HOSTNAME}.txt</span></div><div><div><br><div class="gmail_quote"><div dir="ltr">On Wed, Aug 15, 2018 at 2:37 PM Hu Bert &lt;<a href="mailto:revirii@googlemail.com">revirii@googlemail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello again :-)<br>

<br>

The self-heal must have finished, as no new entries are being written to the<br>

glustershd.log files anymore. According to munin, disk latency (average I/O wait)<br>

has gone down to 100 ms, and disk utilization has gone down to ~60%,<br>

on all servers and all hard disks.<br>

<br>

But now the system load on the 2 servers that were in the good state<br>

fluctuates between 60 and 100, while the server with the formerly failed<br>

disk has a load of 20-30. I&#39;ve uploaded some munin graphs of the CPU<br>

usage:<br>

<br>

<a href="https://abload.de/img/gluster11_cpu31d3a.png" rel="noreferrer" target="_blank">https://abload.de/img/gluster11_cpu31d3a.png</a><br>

<a href="https://abload.de/img/gluster12_cpu8sem7.png" rel="noreferrer" target="_blank">https://abload.de/img/gluster12_cpu8sem7.png</a><br>

<a href="https://abload.de/img/gluster13_cpud7eni.png" rel="noreferrer" target="_blank">https://abload.de/img/gluster13_cpud7eni.png</a><br>

<br>

This can&#39;t be normal: 2 of the servers are under heavy load and one is<br>

much less so. Does anyone have an explanation for this strange behaviour?<br>

<br>

<br>

Thx :-)<br>

<br>

2018-08-14 9:37 GMT+02:00 Hu Bert &lt;<a href="mailto:revirii@googlemail.com" target="_blank">revirii@googlemail.com</a>&gt;:<br>

&gt; Hi there,<br>

&gt;<br>

&gt; Well, it seems the heal has finally finished. I couldn&#39;t find any<br>

&gt; related log message; is there such a message in a specific log file?<br>

&gt;<br>

&gt; But I see the same behaviour as when the last heal finished: all CPU<br>

&gt; cores are consumed by brick processes, not only by the formerly failed<br>

&gt; bricksdd1 but by all 4 brick processes (and their threads). Load goes<br>

&gt; up to &gt; 100 on the 2 servers with the not-failed brick, and<br>

&gt; glustershd.log fills up with a lot of entries. Load on the server<br>

&gt; with the then-failed brick is not that high, but still ~60.<br>

&gt;<br>

&gt; Is this behaviour normal? Is there some kind of post-heal step after a heal has finished?<br>

&gt;<br>

&gt; Thx in advance :-)<br>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Pranith<br></div></div></div></div></div>