<p dir="ltr">Hi Thorgeir,</p>

<p dir="ltr">Have you tried adding an arbiter with SSD bricks?</p>

<p dir="ltr">SSD/NVMe is the best type of storage for an arbiter. Yes, it is more expensive, but an arbiter needs fewer disks than a data brick.</p>

<p dir="ltr">Of course, the arbiter is only one side of the equation, and the time to heal may also depend on your data bricks' IOPS.</p>

<p dir="ltr">How much time does a node in the cluster need to heal after being rebooted?</p>

<p dir="ltr">Best Regards,<br>

Strahil Nikolov</p>

<div class="quote">On Oct 16, 2019 16:37, Thorgeir Marthinussen &lt;thorgeir.marthinussen@basefarm.com&gt; wrote:<br type='attribution'><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div style="text-align:left;direction:ltr">


<div>Hi,</div>


<div><br />


</div>


<div>We have an old Gluster cluster running replica 2 across two datacenters, currently on version 4.1.5.</div>


<div><br />


</div>


<div>I need to add an arbiter to this setup, but I&#39;m concerned about the performance impact of this on the volumes.</div>


<div><br />


</div>


<div>I recently set up a new cluster, for a different purpose, and decided to test adding an arbiter to the volume after adding in some data.</div>


<div>The volume had ~435,000 files totaling about 12TB.</div>


<div>Adding the arbiter initiated a heal-operation that took almost 3 hours.</div>


<div><br />


</div>


<div>On the older cluster, one of the volumes is about 14TB but holds ~45.5 million files.</div>


<div><br />


</div>


<div>Since the arbiter is only concerned with metadata and checksums, I&#39;m worried that we have about 100 times as many files, i.e. 100 times as many I/O operations to execute during healing, and possibly 100 times the heal time, which would mean about 12.5 days.</div>
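<div>The extrapolation above can be checked with a short calculation. This is only a sketch: it assumes heal time scales linearly with file count (which may not hold in practice) and rounds "almost 3 hours" up to 3.0. The function name <code>estimate_heal_hours</code> is hypothetical.</div>

```python
def estimate_heal_hours(ref_files, ref_hours, target_files):
    """Scale a measured heal time linearly with the number of files.

    Assumption: heal cost is dominated by per-file operations, so the
    time grows in proportion to the file count.
    """
    return ref_hours * target_files / ref_files


# Figures from the message: ~435,000 files healed in about 3 hours,
# and ~45.5 million files on the older volume.
hours = estimate_heal_hours(435_000, 3.0, 45_500_000)
print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

<div>With the exact ratio (about 105 rather than 100) this gives roughly 314 hours, or about 13 days, which is in line with the estimate above.</div>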


<div><br />


</div>


<div>Another &#34;issue&#34; is that the &#39;gluster volume heal &lt;vol-name&gt; info summary&#39; command seems to &#34;count&#34; all the files, so the command can take a very long time to complete.</div>


<div>The metrics-scraping script I created for us, with a timeout of 110 seconds, fails to complete when a volume has more than ~800-900 unsynced files (which happens regularly when taking one cluster node down for patching).</div>
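<div>For illustration, a scraper of the kind described might look like the sketch below. It is not the actual script. The helper names are hypothetical, and the parser assumes that the summary output contains one "Total Number of entries: N" line per brick, which should be verified against the installed Gluster version.</div>

```python
import re
import subprocess

# Assumed output format: one "Total Number of entries: N" line per brick.
ENTRY_LINE = re.compile(r"^Total Number of entries:\s*(\d+)\s*$", re.MULTILINE)


def count_pending_entries(summary_text):
    """Sum the per-brick entry counts found in the summary output."""
    return sum(int(n) for n in ENTRY_LINE.findall(summary_text))


def heal_summary(volume, timeout_s=110):
    """Run the heal summary command; return the total count or None.

    None means the command timed out, failed, or gluster is not installed,
    so the caller can report "unknown" instead of blocking the scrape.
    """
    try:
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info", "summary"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, FileNotFoundError):
        return None
    return count_pending_entries(result.stdout)
```

<div>Returning None on timeout lets the metrics pipeline distinguish "no answer within 110 seconds" from "zero entries pending".</div>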


<div><br />


</div>


<div><br />


</div>


<div>Does anyone have experience with adding an arbiter to an existing volume: performance impact, time to heal, etc.?</div>


<div>I am also interested in other ways to get the healing status.</div>


<div><br />


</div>


<div>Any advice would be appreciated.</div>


<div><br />


<br />


Best regards<br />


-- <br />


<b>THORGEIR MARTHINUSSEN</b><br />


<div>Senior Systems Consultant</div>


<b>BASEFARM</b></div>


</div>


</blockquote></div>