<div dir="ltr"><div>Hi,</div><div><br></div><div> I think I know why the tarball size is bigger: it can happen when the tar file contains more than one core.</div><div> I triggered a build (<a href="https://review.gluster.org/19574">https://review.gluster.org/19574</a>, to validate all test cases with brick multiplexing enabled) after setting "exit_on_failure="no"" in run-tests.sh, </div><div> so the build executed all test cases. With the earlier version of the patch, I was getting multiple cores.</div><div><br></div><div> Now it generates only one core; the other code paths seem to be fixed, so the issue should be resolved.</div><div><br></div><div><br></div><div>Regards</div><div>Mohit Agrawal</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 19, 2018 at 6:07 PM, Sankarshan Mukhopadhyay <span dir="ltr"><<a href="mailto:sankarshan.mukhopadhyay@gmail.com" target="_blank">sankarshan.mukhopadhyay@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Feb 19, 2018 at 5:58 PM, Nithya Balachandran<br>
<<a href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>> wrote:<br>
><br>
><br>
> On 19 February 2018 at 13:12, Atin Mukherjee <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>> wrote:<br>
>><br>
>><br>
>><br>
>> On Mon, Feb 19, 2018 at 8:53 AM, Nigel Babu <<a href="mailto:nigelb@redhat.com">nigelb@redhat.com</a>> wrote:<br>
>>><br>
>>> Hello,<br>
>>><br>
>>> As you all most likely know, we store the tarball of the binaries and<br>
>>> core if there's a core during regression. Occasionally, we've introduced a<br>
>>> bug in Gluster and this tar can take up a lot of space. This has happened<br>
>>> recently with brick multiplex tests. The build-install tar takes up 25G,<br>
>>> causing the machine to run out of space and continuously fail.<br>
>><br>
>><br>
>> AFAIK, we don't have a .t file in the upstream regression suites where<br>
>> hundreds of volumes are created. With that scale and brick multiplexing<br>
>> enabled, I can understand that the core could be heavily loaded and consume<br>
>> this much space. FWIW, can we first figure out which test caused this crash,<br>
>> and see whether running gcore after certain steps in that test leaves us with<br>
>> a similarly sized core file? IOW, have we actually seen core files of this<br>
>> size before? If not, what changed to make us start seeing them is something<br>
>> to be investigated.<br>
><br>
><br>
> We also need to check if this is only the core file that is causing the<br>
> increase in size or whether there is something else that is taking up a lot<br>
> of space.<br>
>><br>
>><br>
>>><br>
>>><br>
>>> I've made some changes this morning. Right after we create the tarball,<br>
>>> we'll delete all files in /archive that are greater than 1G. Please be aware<br>
>>> that this means all large files including the newly created tarball will be<br>
>>> deleted. You will have to work with the traceback on the Jenkins job.<br>
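A minimal sketch of the cleanup step described above, written as a shell function. The /archive path and the 1G threshold come from the mail; the function name and the use of GNU find are my assumptions, not the actual Jenkins job code.<br>

```shell
#!/bin/sh
# Sketch of the post-tarball cleanup policy: delete every regular file
# larger than 1G under the archive directory. The function name is
# hypothetical; only the /archive path and 1G limit come from the mail.
cleanup_archive() {
    # GNU find: -size +1G matches files strictly larger than 1 GiB;
    # -type f restricts the deletion to regular files.
    find "$1" -type f -size +1G -delete
}

# On the Jenkins builders this would presumably be invoked as:
#   cleanup_archive /archive
```

Note that, as the mail warns, this removes the newly created tarball itself as well, so the Jenkins traceback becomes the only record of the failure.<br>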
>><br>
>><br>
>> We'd really need to first investigate the average core file size we get<br>
>> when a system is running with brick multiplexing and ongoing I/O. Without<br>
>> that, immediately deleting core files > 1G will make it harder for<br>
>> developers to debug genuine crashes, as the traceback alone may not be<br>
>> sufficient.<br>
>><br>
<br>
</div></div>I'd like to echo what Nithya writes: instead of treating this<br>
incident as an outlier, we might want to do further analysis. If this<br>
had happened on a production system, there would be blood.<br>
</blockquote></div><br></div>