<div dir="ltr"><div>Hi,</div><div><br></div><div> I think I know why the tarball size is bigger: it can happen when the tar file contains more than one core.</div><div> I triggered a build (<a href="https://review.gluster.org/19574">https://review.gluster.org/19574</a>, to validate all test cases with brick multiplexing enabled) after setting "exit_on_failure="no"" in run-tests.sh, </div><div> so the build executed all test cases. With the earlier version of the patch, I was getting multiple cores.</div><div><br></div><div> Now it generates only one core; the other code paths seem to be fixed, so the issue should be resolved.</div><div><br></div><div><br></div><div>Regards</div><div>Mohit Agrawal</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 19, 2018 at 6:07 PM, Sankarshan Mukhopadhyay <span dir="ltr"><<a href="mailto:sankarshan.mukhopadhyay@gmail.com" target="_blank">sankarshan.mukhopadhyay@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Feb 19, 2018 at 5:58 PM, Nithya Balachandran<br>
<<a href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>> wrote:<br>
><br>
><br>
> On 19 February 2018 at 13:12, Atin Mukherjee <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>> wrote:<br>
>><br>
>><br>
>><br>
>> On Mon, Feb 19, 2018 at 8:53 AM, Nigel Babu <<a href="mailto:nigelb@redhat.com">nigelb@redhat.com</a>> wrote:<br>
>>><br>
>>> Hello,<br>
>>><br>
>>> As you all most likely know, we store the tarball of the binaries and<br>
>>> core if there's a core during regression. Occasionally, we've introduced a<br>
>>> bug in Gluster and this tar can take up a lot of space. This has happened<br>
>>> recently with brick multiplex tests. The build-install tar takes up 25G,<br>
>>> causing the machine to run out of space and continuously fail.<br>
>><br>
>><br>
>> AFAIK, we don't have a .t file in the upstream regression suites where<br>
>> hundreds of volumes are created. With that scale and brick multiplexing<br>
>> enabled, I can understand that the core could be heavily loaded and consume<br>
>> this much space. FWIW, can we first figure out which test caused this crash,<br>
>> and see whether running gcore after certain steps in that test leaves us with<br>
>> a similarly sized core file? IOW, have we actually seen core files of this<br>
>> size before? If not, what changed to make us start seeing them is something<br>
>> to be investigated.<br>
><br>
><br>
> We also need to check if this is only the core file that is causing the<br>
> increase in size or whether there is something else that is taking up a lot<br>
> of space.<br>
>><br>
>><br>
>>><br>
>>><br>
>>> I've made some changes this morning. Right after we create the tarball,<br>
>>> we'll delete all files in /archive that are greater than 1G. Please be aware<br>
>>> that this means all large files including the newly created tarball will be<br>
>>> deleted. You will have to work with the traceback on the Jenkins job.<br>
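A minimal sketch of the cleanup step described above, written as a shell function. The /archive path and the 1G threshold come from the mail; the function name and the use of GNU find are my assumptions, not the actual Jenkins job code.<br>

```shell
#!/bin/sh
# Sketch of the post-tarball cleanup policy: delete every regular file
# larger than 1G under the archive directory. The function name is
# hypothetical; only the /archive path and 1G limit come from the mail.
cleanup_archive() {
    # GNU find: -size +1G matches files strictly larger than 1 GiB;
    # -type f restricts the deletion to regular files.
    find "$1" -type f -size +1G -delete
}

# On the Jenkins builders this would presumably be invoked as:
#   cleanup_archive /archive
```

Note that, as the mail warns, this removes the newly created tarball itself as well, so the Jenkins traceback becomes the only record of the failure.<br>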
>><br>
>><br>
>> We'd really need to first investigate the average core file size we get<br>
>> when a system is running with brick multiplexing and ongoing I/O. Without<br>
>> that, immediately deleting core files > 1G will make it harder for<br>
>> developers to debug genuine crashes, as the traceback alone may not be<br>
>> sufficient.<br>
>><br>
<br>
</div></div>I'd like to echo what Nithya writes: instead of treating this<br>
incident as an outlier, we might want to do further analysis. If this<br>
had happened on a production system, there would be blood.<br>
</blockquote></div><br></div>