[Gluster-infra] [Gluster-devel] Jenkins Issues this weekend and how we're solving them

Nithya Balachandran nbalacha at redhat.com
Mon Feb 19 13:28:03 UTC 2018


On 19 February 2018 at 18:19, Mohit Agrawal <moagrawa at redhat.com> wrote:

> Hi,
>
>   I think I know why the tarball size is bigger: it can happen if the
> tar file contains more than one core.
>   I triggered a build (https://review.gluster.org/19574, to validate all
> test cases with brick mux enabled) after updating "exit_one_failure="no""
> in run-tests.sh, so the build executed all test cases; with the earlier
> version of the patch I was getting multiple cores.
>
>   Now it generates only one core, so it seems the other code paths are
> fixed and the issue should be resolved now.
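
For reference, the switch Mohit mentions is a one-line change to
run-tests.sh. A minimal sketch, assuming the variable is named exactly as
quoted in his mail (verify against your checkout before relying on it):

    # Flip the flag so the suite keeps running after a failing .t instead
    # of aborting, then run everything in one go.
    sed -i 's/^exit_one_failure=.*/exit_one_failure="no"/' run-tests.sh
    ./run-tests.sh
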
>
>
> Good to hear. Thanks Mohit.



> Regards
> Mohit Agrawal
>
> On Mon, Feb 19, 2018 at 6:07 PM, Sankarshan Mukhopadhyay <
> sankarshan.mukhopadhyay at gmail.com> wrote:
>
>> On Mon, Feb 19, 2018 at 5:58 PM, Nithya Balachandran
>> <nbalacha at redhat.com> wrote:
>> >
>> >
>> > On 19 February 2018 at 13:12, Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>> >>
>> >>
>> >>
>> >> On Mon, Feb 19, 2018 at 8:53 AM, Nigel Babu <nigelb at redhat.com> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> As you all most likely know, we store the tarball of the binaries and
>> >>> core if there's a core during regression. Occasionally, we've
>> >>> introduced a bug in Gluster and this tar can take up a lot of space.
>> >>> This has happened recently with brick multiplex tests. The
>> >>> build-install tar takes up 25G, causing the machine to run out of
>> >>> space and continuously fail.
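
Since the thread keeps coming back to what makes that tarball so large, one
quick check is to list its biggest entries. A sketch only; the archive file
name below is hypothetical:

    # Largest entries inside a bloated build-install tarball (size is the
    # third column of GNU tar's verbose listing).
    tar -tvf /archive/build-install-12345.tar.gz | sort -k3 -rn | head -n 20
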
>> >>
>> >>
>> >> AFAIK, we don't have a .t file in the upstream regression suite where
>> >> hundreds of volumes are created. With that scale and brick
>> >> multiplexing enabled, I can understand the core being quite heavily
>> >> loaded and consuming this much space. FWIW, can we first try to figure
>> >> out which test was causing this crash and see whether running gcore
>> >> after certain steps in the test leaves us with a core file of a
>> >> similar size? IOW, have we actually seen core files of this size
>> >> generated before? If not, what changed to make us start seeing this is
>> >> something to be investigated.
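
A minimal sketch of the experiment Atin suggests, assuming the multiplexed
bricks run inside a single glusterfsd process (the pgrep pattern and output
path are assumptions):

    # After the suspect step in the .t, snapshot the brick process with
    # gcore and see how big a core it produces at that point.
    brick_pid=$(pgrep -f glusterfsd | head -n 1)
    gcore -o /tmp/brick-snapshot "$brick_pid"
    ls -lh "/tmp/brick-snapshot.$brick_pid"
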
>> >
>> >
>> > We also need to check if this is only the core file that is causing
>> > the increase in size or whether there is something else that is taking
>> > up a lot of space.
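
One quick way to answer that question (the path is taken from the thread):

    # Break down what actually occupies /archive, largest items first, so
    # the growth can be attributed to the core tarballs or to something else.
    du -ah /archive | sort -rh | head -n 20
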
>> >>
>> >>
>> >>>
>> >>>
>> >>> I've made some changes this morning. Right after we create the
>> >>> tarball, we'll delete all files in /archive that are greater than 1G.
>> >>> Please be aware that this means all large files including the newly
>> >>> created tarball will be deleted. You will have to work with the
>> >>> traceback on the Jenkins job.
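
The cleanup described above could look roughly like this; a sketch, not the
actual Jenkins job change:

    # Right after the tarball is written: remove anything in /archive larger
    # than 1G, including the tarball that was just created.
    find /archive -type f -size +1G -delete
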
>> >>
>> >>
>> >> We'd really need to first investigate the average core file size we
>> >> can expect when a system is running with brick multiplexing and
>> >> ongoing I/O. Without that, immediately deleting core files > 1G will
>> >> cause trouble for the developers debugging genuine crashes, as the
>> >> traceback alone may not be sufficient.
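
If oversized cores do have to be deleted, one possible mitigation (an
assumption, not something the current job does) is to save a full backtrace
from the core before it goes, so more than the console traceback survives.
The binary and core paths below are hypothetical:

    # Dump a backtrace of every thread from the core before deleting it.
    gdb -batch -ex 'thread apply all bt full' \
        /build/install/sbin/glusterfsd /archive/core.12345 \
        > /archive/core-backtrace.txt
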
>> >>
>>
>> I'd like to echo what Nithya writes - instead of treating this
>> incident as an outlier, we might want to do further analysis. If this
>> had happened on a production system, there would be blood.
>>
>
>