[Gluster-devel] [Feature request]: Regression to take more patches in single instance

Fri Aug 2 03:09:36 UTC 2013

On Wed, Jul 31, 2013 at 5:11 AM, Jeff Darcy <jdarcy at redhat.com> wrote:

> On 07/31/2013 07:35 AM, Amar Tumballi wrote:
>
>> I was trying to fire some regression builds on very minor patches today,
>> and noticed (always known, but I felt the pain of 'waiting' today) that we
>> can fire a regression build on only one patch (or a patchset, if it's
>> submitted with its dependencies added while submitting). And each
>> regression run takes approx. 30 mins.
>>
>> With this model, we can take at most ~45 patches a day, which won't scale
>> up if we want to grow with more people contributing code. It would be
>> great to have an option to submit a regression run with multiple patch
>> numbers (technically they should apply on top of each other in any order
>> if they are not dependent), and it should work fine. That way, we can
>> handle more review load in the future.
>>
>
> Maybe my brain has been baked too much by the sun, but I thought I'd seen
> cases where a regression run on a patch with dependencies automatically
> validated everything in the stack.  Not so?  That still places a burden on
> patch submitters to make sure dependencies are specified (shouldn't be a
> problem since the current tendency is to *over*specify dependencies) and on
> the person starting the run to pick the top of the stack, but it does allow
> us to kill multiple birds with one stone.
>
> As for scaling, isn't the basic solution to add more worker machines?  That
> would multiply the daily throughput by the number of workers, and decrease
> latency for simultaneously submitted runs proportionally.
>
>
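Taking the figures quoted above at face value (roughly 30 minutes per run, one run at a time), the daily ceiling and the effect of the two proposals (more patches per run, more workers) can be worked out with a back-of-the-envelope sketch. The assumption that workers run around the clock with no overhead, and the `batch_size` parameter standing in for the multi-patch proposal, are illustrative simplifications, not measurements from the Gluster CI.

```python
# Back-of-the-envelope regression capacity. Assumptions (not measured):
# every run takes a fixed 30 minutes and workers are busy around the clock.
RUN_MINUTES = 30              # approximate run time quoted in the thread
MINUTES_PER_DAY = 24 * 60

def runs_per_day(workers: int = 1, run_minutes: int = RUN_MINUTES) -> int:
    """Upper bound on regression runs per day with `workers` identical machines."""
    return workers * (MINUTES_PER_DAY // run_minutes)

def patches_per_day(workers: int = 1, batch_size: int = 1) -> int:
    """Patches validated per day if each run covers `batch_size` patches and passes."""
    return runs_per_day(workers) * batch_size

for workers, batch in [(1, 1), (4, 1), (1, 5), (4, 5)]:
    print(f"workers={workers} batch={batch}: {patches_per_day(workers, batch)} patches/day")
```

With one worker and one patch per run this gives 48 runs a day, consistent with the ~45 quoted once some overhead is allowed for. The `batch_size` multiplier only holds while batches pass; a failing batch has to be split up to find the culprit, which costs extra runs.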
The flip side of having too many patches regression-tested in parallel is
that, since the regression test applies the patch in question on top of the
current git HEAD _at the time of test execution_, we lose out on testing
the "combined effect" of those multiple patches. This can leave the master
branch in a broken state even though every patch was tested (in
isolation). And the breakage becomes visible only much later, when an
unrelated patch is tested after the patches have been (successfully tested
and) merged independently. This has happened before too, even with the
current "test one patch at a time" model. E.g.:

1 - Patch A is tested [success]
2 - Patch B is tested [success]
3 - Patch A is merged
4 - Patch B is merged
<git HEAD is broken now>
5 - Patch C is tested [failure, because the combined effect of A + B is
tested only now]

The serial nature of today's testing limits such delays to some extent, as
tested patches keep getting merged before regression tests of new patches
start. Parallelizing tests too much could potentially widen this "danger
window".

On the other hand, to guarantee master is never broken, test + merge must
be a strictly serial operation (i.e. do not even start a new regression job
until the previous patch is tested and merged). That is even worse, for
sure.

In the end we probably need a combination of the two strategies:

- Ability to test multiple patches at the same time (improves regression
throughput to some extent and increases "integrated testing" of patches for
their combined effect).

- Ability to run tests of patch sets in parallel, where the patch sets are
formed such that the groups are really independent and there is very little
chance of their combined effect resulting in a regression (e.g. one patch
set for a bunch of patches in glusterd and another patch set for a bunch of
patches in the data path).
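One way to form such independent patch sets automatically is to group patches by the subsystem they touch, merging groups whenever two patches share a subsystem (a union-find over patches). The sketch below assumes a subsystem is identified by the first three path components; that rule, the change IDs and the example file paths are hypothetical and for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Set

def subsystem(path: str) -> str:
    """Hypothetical rule: the first three path components name the subsystem."""
    return "/".join(path.split("/")[:3])

def group_patches(patches: Dict[str, List[str]]) -> List[Set[str]]:
    """Partition patches so that any two touching a shared subsystem end up together.

    Each group would be tested as one combined run; different groups touch
    disjoint subsystems and could be run in parallel on separate workers.
    """
    parent = {p: p for p in patches}        # union-find forest over patch IDs

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    owner: Dict[str, str] = {}              # subsystem -> a patch known to touch it
    for patch, files in patches.items():
        for sub in {subsystem(f) for f in files}:
            if sub in owner:
                parent[find(patch)] = find(owner[sub])   # merge the two groups
            else:
                owner[sub] = patch

    groups: Dict[str, Set[str]] = defaultdict(set)
    for patch in patches:
        groups[find(patch)].add(patch)
    return list(groups.values())

# Illustrative change IDs and file paths (not real pending reviews).
pending = {
    "I1": ["xlators/mgmt/glusterd/src/glusterd-op-sm.c"],
    "I2": ["xlators/mgmt/glusterd/src/glusterd-utils.c"],
    "I3": ["xlators/cluster/afr/src/afr-common.c"],
    "I4": ["xlators/cluster/afr/src/afr-common.c", "xlators/cluster/dht/src/dht-common.c"],
}
for group in group_patches(pending):
    print(sorted(group))
```

Here the two glusterd patches form one set and the two data-path patches another. Touching disjoint directories does not guarantee independence (shared code can still couple them), so grouping only lowers the risk of a missed interaction rather than eliminating it.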

Avati