[Gluster-infra] New spurious regression

Michael Scherer mscherer at redhat.com
Thu Nov 5 11:37:54 UTC 2015


On Thursday, 05 November 2015 at 15:59 +0530, Avra Sengupta wrote:
> On 11/05/2015 03:57 PM, Avra Sengupta wrote:
> > On 11/05/2015 03:56 PM, Vijay Bellur wrote:
> >> On Thursday 05 November 2015 12:19 PM, Avra Sengupta wrote:
> >>> Hi,
> >>>
> >>> We investigated the logs from the regression failures that hit this,
> >>> and the findings are as follows:
> >>> 1. The snapshot clone failure is indeed the reason for the regression
> >>> failure.
> >>> 2. The snapshot clone failed in pre-validation with the error that the
> >>> brick of snap3 is not up and running.
> >>> 3. snap3 was created, and subsequently started (because
> >>> activate-on-create is enabled), long before we tried to create a
> >>> clone out of it.
> >>> 4. snap3's brick shows no failure logs, which gives us no reason to
> >>> believe that it did not start properly during the testcase.
> >>> 5. That leaves us with the assumption (an assumption because we have
> >>> no logs backing it) that there was some delay either in starting the
> >>> brick process for snap3, or in glusterd registering that it had
> >>> started, and the clone command was executed, and failed, before either
> >>> of those events happened. That would make it a race; a rough guard for
> >>> such a race is sketched right after this list.
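> >>>
> >>> A minimal sketch of such a guard (illustrative only: the helper and
> >>> clone names are made up, and it assumes `gluster snapshot status`
> >>> reports a "Brick Running" field; in the actual .t file the framework's
> >>> EXPECT_WITHIN helper would be the idiomatic form):
> >>>
> >>>   # Poll glusterd until it reports the snap brick as running, then clone.
> >>>   wait_for_snap_brick () {
> >>>       local snap=$1 tries=30
> >>>       while [ "$tries" -gt 0 ]; do
> >>>           gluster snapshot status "$snap" | grep -q "Brick Running.*Yes" \
> >>>               && return 0
> >>>           sleep 1
> >>>           tries=$((tries - 1))
> >>>       done
> >>>       return 1
> >>>   }
> >>>
> >>>   # clone name below is just an example
> >>>   wait_for_snap_brick snap3 && gluster snapshot clone clone1 snap3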
> >>>
> >>> Some other things to consider about this particular testcase:
> >>> 1. It passed (and still passes consistently) on our local systems, so
> >>> the failure is not reproducible locally.
> >>> 2. The patch was merged after both the Linux and NetBSD regressions
> >>> passed (at one go).
> >>> 3. The release-3.7 backport of the same patch has also passed both the
> >>> Linux and NetBSD regressions as of now.
> >>>
> >>> The rationale for mentioning the above three points is that this
> >>> testcase has passed locally as well as on the regression setups (not
> >>> just at the time of merge, but even now), which brings me back to the
> >>> assumption in point #5. To get more clarity on that assumption we need
> >>> access to one of the regression setups, so that we can try to
> >>> reproduce the failure in that environment and get some proof of what
> >>> is really happening.
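> >>>
> >>> On such a machine, one way to chase the race (a rough sketch; the test
> >>> path and workspace location below are placeholders, not the real ones)
> >>> would be to run the testcase in a loop until it fails and keep the
> >>> logs from the failing iteration:
> >>>
> >>>   # Placeholder path; substitute the actual failing .t file.
> >>>   TESTCASE=tests/bugs/snapshot/the-failing-test.t
> >>>   cd /path/to/glusterfs-workspace      # placeholder checkout location
> >>>   for i in $(seq 1 100); do
> >>>       prove -vf "$TESTCASE" && continue
> >>>       echo "failed on iteration $i"
> >>>       cp -r /var/log/glusterfs "logs-run-$i"  # keep the failing run's logs
> >>>       break
> >>>   done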
> >>>
> >>> Vijay,
> >>>
> >>> Could you please provide us with a Jenkins Linux slave to perform the
> >>> above mentioned validation?
> >>>
> >>
> >> Please send out a request on gluster-infra, if not done so already,
> >> and Michael Scherer should be able to help.
> >>
> >> Thanks!
> >> Vijay
> >>
> > + Adding gluster-infra and Michael
> >
> > Could you please provide us with a Jenkins Linux slave to perform the
> > above mentioned validation?

So you just want a single CentOS 6 Gluster slave? Who needs access to it,
and for how long?

Can you provide an ssh key so I can create a snapshot and give it to you?
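
For reference, a dedicated keypair can be generated along these lines
(the file name and comment are only examples); only the public half, the
.pub file, needs to be sent:

    # Generate a dedicated 4096-bit RSA keypair (file name and comment
    # are only examples).
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/gluster-slave -C "jenkins-slave-access"
    # Then share only ~/.ssh/gluster-slave.pub, never the private key.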


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS



