[Gluster-devel] tests/basic/tier/tier.t failure in NetBSD

Sun Feb 28 18:02:50 UTC 2016

----- Original Message -----
> From: "Jeff Darcy" <jdarcy at redhat.com>
> To: "Joseph Fernandes" <josferna at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Atin Mukherjee" <amukherj at redhat.com>
> Sent: Sunday, February 28, 2016 6:30:42 PM
> Subject: Re: [Gluster-devel] tests/basic/tier/tier.t failure in NetBSD
> 
> 
> 
> ----- Original Message -----
> > Hi All,
> > 
> > 1. record-metadata-heat.t is already added to the bad tests. As far as the
> >    feature goes, it just tests the switch on the recording of metadata
> >    heat. We are aware of the issue here: it is not the feature but the
> >    timing that is used in the test. We will work on it once our priority
> >    list of things is addressed.
> > 2. I just had a look at the tier.t failure case. The test fails at test 45,
> >    i.e. the detach command fails. We will have a look at it and fix it.
> > 3. Jeff, could you please point us to the failure link of
> >    fops-during-migration-pause.t so that we can investigate?
> 
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14624
> 
> However, this one did fail the same way on CentOS, so it might actually have
> been a real problem with the patch.  You should probably check with Rafi
> about that.

Thanks for sharing.

Sure, I will look into this issue.

> 
> > Now about the tiering test files. Most of the time the tiering tests have
> > failed due to timing issues when checking status, i.e. synchronizing on
> > when a file gets promoted and demoted, and it is very difficult to have
> > this synchronized with the t file infra (the tiering team can add more to
> > this). But to say that user data is unsafe on this premise would not be
> > correct, as any data loss, corruption, or unavailability is already
> > handled in the t files of tiering.
> 
> That might turn out to be true upon further examination of each individual
> failure, and that would be a relief.  On the other hand, we can't rely on
> manual examination or ad-hoc decisions about which parts of a test matter.
> That's just not a sustainable process.  *From a project perspective*, if
> any part of a test fails then the whole test fails.  The whole ship floats
> or sinks together.  If a test fails consistently, then - again, from a
> *project* perspective - we can't make any assurance based on any part of
> it, and we need such assurance to say a feature is safe.
> 
> If the only problem in these tests is status reporting, then let's work
> together to make status reporting (within bounded time) more reliable.
> Then we won't have to choose between losing information or losing time.
> 

Well, I do not agree with the "The whole ship floats or sinks together" part,
as in the past we have seen tests (not only for tiering but for other features)
that were failing and were then addressed and fixed.

But I agree on the restriction of time/effort on this. And thus I will review
your patch at http://review.gluster.org/#/c/13535/ for disabling tiering on
NetBSD.
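
On the point of checking status within bounded time, the idea can be
illustrated with a small Python sketch. This is only an illustration under
stated assumptions, not code from the Gluster test framework:
wait_for_status, check, and the "hot"/"cold" values are hypothetical names
made up for the example.

```python
import time
from typing import Callable


def wait_for_status(
    check: Callable[[], str],
    expected: str,
    timeout: float = 30.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns `expected` or `timeout` seconds elapse.

    `check` is a hypothetical callable that reports the current status,
    e.g. whether a file is on the hot or the cold tier.
    """
    deadline = clock() + timeout
    while True:
        # Always check at least once, even with a zero timeout.
        if check() == expected:
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Bounded time is up: report failure instead of waiting forever.
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

A test would then assert on something like
wait_for_status(get_tier, "hot", timeout=120) instead of sleeping for a fixed
period and checking once, so a slow machine only costs extra time up to the
bound rather than a spurious failure. Here get_tier is again a hypothetical
helper.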

Regards,
Joe