[Gluster-devel] tests/basic/tier/tier.t failure in NetBSD

Sun Feb 28 13:00:42 UTC 2016

----- Original Message -----
> Hi All,
> 
> 1. record-metadata-heat.t has already been added to the bad tests. As far as
>    the feature goes, it just tests the switch that turns on recording of
>    metadata heat. We are aware of the issue here: it is not the feature but
>    the timing used in the test.
>    We will work on it once the items on our priority list are addressed.
> 2. I just had a look at the tier.t failure case. The test fails at test 45,
>    i.e. the detach command fails. We will look into it and fix it.
> 3. Jeff, could you please point us to the failure link for
>    fops-during-migration-pause.t so that we can investigate?

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14624

However, this one did fail the same way on CentOS, so it might actually have
been a real problem with the patch.  You should probably check with Rafi
about that.

> Now, about the tiering test files. Most of the time the tiering tests have
> failed because of timing issues in checking status, i.e. synchronizing with
> the moment a file gets promoted or demoted, and it is very difficult to
> synchronize this within the .t file infra (the tiering team can add more
> to this).
> But saying that user data is unsafe on this premise would not be correct,
> as any data loss, corruption, or unavailability is already covered by the
> tiering .t files.

That might turn out to be true upon further examination of each individual
failure, and that would be a relief.  On the other hand, we can't rely on
manual examination or ad-hoc decisions about which parts of a test matter.
That's just not a sustainable process.  *From a project perspective*, if
any part of a test fails then the whole test fails.  The whole ship floats
or sinks together.  If a test fails consistently, then - again, from a
*project* perspective - we can't make any assurance based on any part of
it, and we need such assurance to say a feature is safe.

If the only problem with these tests is status reporting, then let's work
together to make status reporting (within bounded time) more reliable.
Then we won't have to choose between losing information or losing time.
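As a rough illustration of what "within bounded time" could mean for a status check, here is a minimal bash sketch. It assumes a hypothetical helper name, `wait_for_status`, and is not the Gluster test harness's own helper. It polls a status command until the output matches an expected value or a timeout expires, so the test neither races the migration nor hangs forever.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Gluster test harness): poll a
# status command until its output equals the expected value or the
# timeout (in seconds) expires. Returns 0 on a match, 1 on timeout.
wait_for_status() {
    local timeout=$1 expected=$2
    shift 2
    # SECONDS is a bash builtin counting seconds since shell start.
    local deadline=$((SECONDS + timeout))
    local actual
    while :; do
        # Run the remaining arguments as the status command.
        actual=$("$@" 2>/dev/null)
        if [ "$actual" = "$expected" ]; then
            return 0
        fi
        if [ "$SECONDS" -ge "$deadline" ]; then
            # Report the last value seen so a failure is diagnosable.
            echo "timed out: expected '$expected', last saw '$actual'" >&2
            return 1
        fi
        sleep 1
    done
}
```

A test would then call something like `wait_for_status 60 "completed" get_detach_status`, where `get_detach_status` stands for whatever (hypothetical) function prints the detach status. The timeout bounds the wait, and the error message records the last status seen, so a failure carries information instead of just costing time.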