[Gluster-devel] Need help diagnosing regression-test crashes

Sat Apr 9 04:27:07 UTC 2016

On 04/09/2016 12:17 AM, Atin Mukherjee wrote:
> -Atin
> On 09-Apr-2016 9:32 am, "Rajesh Joseph" <rjoseph at redhat.com> wrote:
>  >
>  >
>  >
>  > On Sat, Apr 9, 2016 at 2:05 AM, Jeff Darcy <jdarcy at redhat.com> wrote:
>  >>
>  >> Upon further investigation, I've been able to determine that the problem
>  >> lies in this line of our generic cleanup routine.
>  >>
>  >>         type cleanup_lvm &>/dev/null && cleanup_lvm || true;
>  >>
>  >> This works great if we're at the end of a test that included
>  >> snapshot.rc (which defines cleanup_lvm), but we've generally been moving
>  >> away from that in favor of calling it only at the beginning.  Thus, when
>  >> we go from a snapshot test to a non-snapshot test, the cleanup at the
>  >> beginning of the latter does *not* clean up any LVM stuff that's left
>  >> over.  What might have been a simple and correctly attributed failure in
>  >> the snapshot test can instead show up later.  In this case, the sequence
>  >> of events is as follows:
>  >>
>  >>  1) bug-1322772 (snapshot) test starts glusterd
>  >>
>  >>  2) bug-1322772 exits while the new glusterd is still initializing
>  >>
>  >>  3) run-tests.sh looks for new core files and finds none
>  >>
>  >>  4) run-tests.sh starts bug-1002207 (stripe) test
>  >>
>  >>  5) glusterd from bug-1322772 dumps core
>  >>
>  >>  6) bug-1002207 test completes
>  >>
>  >>  7) run-tests.sh sees new core and misattributes it to bug-1002207
>  >>
>  >> The question is what to do about this.  Unconditionally calling
>  >> cleanup_lvm from generic cleanup is simple, but might make regression
>  >> tests noticeably slower.  Another possibility would be to change all
>  >> snapshot tests to call cleanup (or at least cleanup_lvm) at the end, or
>  >> use bash's "trap" mechanism to ensure the same.  I'm not wild about any
>  >> of those, but lean toward the "trap" approach.  Anyone else have any
>  >> opinions?
>  >
>  >
>  > I think each snapshot test script should call cleanup_lvm and trap is a
>  > great suggestion.
>  >
>  > atinm: Can you please look into the crash in the following test case?
>  > bugs/snapshot/bug-1322772-real-path-fix-for-snapshot.t
>
> Do we have the link to the crash?

OT - Possibly unrelated glusterd crash in mainline [1]. This needs some 
attention too.

-Vijay

[1] http://www.gluster.org/pipermail/maintainers/2016-April/000619.html
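
For reference, the "trap" approach discussed in the thread could look roughly
like the sketch below. It is only an illustration, not the actual change to
the test suite: cleanup_lvm here is a stand-in stub (the real one is defined
in snapshot.rc), and run_snapshot_test is a hypothetical wrapper playing the
role of a snapshot .t script. The EXIT trap runs the guarded cleanup line
quoted from the generic cleanup routine even when the test body bails out early.

```bash
#!/bin/bash

# Stand-in stub (assumption): the real cleanup_lvm is defined in snapshot.rc.
cleanup_lvm() {
    echo "cleanup_lvm: removing leftover LVM state"
}

# Hypothetical wrapper standing in for a snapshot test script.
# The subshell gives the trap its own EXIT scope, like a separate script.
run_snapshot_test() {
    (
        # Same guarded call as in the generic cleanup routine, but tied to
        # the exit of this test instead of the start of the next one.
        trap 'type cleanup_lvm &>/dev/null && cleanup_lvm || true' EXIT

        echo "test body running"
        exit 1    # simulate a test that fails partway through
    )
}

status=0
output=$(run_snapshot_test) || status=$?

echo "$output"
echo "exit status: $status"
```

The test body line is printed first, then the cleanup line, and the exit
status of 1 is preserved, so a failing test still reports failure after it
has cleaned up after itself.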