[Gluster-devel] Need help diagnosing regression-test crashes

Jeff Darcy jdarcy at redhat.com
Fri Apr 8 20:35:27 UTC 2016


Upon further investigation, I've been able to determine that the problem
lies in this line of our generic cleanup routine.

        type cleanup_lvm &>/dev/null && cleanup_lvm || true;

This works great if we're at the end of a test that included
snapshot.rc (which defines cleanup_lvm), but we've generally been moving
away from that in favor of calling cleanup only at the beginning.  Thus, when
we go from a snapshot test to a non-snapshot test, the cleanup at the
beginning of the latter does *not* clean up any LVM stuff that's left
over.  What might have been a simple and correctly attributed failure in
the snapshot test can instead show up later.  In this case, the sequence
of events is as follows:

 1) bug-1322772 (snapshot) test starts glusterd

 2) bug-1322772 exits while the new glusterd is still initializing

 3) run-tests.sh looks for new core files and finds none

 4) run-tests.sh starts bug-1002207 (stripe) test

 5) glusterd from bug-1322772 dumps core

 6) bug-1002207 test completes

 7) run-tests.sh sees new core and misattributes it to bug-1002207
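
To make the root cause concrete, here is a small stand-alone sketch of
how the guarded line behaves with and without cleanup_lvm defined.  Only
the guarded line itself is taken from our cleanup routine; the
generic_cleanup wrapper, the lvm_cleaned flag, and the stub body of
cleanup_lvm are made up for illustration.

```shell
#!/bin/bash
# Stand-in for the generic cleanup routine. Only the guarded line is
# real; everything else here is a hypothetical illustration.
generic_cleanup () {
    type cleanup_lvm &>/dev/null && cleanup_lvm || true
}

# Case 1: a non-snapshot test that never sourced snapshot.rc.
# cleanup_lvm is undefined, so the guard silently does nothing.
unset -f cleanup_lvm
lvm_cleaned=no
generic_cleanup
result_without=$lvm_cleaned
echo "without snapshot.rc: lvm_cleaned=$result_without"

# Case 2: a snapshot test. This stub stands in for the real
# cleanup_lvm that snapshot.rc would define.
cleanup_lvm () { lvm_cleaned=yes; }
lvm_cleaned=no
generic_cleanup
result_with=$lvm_cleaned
echo "with snapshot.rc: lvm_cleaned=$result_with"
```

In the first case nothing fails and nothing is reported, which is why
leftover LVM state from a previous snapshot test survives the cleanup at
the start of the next test.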

The question is what to do about this.  Unconditionally calling
cleanup_lvm from generic cleanup is simple, but might make regression
tests noticeably slower.  Another possibility would be to change all
snapshot tests to call cleanup (or at least cleanup_lvm) at the end, or
use bash's "trap" mechanism to ensure the same.  I'm not wild about any
of those, but lean toward the "trap" approach.  Anyone else have any
opinions?
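
For reference, a rough sketch of what the "trap" approach might look
like.  This is not taken from any existing test; run_snapshot_test and
the stub cleanup_lvm are hypothetical names, and a real snapshot test
would get cleanup_lvm from snapshot.rc instead.  The idea is only that
an EXIT trap runs the LVM cleanup even when the test bails out early.

```shell
#!/bin/bash
# Hypothetical stub; a real test would source snapshot.rc, which
# defines the actual cleanup_lvm.
cleanup_lvm () { echo "cleanup_lvm ran"; }

# Hypothetical stand-in for a snapshot test. The body is a subshell so
# that the EXIT trap fires when the "test" ends, not when this script
# ends.
run_snapshot_test () (
    trap 'cleanup_lvm' EXIT
    echo "test body running"
    exit 1   # simulate the test failing partway through
)

output=$(run_snapshot_test)
status=$?
echo "$output"
echo "exit status: $status"
```

The trap does not change the exit status of the test, so a failure is
still reported as a failure; it just guarantees that the cleanup runs on
the way out.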

