[Gluster-devel] Performance Translators' Stability and Usefulness - Regression test outline
Geoff Kassel
gkassel at users.sourceforge.net
Tue Jul 7 22:30:01 UTC 2009
Hi Mickey,
> Wow, you really hit my biggest fear, the one thing I try to test for...
> data corruption.
Yep, it's mine too. Hence all this ranting and raving of mine :)
> I'm doing a simplified version of the first set of testing you mentioned
Sounds good :)
> I would add a few but I haven't had the time to google how to do the
> following without writing a C prog:
> check flock()
> check mmap writing
Perl and Python can do these, but I'd rather not script in Perl if I can
possibly avoid it. (Python's much nicer :)
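Something along these lines, say - just a rough sketch, and the mount point
path is a placeholder:

#!/usr/bin/env python
# Rough sketch: exercise flock() and mmap writes on a GlusterFS mount.
# The mount point below is an assumption - adjust to suit.
import fcntl
import mmap
import os

MOUNT = "/mnt/glusterfs"
path = os.path.join(MOUNT, "locktest.dat")

# flock() check: take and release an exclusive lock around a write.
with open(path, "wb") as f:
    f.write(b"\0" * 4096)
with open(path, "r+b") as f:
    fcntl.flock(f, fcntl.LOCK_EX)
    f.write(b"locked write")
    fcntl.flock(f, fcntl.LOCK_UN)

# mmap write check: write through a mapping, flush, then re-read the
# file normally to confirm the change actually reached the file.
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    m[0:5] = b"mmap!"
    m.flush()
    m.close()
with open(path, "rb") as f:
    assert f.read(5) == b"mmap!", "mmap write did not reach the file"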
> I'm thinking a simple collection of bash or perl scripts would work for
> a first pass at this. Do you have any suggestions on a good colab site
> for scripting? If we came up with a basic format we could create and
> then mix and match them as we saw fit. We just need them all to be
> called with the same args, then have a master run that executes all of
> them in a tests dir.
The devs claim to have an unpublished, incomplete framework underway. I've
asked for this to be published - no matter how incomplete it may be.
Maybe we could take some cues from that, once we get a look at it?
(After all, we want the devs to ultimately use what we produce. We should use
the same underlying tools.)
As for a collaboration place - there's the traditional GlusterFS location for
random code dumps - http://glusterfs.pastebin.com
> It would also be nice if there was a sort of
> standard output both for giving to devels as well as rolling up nicely
> if we get 1E3 of these things.
Agreed. I've been using PyUnit a lot lately (I've become more of a Python
programmer than a C programmer these days) so anything that aggregates test
suites and produces results in a similar manner wins big in my book.
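Something like this could serve as the master run over a tests dir that you
suggested - a minimal sketch only, assuming PyUnit-style test_*.py files:

import unittest

# Collect every test_*.py module under tests/ into one suite, run it,
# and exit non-zero on failure - one roll-up result to hand to the devs.
if __name__ == "__main__":
    suite = unittest.defaultTestLoader.discover("tests", pattern="test_*.py")
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    raise SystemExit(0 if result.wasSuccessful() else 1)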
Geoff.
On Wed, 8 Jul 2009, Mickey Mazarick wrote:
> Wow, you really hit my biggest fear, the one thing I try to test for...
> data corruption.
> That's what I wake up afraid of at night...
>
> I'm doing a simplified version of the first set of testing you mentioned
> but nothing as detailed. Really creating a random file and doing an md5
> check on it, but now that you mention all the possibilities of files
> moving in from the back end, I'm really doing nothing to test the dht or
> namespace distribution at all....
> I would add a few but I haven't had the time to google how to do the
> following without writing a C prog:
> check flock()
> check mmap writing
> Also, I have yet to get this to work reliably, but: starting a large
> write and losing a brick under afr... (it usually terminates the write)
>
> I'm thinking a simple collection of bash or perl scripts would work for
> a first pass at this. Do you have any suggestions on a good colab site
> for scripting? If we came up with a basic format we could create and
> then mix and match them as we saw fit. We just need them all to be
> called with the same args, then have a master run that executes all of
> them in a tests dir. It would also be nice if there was a sort of
> standard output both for giving to devels as well as rolling up nicely
> if we get 1E3 of these things.
>
> -Mic
>
> Geoff Kassel wrote:
> > Hi Mickey,
> >
> >> Thanks I am well versed in unit testing but probably disagree on level
> >> of use in a development cycle. Instead of writing a long email back
> >> about testing theory, nondeterministic problems, highly connected
> >> dependent systems blah blah
> >
> > Sorry, I was just trying to make sure we were all on the same page -
> > define some common terminology, etc for anyone else who wanted to join
> > in.
> >
> > I'm well aware of the limits of testing, having most of a PhD in related
> > formal methods topics and having taught Uni subjects in this area. (But
> > consider me optimistic anyway :)
> >
> > It's just about improving confidence, after all. Not about achieving some
> > nebulous notion of perfection.
> >
> >> I'll just say that most of the problems that
> >> have plagued me have been because of interactions between translators,
> >> kernel mods etc which unit testing doesn't really approach.
> >
> > That's the focus of integration testing, not unit tests... I did mention
> > integration testing.
> >
> >> Since I'm running my setup as a storage farm it just doesn't matter to
> >> me if there's a memory leak or if a server daemon crashes, I have cron
> >> jobs that restart it and I barely take notice.
> >
> > You're very lucky that a crash doesn't cause you much annoyance. My
> > annoyances in this area are well documented in the list, so I won't
> > repeat them again :)
> >
> >> I would rather encourage the dev team to add hotadd
> >> upgrade and hotadd features. These things would keep my cluster going
> >> even if there were catastrophic problems.
> >
> > These are good features to have, yes. However, I'd like to make sure
> > there's something uncorrupted to recover first.
> >
> > If a feature freeze was necessary to get a proper QA framework put in
> > place and working towards avoiding more data corruption bugs, then I
> > would vote for the feature freeze over more features, no matter how
> > useful.
> >
> >> What I'm saying is that a good top down testing system is something we
> >> can discuss here, spec out and perhaps create independently of the
> >> development team. I think what most people want is a more stable product
> >> and I think a top down approach will get it there faster than trying to
> >> implement a given UT system from the bottom up. It will definitely answer
> >> the question "should I upgrade to this release?"
> >
> > Alright. We'll let the devs concentrate on bottom up testing (they know
> > the code better anyway), and we in the wider community can look at top
> > down testing.
> >
> >> You mentioned that you had outlined some integration and function tests
> >> previously, perhaps you could paste some into this thread so that we
> >> could expand on them.
> >
> > Okay. The test I outlined was for checking for data corruption bugs for
> > AFR and Unity with cryptographic hashes. The idea actually expands into a
> > class of test cases. I'll flesh those out a bit more now.
> >
> > Generate a number of files of varying length (zero size, single byte,
> > transfer block size - 1, transfer block size, transfer block size + 1,
> > multiple meg, multiple gig etc) in a directory tree of varying depths.
> > Take the cryptographic hash of each file.
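> >
> > (Roughly like so, in Python - the sizes, depth and hash choice here are
> > just illustrative, not a fixed design:)
> >
> > import hashlib
> > import os
> >
> > # Sizes straddling an assumed 64KiB transfer block size, plus larger files.
> > SIZES = [0, 1, 65535, 65536, 65537, 8 * 2**20]
> >
> > def make_tree(root, depth=3):
> >     """Populate a directory tree and record the hash of every file."""
> >     expected = {}
> >     for d in range(1, depth + 1):
> >         subdir = os.path.join(root, *["level%d" % i for i in range(d)])
> >         os.makedirs(subdir, exist_ok=True)
> >         for size in SIZES:
> >             path = os.path.join(subdir, "file_%d" % size)
> >             data = os.urandom(size)
> >             with open(path, "wb") as f:
> >                 f.write(data)
> >             expected[path] = hashlib.sha1(data).hexdigest()
> >     return expected
> >
> > def check_tree(expected):
> >     """Re-read every file; any hash mismatch means corruption."""
> >     return [path for path, digest in expected.items()
> >             if hashlib.sha1(open(path, "rb").read()).hexdigest() != digest]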
> >
> > One test can be starting with an empty set of GlusterFS back end data
> > blocks. Insert the files and directories through the client - check the
> > hashes of the files stored on the back ends, and as read back through
> > each of the client(s). If the hashes mismatch the original computed
> > hashes at any point, the test has failed.
> >
> > Another test can be starting with the files already on the back end. (But
> > without having had Gluster assign metadata attributes yet.) Start the
> > server, read the files through each of the client(s) and directly from
> > the back end. As before, if the hashes mismatch at any point - failure.
> >
> > A third test - start another set of back ends with a partially populated
> > back end. Start the server, read the existing files off, compare hashes.
> > Add the remaining files. Compare the hashes of all files through the
> > client(s), and as they end up on the back end.
> >
> > I don't know if 2.0.x Gluster supports this any more, but you used to be
> > able to have one back end populated and the other empty, so long as a
> > namespace block on all servers had zero-length file entries for all of
> > the replicated files. (This being how you could add a node to your
> > cluster originally.) Start the back ends in this 'one populated, others
> > empty' configuration - read all the files through a client connected only
> > to a server with an empty back end. Check the hashes read through the
> > client, and the hashes of the files that end up 'healed' onto the
> > formerly empty back ends.
> >
> > Then there's a multitude of overwrite tests that could be done in this
> > vein, as well as concurrent read and write tests to check atomicity etc.
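> >
> > (The concurrent case might look something like this rough sketch - the
> > path, record size and process counts are pulled out of the air:)
> >
> > import multiprocessing
> >
> > PATH = "/mnt/glusterfs/append_test.dat"   # placeholder client-side path
> > RECORD = 512
> > COUNT = 200
> >
> > def appender(idx):
> >     # Each process appends whole records; O_APPEND writes of one record
> >     # should land intact, so any 'torn' record indicates a problem.
> >     payload = bytes([idx]) * RECORD
> >     for _ in range(COUNT):
> >         with open(PATH, "ab", buffering=0) as f:
> >             f.write(payload)
> >
> > if __name__ == "__main__":
> >     open(PATH, "wb").close()
> >     procs = [multiprocessing.Process(target=appender, args=(i,))
> >              for i in range(1, 5)]
> >     for p in procs: p.start()
> >     for p in procs: p.join()
> >     data = open(PATH, "rb").read()
> >     records = [data[i:i + RECORD] for i in range(0, len(data), RECORD)]
> >     torn = [r for r in records if len(set(r)) != 1]
> >     print("FAIL: %d torn records" % len(torn) if torn else
> >           "OK: %d records" % len(records))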
> >
> > All these tests could be done under different performance translators,
> > with different numbers of servers and clients. All just a matter of
> > different configuration files given, and different scripts to set up
> > different test environments.
> >
> > All of these functional tests can be automated, can be done on a single
> > system with some clever configuration files, or performed across a
> > network to try to detect issues caused by networking.
> >
> > (I believe there are open source network simulation tools that might be
> > able to be used to simulate lag, noise, congestion etc, and so reduce
> > this network testing to being run on a single machine. Network simulation
> > is not an area of expertise for me, so I don't know how effective or
> > comparable this is to the real thing.)
> >
> > If the files in the tests are algorithmically generated (say, sourced
> > from a pseudo random number generator, or the various patterns favoured
> > by memory testers), the back end test data sets can be quite small in
> > size.
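> >
> > (For instance - a rough sketch only, with the seed and chunk size picked
> > arbitrarily - the data can be regenerated on demand rather than stored:)
> >
> > import random
> >
> > def generate_file(path, size, seed):
> >     """Write 'size' bytes of reproducible pseudo-random data to 'path'."""
> >     rng = random.Random(seed)
> >     with open(path, "wb") as f:
> >         remaining = size
> >         while remaining > 0:
> >             n = min(remaining, 65536)
> >             f.write(bytes(rng.getrandbits(8) for _ in range(n)))
> >             remaining -= n
> >
> > # The same (path, size, seed) triple always yields byte-identical data,
> > # so only this script - not the data - needs to live in the repository.
> > generate_file("/tmp/testdata_1M.bin", 1 << 20, seed=42)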
> >
> > (Hopefully this will all be small enough to add to the repository without
> > adding much bulk to a check out.)
> >
> > What do you think?
> >
> > Geoff.
> >
> > On Wed, 8 Jul 2009, Mickey Mazarick wrote:
> >> Geoff,
> >> Thanks I am well versed in unit testing but probably disagree on level
> >> of use in a development cycle. Instead of writing a long email back
> >> about testing theory, nondeterministic problems, highly connected
> >> dependent systems blah blah I'll just say that most of the problems that
> >> have plagued me have been because of interactions between translators,
> >> kernel mods etc which unit testing doesn't really approach.
> >>
> >> Since I'm running my setup as a storage farm it just doesn't matter to
> >> me if there's a memory leak or if a server daemon crashes, I have cron
> >> jobs that restart it and I barely take notice. True, regression testing
> >> would get rid of the memory leak you hate but if they have to start from
> >> the ground up I would rather encourage the dev team to add hotadd
> >> upgrade and hotadd features. These things would keep my cluster going
> >> even if there were catastrophic problems.
> >>
> >> What I'm saying is that a good top down testing system is something we
> >> can discuss here, spec out and perhaps create independently of the
> >> development team. I think what most people want is a more stable product
> >> and I think a top down approach will get it there faster than trying to
> >> implement a given UT system from the bottom up. It will definitely answer
> >> the question "should I upgrade to this release?"
> >>
> >> You mentioned that you had outlined some integration and function tests
> >> previously, perhaps you could paste some into this thread so that we
> >> could expand on them.
> >>
> >> Thanks!
> >> -Mickey Mazarick
> >>
> >> Geoff Kassel wrote:
> >>> Hi Mickey,
> >>> Just so that we're all on the same page here - a regression test
> >>> suite at its most basic just has to include test cases (i.e. a set of
> >>> inputs) that can trigger a previously known fault in the code if that
> >>> fault is present. (i.e. it can see if the code has 'regressed' into a
> >>> condition where a fault is present.)
> >>>
> >>> What it's also taken to mean (and typically includes) is a set of
> >>> test cases covering corner cases and normal modes of operation, as
> >>> expressed in a set of inputs to code paired with a set of expected
> >>> outputs that may or may not include error messages.
> >>>
> >>> Test cases aimed at particular levels of the code have specific
> >>> terminology associated with those levels. At the lowest level, the
> >>> method level, they're called unit tests. At the module/API level -
> >>> integration tests. At the system/user interface level - system aka
> >>> function aka functional aka functionality tests.
> >>>
> >>> When new functionality is introduced or a bug is patched, the
> >>> regression test suite (which in the case of unit tests is typically
> >>> fully automated) is run to see whether the expected behaviour occurs,
> >>> and none of the old faults recur.
> >>>
> >>> A lot of the tests you've described fall into the category of
> >>> function tests - and from my background in automated testing, I know we
> >>> need a bit more than that to get the stability and reliability results
> >>> we want. (Simply because you cannot test every corner case within a
> >>> project the size and complexity of GlusterFS reliably from the command
> >>> line.)
> >>>
> >>> Basically, what GlusterFS needs is a fairly even coverage of test
> >>> cases at all the levels I've just mentioned.
> >>>
> >>> What I want to see particularly - and what the devs stated nearly a
> >>> year ago was already in existence - is unit tests. Particularly the
> >>> kind that can be run automatically.
> >>>
> >>> This is so that developers (inside the GlusterFS team or otherwise)
> >>> can hack on a piece of code to fix a bug or implement new
> >>> functionality, then run the unit tests to see that they (most likely)
> >>> haven't caused a regression with their new code.
> >>>
> >>> (It's somewhat difficult for outsiders to write unit and integration
> >>> tests, because typically only the original developers have the in-depth
> >>> knowledge of the expected behaviour of the code in the low level detail
> >>> required.)
> >>>
> >>> Perhaps developed in parallel should be integration and function
> >>> tests. Tests like these (I've outlined elsewhere specifically what
> >>> kind) would have quite likely picked up the data corruption bugs before
> >>> they made their way into the first 2.0.x releases.
> >>>
> >>> (Pretty much anyone familiar with the goal of the project can write
> >>> function tests, documenting in live code their expectations for how the
> >>> system should work.)
> >>>
> >>> Long running stability and load tests like you've proposed are also
> >>> kinds of function tests, but without the narrowly defined inputs and
> >>> outputs of specific test cases. They're basically the equivalent of
> >>> mine shaft canaries - they signal the presence of race conditions,
> >>> memory leaks, design flaws, and other subtle issues, but often without
> >>> specifics as to what 'killed' the canary. Once the cause is found
> >>> though, a new, more specific test case can be added at the appropriate
> >>> level.
> >>>
> >>> (Useful, yes, but mostly as a starting point for more intensive QA
> >>> efforts.)
> >>>
> >>> The POSIX compliance tests you mentioned are more traditional
> >>> function level tests - but I think the GlusterFS devs have wandered a
> >>> little away from full POSIX compliance on some points, so these tests
> >>> may not be 100% relevant.
> >>>
> >>> (This is not necessarily a bad thing - the POSIX standard is
> >>> apparently ambiguous at times, and there is some wider community
> >>> feeling that improvements to the standard are overdue. And I'm not sure
> >>> the POSIX standard was ever written with massively scalable, pluggable,
> >>> distributed file systems in mind, either :)
> >>>
> >>> I hope my extremely long winded rant here :) has explained
> >>> adequately what I feel GlusterFS needs to have in a regression testing
> >>> system.
> >>>
> >>> Geoff.
> >>>
> >>> On Tue, 7 Jul 2009, Mickey Mazarick wrote:
> >>>> What kind of requirements does everyone see as necessary for a
> >>>> regression test system?
> >>>> Ultimately the best testing system would use the tracing translator
> >>>> and be able to run tests and generate traces for any problems that
> >>>> occur, giving us something very concrete to provide the developers.
> >>>> That's a few steps ahead, however; initially we should start to outline
> >>>> some must-haves in terms of how a test setup is run. Obviously we want
> >>>> something we can run for many hours or days to test long-term
> >>>> stability, and it would be nice if there was some central way to spin
> >>>> up new clients to test reliability under a load.
> >>>>
> >>>> For basic file operation tests I use the below:
> >>>> An initial look would be to use some tools like
> >>>> http://www.ntfs-3g.org/pjd-fstest.html
> >>>> I've seen it mentioned before but it's a good start to test anything
> >>>> POSIX. Here's a simple script that will download and build it if it's
> >>>> missing, and run a test on a given mount point.
> >>>>
> >>>>
> >>>> #!/bin/bash
> >>>> # Run the pjd-fstest POSIX test suite against a GlusterFS mount point,
> >>>> # downloading and building the suite first if it isn't already installed.
> >>>> if [ "$#" -lt 1 ]; then
> >>>>     echo "usage: $0 gluster_mount"
> >>>>     exit 65
> >>>> fi
> >>>> GLUSTER_MOUNT="$1"
> >>>> INSTALL_DIR="/usr"
> >>>> if [ ! -d "$INSTALL_DIR/fstest" ]; then
> >>>>     cd "$INSTALL_DIR" || exit 1
> >>>>     wget http://www.ntfs-3g.org/sw/qa/pjd-fstest-20080816.tgz
> >>>>     tar -xzf pjd-fstest-20080816.tgz
> >>>>     mv pjd-fstest-20080816 fstest
> >>>>     cd fstest || exit 1
> >>>>     make
> >>>>     # Interactive step: set the fs type in tests/conf before the first run.
> >>>>     vi tests/conf
> >>>> fi
> >>>> cd "$GLUSTER_MOUNT" || exit 1
> >>>> prove -r "$INSTALL_DIR/fstest/"
> >>>>
> >>>> Jacques Mattheij wrote:
> >>>>> hello Anand, Geoff & others,
> >>>>>
> >>>>> This pretty much parallels my interaction with the team about a
> >>>>> year ago, lots of really good intentions but no actual follow up.
> >>>>>
> >>>>> We agreed that an automated test suite was a must and that a
> >>>>> whole bunch of other things would have to be done to get
> >>>>> glusterfs out of the experimental stage and into production
> >>>>> grade.
> >>>>>
> >>>>> It's a real pity because I still feel that glusterfs is one of the
> >>>>> major contenders to become *the* cluster file system.
> >>>>>
> >>>>> A lot of community goodwill has been lost, I've kept myself
> >>>>> subscribed to this mailing list because I hoped that at some
> >>>>> point we'd move past this endless cat and mouse game with
> >>>>> stability issues, but for some reason that never happened.
> >>>>>
> >>>>> Anand, you have a very capable team of developers, you have
> >>>>> a once-in-a-lifetime opportunity to make this happen. Please
> >>>>> take Geoff's comments to heart and get serious about QA and
> >>>>> community support because that is the key to any successful
> >>>>> foss project. Fan that fire and you can't go wrong, lose the
> >>>>> community support and your project might as well be dead.
> >>>>>
> >>>>> I realize this may come across as harsh but it is intended to
> >>>>> make it painfully obvious that the most staunch supporters
> >>>>> of glusterfs are getting discouraged and that is a loss no
> >>>>> serious project can afford.
> >>>>>
> >>>>> Jacques
> >>>>>
> >>>>> Geoff Kassel wrote:
> >>>>>> Hi Anand,
> >>>>>> If you look back through the list archives, no one other than me
> >>>>>> replied to the original QA thread where I first posted my patches.
> >>>>>> Nor to the Savannah patch tracker thread where I also posted my
> >>>>>> patches. (Interesting how those trackers have been disabled now...)
> >>>>>>
> >>>>>> It took me pressing the issue after discovering yet another bug
> >>>>>> that we even started talking about my patches. So yes, my patches
> >>>>>> were effectively ignored.
> >>>>>>
> >>>>>> At the time, you did mention that the code the patches were to be
> >>>>>> applied against was being reworked, in addition to your comments
> >>>>>> about my code comments.
> >>>>>>
> >>>>>> I explained the comments as being necessary to avoid the
> >>>>>> automated tool flagging potential issues again on reuse of that tool
> >>>>>> - the other comments were notes for future QA work. There was no follow-up
> >>>>>> on that from you, nor any suggestion on how I might improve these comments to
> >>>>>> your standards.
> >>>>>>
> >>>>>> I continued to supply patches in the Savannah tracker against the
> >>>>>> latest stable 1.3 branch - which included some refactoring for your
> >>>>>> reworked code, IIRC - for some time after that discussion. All of my
> >>>>>> patches were in sync with the code from the publicly available 1.3
> >>>>>> branch repository within days of a new TLA patchset.
> >>>>>>
> >>>>>> None of these were adopted either.
> >>>>>>
> >>>>>> I simply ran out of spare time to maintain this patchset, and I
> >>>>>> got tired of pressing an issue (QA) that you and the dev team
> >>>>>> clearly weren't interested in.
> >>>>>>
> >>>>>> I don't have the kind of spare time needed to do the sort of
> >>>>>> in-depth re-audit of your code from scratch (as would be needed) in the
> >>>>>> manner that I did back then. So I can't meet your request at this
> >>>>>> time, sorry.
> >>>>>>
> >>>>>> As I've suggested elsewhere, now that you apparently have the
> >>>>>> resources for a stand-alone QA team - this team might want to at
> >>>>>> least use the tools I've used to generate these patches - RATS and
> >>>>>> FlawFinder.
> >>>>>>
> >>>>>> That way you can generate the kind of QA work I was producing
> >>>>>> with the kind of comment style you prefer.
> >>>>>>
> >>>>>> The only way I can conceive of being able to help now is in
> >>>>>> patching individual issues. However, I can really only feasibly do
> >>>>>> that with my time constraints if I've got regression tests to make
> >>>>>> sure I'm not inadvertently breaking other functionality.
> >>>>>>
> >>>>>> Hence my continued requests for these.
> >>>>>>
> >>>>>> Geoff.
> >>>>>>
> >>>>>> On Tue, 7 Jul 2009, Anand Avati wrote:
> >>>>>>>> I've also gone one better than just advice - I've given up significant
> >>>>>>>> portions of my limited spare time to audit and patch a not-insignificant
> >>>>>>>> portion of the GlusterFS code, in order to deal with the stability issues
> >>>>>>>> I and others were encountering. My patches were ignored, on the grounds
> >>>>>>>> that they contained otherwise unobtrusive comments which were quite
> >>>>>>>> necessary to the audit.
> >>>>>>>
> >>>>>>> Geoff, we really appreciate your efforts, both on the fronts of
> >>>>>>> your patch submissions and for voicing your opinions freely. We
> >>>>>>> also acknowledge the positive intentions behind this thread. As far
> >>>>>>> as your patch submissions are concerned, there is probably a
> >>>>>>> misunderstanding. Your patches were not ignored. We do value your
> >>>>>>> efforts. The patches which you submitted, even at the time of your
> >>>>>>> submission were not applicable to the codebase.
> >>>>>>>
> >>>>>>> Patch 1 (in glusterfsd.c) -- this file was reworked and almost
> >>>>>>> rewritten from scratch to work as both client and server.
> >>>>>>>
> >>>>>>> Patch 2 (glusterfs-fuse/src/glusterfs.c) -- this module was
> >>>>>>> reimplemented as a new translator (since a separate client was no
> >>>>>>> more needed).
> >>>>>>>
> >>>>>>> Patch 3 (protocol.c) -- with the introduction of non blocking IO
> >>>>>>> and binary protocol, nothing of this file remained.
> >>>>>>>
> >>>>>>> What I am hoping to convey is that the reason your patches did not
> >>>>>>> make it to the repository was that they needed significant
> >>>>>>> reworking to even apply. I did indeed comment about code comments
> >>>>>>> of the style /* FlawFinder: */ but then, that definitely was _not_
> >>>>>>> the reason they weren't included. Please understand that nothing
> >>>>>>> was ignored intentionally.
> >>>>>>>
> >>>>>>> This being said, I can totally understand the efforts which you
> >>>>>>> have been putting to maintain patchsets by yourself and keeping
> >>>>>>> them up to date with the repository. I request you to resubmit them
> >>>>>>> (with git format-patch) against the HEAD of the repository.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Avati
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Gluster-devel mailing list
> >>>>>> Gluster-devel at nongnu.org
> >>>>>> http://lists.nongnu.org/mailman/listinfo/gluster-devel