[Gluster-devel] Hung regression jobs

Justin Clift justin at gluster.org
Thu Apr 23 12:18:25 UTC 2015


On 23 Apr 2015, at 01:18, Jeff Darcy <jdarcy at redhat.com> wrote:
> I just had to clean up a couple of these - 7327 and 7331.  Fortunately,
> they both seem to have gone on their merry way instead of dying.  Both
> were in the pre-mount stage of their setup, but did have mounts active
> and gsyncd processes running (in one case multiple of them).  I suspect
> that this is related to the fact that the new geo-rep tests call "exit"
> directly instead of returning errors (see geo-rep-helpers.c:192) and
> don't use bash's "trap ... EXIT" functionality to ensure proper cleanup.
> Thus, whatever was mounted or running when they failed will remain
> mounted or running to trip up the next test.
> 
> If one of your regression jobs seems to be hung, either log in to the
> slave machine yourself or contact someone who can, so the offending
> mounts/processes can be unmounted/killed.
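
For anyone unfamiliar with the "trap ... EXIT" idiom Jeff mentions, here is a minimal sketch of it. The function names and the body of cleanup are hypothetical and only print a message. A real test would unmount its own mounts and stop its own gsyncd processes there. The point is that the handler still runs when the script calls "exit" directly.

```shell
#!/bin/bash
# Sketch only: run_test and cleanup are hypothetical names, not taken from
# the actual geo-rep tests.

run_test() (
    # The parentheses make the function body a subshell, so the trap and
    # the "exit" below affect only that subshell.
    cleanup() {
        # A real test would unmount and kill its own leftovers here.
        echo "cleanup ran"
    }
    trap cleanup EXIT

    echo "test step failed"
    exit 1   # direct exit; the EXIT trap still fires before the subshell ends
)

status=0
output=$(run_test) || status=$?
echo "$output"
echo "exit status: $status"
```

The trap handler does not change the exit status, so the caller still sees the failure.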

Ahhh yeah, this makes sense.  The scripting in Jenkins for launching
regression tests should probably be tweaked to also kill any leftover
geo-rep processes and unmount any leftover mounts.
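
As a rough starting point, a pre-test cleanup step might look something like the sketch below. This is not the existing Jenkins script, which I haven't reproduced here. It assumes leftover processes can be matched by the name "gsyncd" and that leftover mounts show up in /proc/mounts with type fuse.glusterfs. The run and cleanup_leftovers helpers and the DRY_RUN switch are made up for illustration. By default it only prints what it would do.

```shell
#!/bin/bash
# Sketch only. Assumptions: leftover geo-rep processes match "gsyncd", and
# leftover mounts are of type fuse.glusterfs. Set DRY_RUN=0 to really act.

DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_leftovers() {
    # Kill gsyncd processes left behind by an earlier job.
    # pkill exits non-zero when nothing matches, which is fine here.
    run pkill -f gsyncd || true

    # Lazily unmount any GlusterFS FUSE mounts that are still active.
    awk '$3 == "fuse.glusterfs" { print $2 }' /proc/mounts 2>/dev/null |
    while read -r mountpoint; do
        run umount -l "$mountpoint" || true
    done
}

cleanup_leftovers
```

Run it once with the default DRY_RUN=1 on a slave to check what it would touch before switching it on for real.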

I'm focused elsewhere atm, so won't be looking at this myself.  But
anyone with a Jenkins login is able to.  Just muck around with the
script here to add geo-rep bits:

  http://build.gluster.org/job/rackspace-regression-2GB-triggered/

(remember to comment any changes, for traceability) :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift


