[Gluster-devel] NetBSD hanging regression tests
Emmanuel Dreyfus
manu at netbsd.org
Sat Mar 7 05:06:50 UTC 2015
Hi
Recently NetBSD regression tests started hanging quite frequently. Here is
an example:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/1679/
The offending test is root-squash-self-heal.t which starts a never-ending
glfsheal process:
  PID LID WCHAN   STAT     LTIME COMMAND
28554   5 parked  Rl     0:00.04 /build/install/sbin/glfsheal patchy
28554   4 nanoslp Rl     0:01.28 /build/install/sbin/glfsheal patchy
28554   3 -       Rl     0:00.00 /build/install/sbin/glfsheal patchy
28554   1 -       Rl   754:21.27 /build/install/sbin/glfsheal patchy
Thread 1 ate a lot of CPU time. It is looping on failed writes:
28554 1 glfsheal CALL __gettimeofday50(0xbf7fe650,0)
28554 1 glfsheal RET __gettimeofday50 0
28554 1 glfsheal CALL write(9,0xbb7c63fb,6)
28554 1 glfsheal RET write -1 errno 35 Resource temporarily unavailable
Running a standalone glfsheal process shows it first writes "dummy" before
it hits the same error. This suggests we are in event_dispatch_destroy():
        /* Write to pipe(fd[1]) and then wait for 1 second or until
         * a poller thread that is dying, broadcasts.
         */
        while (event_pool->activethreadcount > 0) {
                write (fd[1], "dummy", 6);
                sleep_till.tv_sec = time (NULL) + 1;
                ret = pthread_cond_timedwait (&event_pool->cond,
                                              &event_pool->mutex,
                                              &sleep_till);
        }
Obviously something went wrong. Perhaps there should be a timeout there,
and/or a check that write() does not fail?
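For what it is worth, here is a minimal sketch of how I read the trace, assuming fd[1] is a non-blocking pipe and nobody drains the read end anymore (I have not confirmed either point in the glfsheal case). The helper name fill_unread_pipe() is made up for illustration and is not part of glusterfs. It writes "dummy" into a pipe that is never read until write() fails, and reports the errno of that first failure:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not glusterfs code: create a pipe, make the
 * write end non-blocking, and keep writing "dummy" without ever
 * reading. Returns the number of successful writes, or -1 if the
 * setup failed. *first_errno receives the errno of the first failed
 * call (0 if none failed).
 */
long fill_unread_pipe(int *first_errno)
{
    int fd[2];
    long ok_writes = 0;

    assert(first_errno != NULL);
    *first_errno = 0;

    if (pipe(fd) == -1) {
        *first_errno = errno;
        return -1;
    }

    /* Assumption: the real fd[1] is non-blocking, as EAGAIN suggests. */
    int flags = fcntl(fd[1], F_GETFL);
    if (flags == -1 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        *first_errno = errno;
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    for (;;) {
        /* Same 6-byte write as in event_dispatch_destroy(). */
        ssize_t n = write(fd[1], "dummy", 6);
        if (n == -1) {
            /* Pipe is full: the writer gets an error, it does not block. */
            *first_errno = errno;
            break;
        }
        ok_writes++;
    }

    close(fd[0]);
    close(fd[1]);
    return ok_writes;
}
```

Called as fill_unread_pipe(&err), this should come back with err set to EAGAIN once the pipe buffer is full. If the same thing happens in glfsheal, every later write() in the loop fails immediately, the poller is never woken up, and the loop spins forever, which would match the CPU time of thread 1.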
diff --git a/libglusterfs/src/event.c b/libglusterfs/src/event.c
index f19d43a..b956d25 100644
--- a/libglusterfs/src/event.c
+++ b/libglusterfs/src/event.c
@@ -235,10 +235,14 @@ event_dispatch_destroy (struct event_pool *event_pool)
pthread_mutex_lock (&event_pool->mutex);
{
/* Write to pipe(fd[1]) and then wait for 1 second or until
- * a poller thread that is dying, broadcasts.
+ * a poller thread that is dying, broadcasts. Make sure we
+ * do not loop forever by limiting to 10 retries
*/
- while (event_pool->activethreadcount > 0) {
- write (fd[1], "dummy", 6);
+ int retry = 0;
+
+ while (event_pool->activethreadcount > 0 && retry++ < 10) {
+ if (write (fd[1], "dummy", 6) == -1)
+ break;
sleep_till.tv_sec = time (NULL) + 1;
ret = pthread_cond_timedwait (&event_pool->cond,
&event_pool->mutex,
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org