[Gluster-devel] quota.t hangs on NetBSD machines
Emmanuel Dreyfus
manu at netbsd.org
Thu Dec 31 11:01:37 UTC 2015
On Thu, Dec 31, 2015 at 03:40:54PM +0530, Raghavendra Talur wrote:
We have threads sleeping, either voluntarily (nanosleep) or not (lwp_park),
and this:
c5223a80 (glusterfs) is in
sleepq_block/cv_timedwait_sig/sbwait/soreceive/soo_read/do_filereadv/sys_readv
Waiting while reading from a socket. Probably FUSE, but it would be nice
to be certain.
c5346540 (glusterfs) is in
sleepq_block/cv_timedwait_sig/sigtimedwait1/sys_____sigtimedwait50
This is an ordinary sigtimedwait(), but the timeout argument (the third) is
zero, which allows it to sleep forever. Is that expected?
> cv_timedwait_sig(c53466b4,c5004b80,0,c53466a4,3,db727e90,c53466a4,c41eb528,db727eac,7ff0)
c5418020 (glusterfs) is in
sleepq_block/sel_do_scan/pollcommon/sys_poll
This is an ordinary poll(2). The struct timespec for the timeout is at
db721f18, and again this is an infinite timeout:
crash> x db721f18,2
db721f18: 0 0
(NB: 2 words because we run a 32-bit machine, where struct timespec is a
32-bit time_t and a 32-bit long)
c53692c0 (perfused) is in
sleepq_block/cv_timedwait_sig/kevent1/sys___kevent50
Waiting for data (either from the kernel or from glusterfs, I do not know).
Again we have an infinite timeout.
I note that the FUSE filesystem is responding. Since perfused is
not multithreaded, this suggests it is not the stuck process. It may
have missed a request or a reply, though, which would leave the calling
process stuck.
Speaking of the calling process, I believe it is the quota utility, which is
indeed waiting for a reply from the filesystem:
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 15221 1406 1546 85 0 3360 1080 puffsrpl I pts/0- 0:00.06 tests/basic/quota /mnt/glusterfs/0/test_dir/1.txt 256 48
Here is its backtrace obtained from gdb:
#0 0xbb69b6f7 in write () from /usr/lib/libc.so.12
#1 0x080489c0 in nwrite (fd=3, buf=0xbb501000, count=262144)
at tests/basic/quota.c:16
#2 0x08048a8b in file_write (
filename=0xbf7ffcb2 "/mnt/glusterfs/0/test_dir/1.txt", bs=262144, count=48)
at tests/basic/quota.c:48
#3 0x08048b64 in main (argc=4, argv=0xbf7feba0) at tests/basic/quota.c:83
It is waiting for a write to complete, but we still do not know which process
got the request without sending the reply. Do you see any way to tell?
--
Emmanuel Dreyfus
manu at netbsd.org