[Gluster-devel] quota.t hangs on NetBSD machines

Raghavendra Gowdappa rgowdapp at redhat.com
Thu Dec 31 11:07:33 UTC 2015



----- Original Message -----
> From: "Emmanuel Dreyfus" <manu at netbsd.org>
> To: "Raghavendra Talur" <rtalur at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, December 31, 2015 4:31:37 PM
> Subject: Re: [Gluster-devel] quota.t hangs on NetBSD machines
> 
> On Thu, Dec 31, 2015 at 03:40:54PM +0530, Raghavendra Talur wrote:
> 
> We have threads sleeping, either voluntarily (nanosleep) or not (lwp_park),
> and this:
> 
> c5223a80 (glusterfs) is in
> sleepq_block/cv_timedwait_sig/sbwait/soreceive/soo_read/do_filereadv/sys_readv
> Waiting for a read on a socket. Probably FUSE, but it would be nice
> to be certain.
> 
> c5346540 (glusterfs) is in
> sleepq_block/cv_timedwait_sig/sigtimedwait1/sys_____sigtimedwait50
> This is ordinary sigtimedwait(), but the timeout argument (third) is zero,
> which can let it sleep forever. Is that expected?
> > cv_timedwait_sig(c53466b4,c5004b80,0,c53466a4,3,db727e90,c53466a4,c41eb528,db727eac,7ff0)
> 
> c5418020 (glusterfs) is in
> sleepq_block/sel_do_scan/pollcommon/sys_poll
> This is ordinary poll(2). The struct timespec for the timeout is at
> db721f18 and again this is an infinite timeout:
> crash> x db721f18,2
> db721f18:       0           0
> (NB: 2 words because we run on a 32 bit machine, where struct timespec is a
> 32 bit time_t and a 32 bit long)
> 
> c53692c0 (perfused) is in
> sleepq_block/cv_timedwait_sig/kevent1/sys___kevent50
> Waiting for data (either from the kernel or glusterfs, I do not know).
> Again we have an infinite timeout.
> 
> I note that the FUSE filesystem is responding. Since perfused is
> not multithreaded, this suggests it is not the stuck process. It may
> have missed a request or reply, though, which would leave the calling
> process stuck.
> 
> Speaking of the calling process: I believe it is the quota utility?
> It is indeed waiting for a reply from the filesystem:
> UID   PID PPID  CPU PRI NI  VSZ  RSS WCHAN    STAT TTY       TIME COMMAND
>   0 15221 1406 1546  85  0 3360 1080 puffsrpl I    pts/0- 0:00.06
>   tests/basic/quota /mnt/glusterfs/0/test_dir/1.txt 256 48
> 
> Here is its backtrace obtained from gdb:
> #0  0xbb69b6f7 in write () from /usr/lib/libc.so.12
> #1  0x080489c0 in nwrite (fd=3, buf=0xbb501000, count=262144)
>     at tests/basic/quota.c:16
> #2  0x08048a8b in file_write (
>     filename=0xbf7ffcb2 "/mnt/glusterfs/0/test_dir/1.txt", bs=262144,
>     count=48)
>     at tests/basic/quota.c:48
> #3  0x08048b64 in main (argc=4, argv=0xbf7feba0) at tests/basic/quota.c:83
> 
> It is waiting for a write to complete, but we still do not know which process
> got the request but not the reply. Do you see any way to tell?

We saw a similar backtrace on the test process. At that point we took a statedump of the client process. Surprisingly, while we were going through the statedump, the test program resumed and completed.

Can you take a statedump of the client process?
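
For reference, a minimal sketch of how that could be done: a glusterfs client (mount) process writes a statedump when it receives SIGUSR1. The helper name take_statedump is hypothetical and not part of GlusterFS; it only wraps kill(1) with a couple of sanity checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): ask a glusterfs client
# process to write a statedump by sending it SIGUSR1.
take_statedump() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: take_statedump <pid-of-glusterfs-client>" >&2
        return 2
    fi
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal process: $pid" >&2
        return 1
    fi
    kill -USR1 "$pid"
}
```

It would be called as `take_statedump <pid>` with the pid of the glusterfs process serving /mnt/glusterfs/0. The dump files should land in the statedump directory, which `gluster --print-statedumpdir` reports (typically /var/run/gluster, though that may differ on the NetBSD build).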

> 
> --
> Emmanuel Dreyfus
> manu at netbsd.org
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 

