[Gluster-devel] spurious failures in tests/encryption/crypt.t

Pranith Kumar Karampuri pkarampu at redhat.com
Wed May 21 05:07:37 UTC 2014



----- Original Message -----
> From: "Edward Shishkin" <edward at redhat.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "Vijay Bellur" <vbellur at redhat.com>, "Anand Avati" <aavati at redhat.com>, "Gluster Devel"
> <gluster-devel at gluster.org>
> Sent: Monday, May 19, 2014 6:05:02 PM
> Subject: Re: spurious failures in tests/encryption/crypt.t
> 
> On Sat, 17 May 2014 04:28:45 -0400 (EDT)
> Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> 
> > hi,
> >      crypt.t fails regression builds once in a while, and most of
> > the time the failure is in the commands just after the remount in
> > the script.
> > 
> > TEST rm -f $M0/testfile-symlink
> > TEST rm -f $M0/testfile-link
> > 
> > Both of these fail with ENOTCONN. I got a chance to look at the
> > logs, and this is what the brick log shows:
> > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
> > 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
> > Transport endpoint is not connected
> > 
> > This is the first time I have seen posix fail with ENOTCONN. Are
> > these bricks backed by some other network mount? I wonder why it
> > fails with ENOTCONN.
> > 
> > I also see that it happens right after a call_bail (an RPC request
> > timing out) on the mount.
> > 
> > Pranith
> 
> Hello.
> OK, I'll try to reproduce it.
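
To rule out the other-network-mount theory above: one quick check, the next time it reproduces, is what filesystem actually backs the brick directory. ENOTCONN from a plain local open() would be surprising unless the backend itself is a FUSE or NFS mount that has gone away. A minimal sketch (hypothetical; the brick path is taken from the log line quoted above):

# Report the filesystem type backing the brick directory (GNU stat).
stat -f -c 'backing fs type: %T' /d/backends/patchy1
# Show which mount entry owns that path.
df -P /d/backends/patchy1 | tail -1

If that reports fuse or nfs rather than a local filesystem such as xfs or ext4, the ENOTCONN has a ready explanation.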

I tried to re-create the issue on my Fedora VM, and it just happened. When the issue occurs, I am not able to attach gdb to the process. According to /proc, the threads have been in the following state for a while now:
root@pranith-vm1 - /proc/4053/task
10:20:50 :) ⚡ for i in `ls`; do cat $i/stack; echo "---------------------------------"; done
[<ffffffff811ed8ce>] ep_poll+0x21e/0x330
[<ffffffff811ee7b5>] SyS_epoll_wait+0xd5/0x100
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff8108cb6d>] hrtimer_nanosleep+0xad/0x170
[<ffffffff8108cc96>] SyS_nanosleep+0x66/0x80
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff81079271>] do_sigtimedwait+0x161/0x200
[<ffffffff81079386>] SYSC_rt_sigtimedwait+0x76/0xd0
[<ffffffff810793ee>] SyS_rt_sigtimedwait+0xe/0x10
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff810c277a>] futex_wait_queue_me+0xda/0x140
[<ffffffff810c32be>] futex_wait+0x17e/0x290
[<ffffffff810c4e26>] do_futex+0xe6/0xc30
[<ffffffff810c59e1>] SyS_futex+0x71/0x150
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff810c277a>] futex_wait_queue_me+0xda/0x140
[<ffffffff810c32be>] futex_wait+0x17e/0x290
[<ffffffff810c4e26>] do_futex+0xe6/0xc30
[<ffffffff810c59e1>] SyS_futex+0x71/0x150
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff810c277a>] futex_wait_queue_me+0xda/0x140
[<ffffffff810c32be>] futex_wait+0x17e/0x290
[<ffffffff810c4e26>] do_futex+0xe6/0xc30
[<ffffffff810c59e1>] SyS_futex+0x71/0x150
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffffa0426229>] wait_answer_interruptible+0x89/0xd0 [fuse]  <<----------- This is the important thing I think
[<ffffffffa0426612>] __fuse_request_send+0x232/0x290 [fuse]
[<ffffffffa0426682>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa042ebea>] fuse_do_open+0xca/0x170 [fuse]
[<ffffffffa042ee06>] fuse_open_common+0x56/0x80 [fuse]
[<ffffffffa042ee40>] fuse_open+0x10/0x20 [fuse]
[<ffffffff811a6e4b>] do_dentry_open+0x1eb/0x280
[<ffffffff811a6f11>] finish_open+0x31/0x40
[<ffffffff811b77ba>] do_last+0x4ca/0xe00
[<ffffffff811b8510>] path_openat+0x420/0x690
[<ffffffff811b8e4a>] do_filp_open+0x3a/0x90
[<ffffffff811a82ee>] do_sys_open+0x12e/0x210
[<ffffffff811a83ee>] SyS_open+0x1e/0x20
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff810c277a>] futex_wait_queue_me+0xda/0x140
[<ffffffff810c32be>] futex_wait+0x17e/0x290
[<ffffffff810c4e26>] do_futex+0xe6/0xc30
[<ffffffff810c59e1>] SyS_futex+0x71/0x150
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff810c277a>] futex_wait_queue_me+0xda/0x140
[<ffffffff810c32be>] futex_wait+0x17e/0x290
[<ffffffff810c4e26>] do_futex+0xe6/0xc30
[<ffffffff810c59e1>] SyS_futex+0x71/0x150
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------
[<ffffffff8108cb6d>] hrtimer_nanosleep+0xad/0x170
[<ffffffff8108cc96>] SyS_nanosleep+0x66/0x80
[<ffffffff816533d9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
---------------------------------

I don't know how to debug this further, but it looks like the open() system call is hung: the FUSE stack above shows the kernel blocked in wait_answer_interruptible, i.e. waiting for the userspace glusterfs client to answer the FUSE OPEN request.
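
Two things that may help the next time this reproduces (a sketch, assuming the fusectl filesystem is mounted at /sys/fs/fuse and that 4053 is the glusterfs mount process from the transcript above):

# 1. Count the requests the kernel is still waiting on, per FUSE
#    connection; a non-zero 'waiting' value that never drains matches
#    the wait_answer_interruptible stack above.
for c in /sys/fs/fuse/connections/*; do
    echo "$c: $(cat "$c/waiting") waiting"
done
# As a last resort, 'echo 1 > /sys/fs/fuse/connections/<minor>/abort'
# forces the connection down so the stuck open() returns an error.

# 2. Ask the glusterfs client for a statedump (gluster processes dump
#    state on SIGUSR1); the dump should land under /var/run/gluster/
#    and shows which call frames are still pending inside the client.
kill -USR1 4053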

CC Brian Foster.

Pranith
> 
> Thanks for the report!
> Edward.
> 

