[Gluster-devel] umount hangs in NetBSD

Sun Jun 7 17:05:50 UTC 2015

Vijay Bellur <vbellur at redhat.com> wrote:

> I am also not certain why we end up with stale NFS mounts at the first
> place. Any ideas as to why this might be happening?

That happens when you try to umount while the NFS server is down, or in
our case after the NFS server process was shut down. The kernel waits
forever in unmount(2) for a server reply, and anything that tries to
access the filesystem waits for unmount(2) to terminate and has its
wchan set to tstile in ps -axl output.
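
To spot the stuck processes, something like the following might help. It
is only a sketch: tstile_procs is a name made up here, and it just keeps
the header line plus any line mentioning tstile, without assuming which
column holds the wchan.

```shell
# Hypothetical helper: read ps output on stdin, keep the header line
# and every line that mentions the tstile wait channel.
tstile_procs()
{
    awk 'NR == 1 || /tstile/'
}
```

Usage would be: ps -axl | tstile_procs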

The first fix is therefore to unmount before calling the cleanup routine
that terminates the glusterfsd that acts as the NFS server, like this:

EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $N0
cleanup

But then there is the case where the NFS server died earlier. That is a
bug we want to fix, but in the meantime it makes the tests hang, which
is not desirable.

umount -f $N0 does not help here, since the NetBSD umount(8) command
makes an unfortunate realpath(3) call that locks up on the unresponsive
NFS mount before it actually calls unmount(2).

NetBSD's umount(8) has a -R flag that causes it to skip the realpath(3)
call. Hence umount -f -R $N0 is the way to go to work around that case.
But this -R flag is not portable and should only be used in the NetBSD
case. I will try to craft a change for that tomorrow.
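
A minimal sketch of what such a change could look like, assuming the OS
is detected with uname -s. The function names force_umount_flags and
force_umount_stale are made up for illustration and are not part of the
test framework; only -f and the NetBSD-specific -R come from the
discussion above.

```shell
# Hypothetical helper: print the umount flags for a forced unmount.
# The OS name defaults to `uname -s`; it can be passed as $1 for testing.
force_umount_flags()
{
    case "${1:-$(uname -s)}" in
    NetBSD) echo "-f -R" ;;   # -R skips the realpath(3) call, NetBSD only
    *)      echo "-f" ;;      # elsewhere, a plain forced unmount
    esac
}

# Hypothetical wrapper: forced unmount of the mount point given as $1.
force_umount_stale()
{
    # The flags are left unquoted on purpose so they split into words.
    umount $(force_umount_flags) "$1"
}
```

A test would then call force_umount_stale $N0 instead of umount -f $N0.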

Note that there is a known NetBSD bug: if a system call is already
stuck because of an unresponsive NFS server, umount -f -R $N0 will not
unblock the situation, because it waits for the first process to
complete (tstile state). reboot -n is the only way to recover from that.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org