[Bugs] [Bug 1266836] New: AFR : fuse, nfs mount hangs when directories with same names are created and deleted continuously

Mon Sep 28 07:22:53 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1266836

            Bug ID: 1266836
           Summary: AFR : fuse,nfs mount hangs when directories with same
                    names are created and deleted continuously
           Product: GlusterFS
           Version: 3.7.5
         Component: protocol
          Assignee: bugs at gluster.org
          Reporter: ravishankar at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1266834
            Blocks: 986916

+++ This bug was initially created as a clone of Bug #1266834 +++

spandura at redhat.com 2013-07-22 08:24:21 EDT

Description of problem:
=========================
In a distribute-replicate volume, when directories with the same names are created
and deleted continuously on fuse and nfs mount points, the mount points hang after
some time.

Refer to bug: 922792

Version-Release number of selected component (if applicable):
==============================================================
root at rhs-client11 [Jul-22-2013-16:00:29] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta5-2.el6rhs.x86_64

root at rhs-client11 [Jul-22-2013-16:00:40] >gluster --version
glusterfs 3.4.0.12rhs.beta5 built on Jul 18 2013 07:00:39

How reproducible:

test_bug_922792.sh
===================
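#!/bin/bash

# Print the directory this script lives in (informational only).
dir=$(dirname "$(readlink -f "$0")")
echo "Script in $dir"

# Create and delete the same directory tree forever in the current directory.
# $1 is an optional suffix for the top-level name (foo$1); the steps below run
# the script without it, so every mount point works on the same "foo" tree.
while :
do
        mkdir -p "foo$1/bar/gee"
        mkdir -p "foo$1/bar/gne"
        mkdir -p "foo$1/lna/gme"
        rm -rf "foo$1"
done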
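
Because the reproducer loops forever, a bounded variant can be easier to script. The sketch below is hypothetical (the function name repro_bounded and its ITERATIONS argument are not part of the original report): it runs the same mkdir/rm cycle a fixed number of times in the current directory and prints the count when it finishes, so a run that returns shows the mount did not hang within that many iterations.

```shell
#!/bin/bash
# Hypothetical bounded variant of test_bug_922792.sh: run the same
# create/delete cycle ITERATIONS times in the current directory.
# Usage: repro_bounded ITERATIONS [SUFFIX]
repro_bounded() {
    local iterations=$1 suffix=${2:-} i
    for ((i = 1; i <= iterations; i++)); do
        mkdir -p "foo$suffix/bar/gee" || return 1
        mkdir -p "foo$suffix/bar/gne" || return 1
        mkdir -p "foo$suffix/lna/gme" || return 1
        rm -rf "foo$suffix" || return 1
    done
    # Only reached if every iteration completed (a hung mount never gets here).
    echo "completed $iterations iterations"
}
```

For example, `cd /mnt/fuse1 && repro_bounded 10000`, where /mnt/fuse1 is a placeholder mount path.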

Steps to Reproduce:
===================
1. Create a 6 x 2 distribute-replicate volume on 4 storage nodes, with 3 bricks
on each storage node.

2. Start the volume.

3. Create 2 fuse and 2 nfs mounts on each of the RHEL5.9 and RHEL6.4 clients.

4. Run "test_bug_922792.sh" from within each of the mount points.
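
Step 4 can be scripted as below. This is a minimal sketch under assumptions: run_on_mounts is a hypothetical helper, the mount paths in the usage line are placeholders, and each mount is assumed to be writable by the caller. It starts one copy of the script per mount point, in the background, with that mount point as the working directory, then waits for all of them.

```shell
#!/bin/bash
# Hypothetical helper: start SCRIPT once in each MOUNT_POINT, concurrently.
# Usage: run_on_mounts SCRIPT MOUNT_POINT...
run_on_mounts() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_on_mounts SCRIPT MOUNT_POINT..." >&2
        return 2
    fi

    local script mnt
    script=$(readlink -f "$1")
    shift

    for mnt in "$@"; do
        # cd first so the script's relative foo/... paths land in this mount.
        ( cd "$mnt" && exec bash "$script" ) &
        echo "started in $mnt (pid $!)"
    done

    # Block until every copy exits.
    wait
}
```

For example: `run_on_mounts /root/test_bug_922792.sh /mnt/fuse1 /mnt/fuse2 /mnt/nfs1 /mnt/nfs2`. Since test_bug_922792.sh never exits, `wait` blocks until the copies are stopped; Ctrl-C interrupts only the `wait`, not the background copies, so stop them with `kill` on the printed PIDs.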

Actual results:
===============
After some time, the fuse and nfs mounts hang.

Expected results:
================
The fuse and nfs mounts should not hang.

-------------------------------------------
 Krutika Dhananjay 2014-07-28 04:53:50 EDT

A couple of observations:

I tried this on a 2x2 volume with 2 nfs and 2 fuse mounts, and the bug is 100%
reproducible.

The hang stems from one of the clients failing to release an inodelk it had
previously acquired; the hung client is blocked on that lock forever. This can be
deduced from the following log messages:

[root at calvin glusterfs]# grep -T 'unlock' nfs.log mnt-glusterfs.log
nfs.log:[2014-07-20 04:34:47.247058] I [afr-lk-common.c:676:afr_unlock_inodelk_cbk] 0-dis-rep-replicate-0: /foo/lna/gme: unlock failed on subvolume dis-rep-client-0 with lock owner d8a0227c237f0000. Reason : Stale file handle
nfs.log:[2014-07-20 04:34:47.247390] I [afr-lk-common.c:676:afr_unlock_inodelk_cbk] 0-dis-rep-replicate-0: /foo/lna/gme: unlock failed on subvolume dis-rep-client-1 with lock owner d8a0227c237f0000. Reason : Stale file handle
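
When many clients are involved, a small filter helps list which paths, subvolumes, and lock owners were affected. This is a sketch that assumes the message text matches the afr_unlock_inodelk_cbk lines above, with each message on a single line in the actual log file; unlock_failures is a hypothetical name, and other releases may word the message differently.

```shell
#!/bin/bash
# Hypothetical filter: print "PATH SUBVOLUME LOCK_OWNER REASON" for each
# afr_unlock_inodelk_cbk "unlock failed" message in the given logs (or stdin).
# Usage: unlock_failures [LOGFILE...]
unlock_failures() {
    # Capture groups: 1 = path, 2 = subvolume, 3 = lock owner, 4 = reason.
    local re='.*afr_unlock_inodelk_cbk] [^ ]*: \(.*\): unlock failed on subvolume \([^ ]*\)'
    re+=' with lock owner \([0-9a-f]*\)\. Reason : \(.*\)$'
    sed -n "s/$re/\1 \2 \3 \4/p" "$@"
}
```

For example, `unlock_failures nfs.log mnt-glusterfs.log | sort | uniq -c` counts identical (path, subvolume, lock owner, reason) combinations. For the first nfs.log message above it prints `/foo/lna/gme dis-rep-client-0 d8a0227c237f0000 Stale file handle`.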

The ESTALE error on unlock originates in the server-side resolver, which suggests
that the inode had been unlinked and is no longer part of the inode table.
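
To confirm on the servers that a lock is still held and that another client is queued behind it, a brick statedump (generated with `gluster volume statedump VOLNAME`, typically written under /var/run/gluster) can be searched for blocked inodelk entries. The filter below is a sketch: blocked_inodelks is a hypothetical name, and it assumes the dump lists a `path=` line for each inode followed by lock lines of the form `inodelk.inodelk[N](ACTIVE)=...` or `inodelk.inodelk[N](BLOCKED)=...`; check this against the dump format of the release in use.

```shell
#!/bin/bash
# Hypothetical filter: list BLOCKED inodelk entries in brick statedump files,
# each prefixed with the most recent "path=" line seen above it.
# ASSUMPTION: the statedump layout described in the lead-in.
# Usage: blocked_inodelks STATEDUMP_FILE...
blocked_inodelks() {
    awk '
        /^path=/ { path = substr($0, 6) }            # remember the current inode path
        /^inodelk\.inodelk\[[0-9]+\]\(BLOCKED\)/ {    # a lock request still waiting
            print path ": " $0
        }
    ' "$@"
}
```

A blocked entry on /foo/lna/gme that never clears across successive dumps would be consistent with the stale lock described above.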

Furthermore, I added a GF_ASSERT in afr_unlock_inodelk_cbk() to deliberately
crash the client with SIGABRT on unlock failure.
The core suggests that (local->loc).gfid and (local->loc).inode->gfid differ,
when they should ideally be one and the same.

All this suggests that the hang is possibly due to existing races between
DHT's lookup self-heal, rmdir, and mkdir codepaths.

--- Additional comment from Ravishankar N on 2015-09-28 03:17:28 EDT ---

review.gluster.org/#/c/12233/

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1266834
[Bug 1266834] AFR : fuse,nfs mount hangs when directories with same names
are created and deleted continuously
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.