[Bugs] [Bug 1373392] New: [Disperse] dd + rm + ls lead to IO hang

bugzilla at redhat.com bugzilla at redhat.com
Tue Sep 6 07:34:24 UTC 2016


            Bug ID: 1373392
           Summary: [Disperse] dd + rm + ls lead to IO hang
           Product: GlusterFS
           Version: 3.7.15
         Component: disperse
          Keywords: Triaged
          Severity: high
          Priority: medium
          Assignee: bugs at gluster.org
          Reporter: aspandey at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    hgowtham at redhat.com, pkarampu at redhat.com,
                    ravishankar at redhat.com, xhernandez at datalab.es
        Depends On: 1346719
            Blocks: 1361519, 1371397, 1362420

+++ This bug was initially created as a clone of Bug #1346719 +++

Description of problem:

File creation and ls hang while rm -rf runs in an infinite loop.

Version-Release number of selected component (if applicable):
[root@apandey gluster]# glusterfs --version
glusterfs 3.9dev built on Jun 15 2016 11:39:11
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

How reproducible:

Steps to Reproduce:
1. Create a disperse volume.
2. Mount this volume on 3 mount points: m1, m2, m3.
3. Create 10000 files on m1 using a for loop and dd. After some time, start rm -rf on m2
in an infinite loop. Start ls -lRT on m3.
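The three steps can be scripted roughly as follows. This is a minimal Python sketch, not the reporter's actual script. It assumes m1, m2 and m3 are three FUSE mount points of the same disperse volume (for example /mnt/glu, /mnt/gfs and /mnt/vol from the mount output further down). Plain file writes stand in for dd, and os.walk plus lstat stand in for ls -lR. The "infinite" rm loop is bounded by a stop event so the script can report which workload failed to finish within a timeout. The function names, delays and timeouts are all hypothetical.

```python
import os
import threading
import time

def create_files(mount, count, size=4096):
    # Stand-in for the reporter's `for` + `dd` loop on m1: write `count` files.
    block = b"\0" * size
    for i in range(count):
        with open(os.path.join(mount, f"file{i}"), "wb") as f:
            f.write(block)

def remove_loop(mount, stop):
    # Stand-in for `rm -rf` run repeatedly on m2 until told to stop.
    while not stop.is_set():
        with os.scandir(mount) as entries:
            for entry in entries:
                try:
                    os.unlink(entry.path)
                except FileNotFoundError:
                    pass  # already removed by an earlier pass
        time.sleep(0.001)

def list_loop(mount, stop):
    # Stand-in for `ls -lR` on m3: walk the tree and stat every entry.
    while not stop.is_set():
        for root, _dirs, files in os.walk(mount):
            for name in files:
                try:
                    os.lstat(os.path.join(root, name))
                except FileNotFoundError:
                    pass  # removed between readdir and stat
        time.sleep(0.001)

def run_repro(m1, m2, m3, count=10000, rm_delay=5.0, timeout=300.0):
    """Run the three workloads; return {workload: finished_within_timeout}."""
    stop = threading.Event()
    workers = {
        "dd on m1": threading.Thread(target=create_files, args=(m1, count), daemon=True),
        "rm on m2": threading.Thread(target=remove_loop, args=(m2, stop), daemon=True),
        "ls on m3": threading.Thread(target=list_loop, args=(m3, stop), daemon=True),
    }
    workers["dd on m1"].start()
    time.sleep(rm_delay)  # "after some time", start the remover, then the lister
    workers["rm on m2"].start()
    workers["ls on m3"].start()
    workers["dd on m1"].join(timeout)
    stop.set()
    for name in ("rm on m2", "ls on m3"):
        workers[name].join(timeout)
    # A worker still alive after its timeout is reported as hung (False).
    return {name: not t.is_alive() for name, t in workers.items()}
```

On a local directory nothing should hang, so every entry comes back True. Against an affected build, the report suggests one would expect the "dd on m1" and "ls on m3" entries to come back False.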

Actual results:
I/O hang was seen on m1 and m3.

Expected results:
There should not be any hang.

Additional info:

Volume Name: vol
Type: Disperse
Volume ID: c81743b4-ab0e-4d9b-931b-4d67f4d24a75
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Brick1: apandey:/brick/gluster/vol-1
Brick2: apandey:/brick/gluster/vol-2
Brick3: apandey:/brick/gluster/vol-3
Brick4: apandey:/brick/gluster/vol-4
Brick5: apandey:/brick/gluster/vol-5
Brick6: apandey:/brick/gluster/vol-6
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick apandey:/brick/gluster/vol-1          49152     0          Y       13179
Brick apandey:/brick/gluster/vol-2          49153     0          Y       13198
Brick apandey:/brick/gluster/vol-3          49154     0          Y       13217
Brick apandey:/brick/gluster/vol-4          49155     0          Y       13236
Brick apandey:/brick/gluster/vol-5          49156     0          Y       13255
Brick apandey:/brick/gluster/vol-6          49157     0          Y       13274
NFS Server on localhost                     N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       13302

Task Status of Volume vol
There are no active volume tasks

[root@apandey gluster]# mount

fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
apandey:vol on /mnt/glu type fuse.glusterfs
apandey:vol on /mnt/gfs type fuse.glusterfs
apandey:vol on /mnt/vol type fuse.glusterfs
[root@apandey gluster]#

--- Additional comment from Ashish Pandey on 2016-06-15 05:01:07 EDT ---

The statedump shows some blocked inodelks:


inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3327,
owner=dc710738fd7e0000, client=0x7f283c1a7b00,
blocked at 2016-06-15 08:02:13, granted at 2016-06-15 08:02:13
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22451,
owner=cc338ae8f07f0000, client=0x7f2834006660,
blocked at 2016-06-15 08:02:13
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22530,
owner=6cd51d48da7f0000, client=0x7f28342db820,
blocked at 2016-06-15 08:02:22
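To check a statedump for this pattern, the inodelk entries can be tallied by state. This is a minimal Python sketch that assumes the entry format shown in the excerpt above (fields in the order type, whence, start, len, pid, owner). parse_inodelks and summarize are hypothetical helpers, not part of GlusterFS. Whitespace between fields, including line breaks, is matched loosely so that the wrapped excerpt also parses.

```python
import re
from collections import Counter

# One inodelk entry: index, state, lock type, then pid and owner further on.
# DOTALL lets the non-greedy gap span the line breaks in a wrapped excerpt.
INODELK_RE = re.compile(
    r"inodelk\.inodelk\[(?P<idx>\d+)\]\((?P<state>ACTIVE|BLOCKED)\)"
    r"=type=(?P<type>\w+),.*?pid\s*=\s*(?P<pid>\d+),\s*owner=(?P<owner>[0-9a-f]+)",
    re.DOTALL,
)

def parse_inodelks(text):
    """Return one dict per inodelk entry found in `text`."""
    return [
        {
            "idx": int(m["idx"]),
            "state": m["state"],
            "type": m["type"],
            "pid": int(m["pid"]),
            "owner": m["owner"],
        }
        for m in INODELK_RE.finditer(text)
    ]

def summarize(locks):
    """Count entries per state (ACTIVE = granted holder, BLOCKED = waiter)."""
    return Counter(lock["state"] for lock in locks)
```

On the excerpt above this finds one ACTIVE WRITE lock (pid 3327) and two BLOCKED WRITE requests (pids 22451 and 22530), i.e. one holder and two waiters on the same inode.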

--- Additional comment from Ashish Pandey on 2016-06-15 05:08:42 EDT ---

Just observed that the disperse.eager-lock option comes to the rescue:
setting disperse.eager-lock to off resumed the I/O and the ls -lR command.

gluster v set vol disperse.eager-lock off

--- Additional comment from Worker Ant on 2016-08-24 11:54:52 EDT ---

REVIEW: http://review.gluster.org/15309 (cluster/ec: Use locks for opendir)
posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Worker Ant on 2016-08-25 09:48:36 EDT ---

COMMIT: http://review.gluster.org/15309 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
commit f013335400d033a9677797377b90b968803135f4
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Wed Aug 24 21:01:05 2016 +0530

    cluster/ec: Use locks for opendir

    In some cases we see that readdir keeps winding to the brick that doesn't
    have any blocked locks, i.e. the first brick. This leads the client to assume
    there are no blocking locks on the inode, so it won't give away the lock.
    Other clients end up blocked on the lock, as if the command hung.

    The proper way to fix this issue is to use the infra present in
    http://review.gluster.org/14736. This is a stop-gap fix where we start
    inodelks in opendir, which goes to all the bricks; this will detect if there
    is any contention.

    BUG: 1346719
    Change-Id: I91109107a26f6535b945ac476338e9f21dc31eb9
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/15309
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Ashish Pandey <aspandey at redhat.com>
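The reasoning in the commit message can be illustrated with a toy model. This Python sketch is not the ec translator's code: Brick, EagerLockClient and simulate are hypothetical names, and each brick reply is reduced to a single "someone else is waiting" flag. Following the commit message, the first brick is assumed to have no blocked locks while the other bricks do, so a client that only hears from brick 0 (readdir-style) never learns about contention, whereas a request wound to all bricks (as the opendir inodelk is) does.

```python
from dataclasses import dataclass

@dataclass
class Brick:
    # Toy brick: how many other clients are blocked on the inodelk here.
    blocked_waiters: int = 0

    def reports_contention(self):
        # A reply only tells the lock holder whether anyone else is waiting.
        return self.blocked_waiters > 0

@dataclass
class EagerLockClient:
    # Toy eager-lock holder: keeps the lock until a reply reports contention.
    holds_lock: bool = True

    def send(self, bricks, targets):
        # Wind one request to the bricks in `targets` and inspect the replies.
        if any(bricks[i].reports_contention() for i in targets):
            self.holds_lock = False  # give the lock away to the waiters
        return self.holds_lock

def simulate(targets, n_bricks=6, rounds=1000):
    """Return True if the client still holds the lock after `rounds` requests."""
    # Assumption taken from the commit message: brick 0 has no blocked locks,
    # while the waiters are queued on the other bricks.
    bricks = [Brick(blocked_waiters=0 if i == 0 else 1) for i in range(n_bricks)]
    client = EagerLockClient()
    for _ in range(rounds):
        if not client.send(bricks, targets):
            break
    return client.holds_lock

print(simulate([0]))        # readdir-style, brick 0 only: True (waiters starve)
print(simulate(range(6)))   # opendir-style, all 6 bricks: False (lock released)
```

In this model the only thing that changes between the two runs is which bricks the request reaches, which mirrors why the stop-gap fix takes inodelks in opendir.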

Referenced Bugs:

[Bug 1346719] [Disperse] dd + rm + ls lead to IO hang
[Bug 1361519] [Disperse] dd + rm + ls lead to IO hang
[Bug 1362420] [Disperse] dd + rm + ls lead to IO hang
[Bug 1371397] [Disperse] dd + rm + ls lead to IO hang