[Bugs] [Bug 1361519] New: [Disperse] dd + rm + ls lead to IO hang

bugzilla at redhat.com bugzilla at redhat.com
Fri Jul 29 09:22:07 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1361519

            Bug ID: 1361519
           Summary: [Disperse] dd + rm + ls lead to IO hang
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: disperse
          Keywords: Triaged
          Severity: high
          Priority: medium
          Assignee: pkarampu at redhat.com
          Reporter: pkarampu at redhat.com
        QA Contact: storage-qa-internal at redhat.com
                CC: aspandey at redhat.com, bugs at gluster.org,
                    hgowtham at redhat.com, pkarampu at redhat.com,
                    ravishankar at redhat.com, rhs-bugs at redhat.com,
                    xhernandez at datalab.es
        Depends On: 1346719



+++ This bug was initially created as a clone of Bug #1346719 +++

Description of problem:

File creation and ls hang on other mounts while rm -rf runs in an infinite loop.

Version-Release number of selected component (if applicable):
[root at apandey gluster]# glusterfs --version
glusterfs 3.9dev built on Jun 15 2016 11:39:11
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.


How reproducible:
1/1

Steps to Reproduce:
1. Create a disperse volume.
2. Mount this volume on 3 mount points: m1, m2, m3.
3. Create 10000 files on m1 using a for loop and dd. After some time, start
rm -rf on m2 in an infinite loop. Start ls -lRT on m3.
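The steps above can be sketched as follows. This is a hedged reconstruction, not the reporter's exact script: the brick paths and mount points are taken from the volume info and mount output below, and the dd block size, file size, and loop shapes are assumptions.

```shell
# Volume layout matching the 1 x (4 + 2) = 6 configuration reported below.
# Host name "apandey" and brick paths are taken from the volume info output.
gluster volume create vol disperse 6 redundancy 2 \
  apandey:/brick/gluster/vol-{1..6} force
gluster volume start vol

# Three FUSE mounts of the same volume (m1, m2, m3).
mkdir -p /mnt/glu /mnt/gfs /mnt/vol
mount -t glusterfs apandey:vol /mnt/glu
mount -t glusterfs apandey:vol /mnt/gfs
mount -t glusterfs apandey:vol /mnt/vol

# m1: create 10000 files with dd (1 MB each is an assumption).
for i in $(seq 1 10000); do
    dd if=/dev/urandom of=/mnt/glu/file-$i bs=1M count=1 2>/dev/null
done &

# m2: remove everything in an infinite loop.
while true; do rm -rf /mnt/gfs/*; done &

# m3: recursive listing; per the report, this and the dd loop hang.
ls -lRT /mnt/vol
```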

Actual results:
An IO hang was seen on m1 and m3.

Expected results:
There should not be any hang.

Additional info:

Volume Name: vol
Type: Disperse
Volume ID: c81743b4-ab0e-4d9b-931b-4d67f4d24a75
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: apandey:/brick/gluster/vol-1
Brick2: apandey:/brick/gluster/vol-2
Brick3: apandey:/brick/gluster/vol-3
Brick4: apandey:/brick/gluster/vol-4
Brick5: apandey:/brick/gluster/vol-5
Brick6: apandey:/brick/gluster/vol-6
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick apandey:/brick/gluster/vol-1          49152     0          Y       13179
Brick apandey:/brick/gluster/vol-2          49153     0          Y       13198
Brick apandey:/brick/gluster/vol-3          49154     0          Y       13217
Brick apandey:/brick/gluster/vol-4          49155     0          Y       13236
Brick apandey:/brick/gluster/vol-5          49156     0          Y       13255
Brick apandey:/brick/gluster/vol-6          49157     0          Y       13274
NFS Server on localhost                     N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       13302

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks

[root at apandey gluster]#  mount

fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
apandey:vol on /mnt/glu type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
apandey:vol on /mnt/gfs type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
apandey:vol on /mnt/vol type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root at apandey gluster]#

--- Additional comment from Ashish Pandey on 2016-06-15 05:01:07 EDT ---

The statedump shows some blocked inodelks:

[conn.1.bound_xl./brick/gluster/vol-1.active.1]
gfid=00000000-0000-0000-0000-000000000001
nlookup=3
fd-count=3
ref=1
ia_type=2

[xlator.features.locks.vol-locks.inode]
path=/
mandatory=0
inodelk-count=3
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=vol-disperse-0:self-heal
lock-dump.domain.domain=vol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3327,
owner=dc710738fd7e0000, client=0x7f283c1a7b00,
connection-id=apandey-15766-2016/06/15-07:59:38:894408-vol-client-0-0-0,
blocked at 2016-06-15 08:02:13, granted at 2016-06-15 08:02:13
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22451,
owner=cc338ae8f07f0000, client=0x7f2834006660,
connection-id=apandey-13531-2016/06/15-07:58:50:360055-vol-client-0-0-0,
blocked at 2016-06-15 08:02:13
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22530,
owner=6cd51d48da7f0000, client=0x7f28342db820,
connection-id=apandey-19856-2016/06/15-08:01:05:258794-vol-client-0-0-0,
blocked at 2016-06-15 08:02:22
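For reference, a statedump like the one above can be generated and inspected with the commands below. The volume name matches the report; the default dump directory /var/run/gluster and the dump file naming are assumptions about this particular build.

```shell
# Ask all brick processes of volume "vol" to dump their state
# (inode tables, lock tables, etc.).
gluster volume statedump vol

# Pull the blocked inode locks out of the dumps; each BLOCKED entry
# carries the owner, client, and connection-id seen above.
grep -A2 BLOCKED /var/run/gluster/*.dump.*
```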

--- Additional comment from Ashish Pandey on 2016-06-15 05:08:42 EDT ---


Just observed that the disperse.eager-lock option comes to the rescue:
setting disperse.eager-lock to off unblocked the IO and the ls -lR command.

gluster v set vol disperse.eager-lock off
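After applying the workaround, the new value can be confirmed from the volume info output, where reconfigured options are listed; this is a routine check, not part of the original report:

```shell
# "disperse.eager-lock: off" should appear under Options Reconfigured.
gluster volume info vol
```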


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1346719
[Bug 1346719] [Disperse] dd + rm + ls lead to IO hang