[Bugs] [Bug 1332396] New: posix: Set correct d_type for readdirp() calls

bugzilla at redhat.com bugzilla at redhat.com
Tue May 3 06:29:20 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1332396

            Bug ID: 1332396
           Summary: posix: Set correct d_type for readdirp() calls
           Product: GlusterFS
           Version: 3.8.0
         Component: posix
          Keywords: Performance, Triaged
          Severity: low
          Priority: low
          Assignee: bugs at gluster.org
          Reporter: ppai at redhat.com
                CC: bugs at gluster.org, ndevos at redhat.com, thiago at redhat.com
        Depends On: 1175711



+++ This bug was initially created as a clone of Bug #1175711 +++

Description of problem:
os.walk() in Python walks the entire path given to it. It internally does a
stat to determine if a file is a file or directory. This additional stat is not
required to determine of a file is a file/directory. An alternative
implementation called "scandir.walk()" exists which is at least 2-3 times
faster. This is because "scandir.walk()" reads the d_type member of dirent
structure returned by readdir(). GlusterFS posix xlator does properly populate
the d_type member. Hence it can be accessed/consumed by applications.

https://github.com/benhoyt/scandir

Version-Release number of selected component (if applicable):
GlusterFS master branch

How reproducible:
Run the benchmark script on glusterfs mount point vs on a xfs mountpoint.
https://github.com/benhoyt/scandir/blob/master/benchmark.py


Actual results:
On XFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.035s, scandir.walk took 0.019s -- 1.9x as fast

On GlusterFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.845s, scandir.walk took 0.864s -- 1.0x as fast


Expected results:
scandir.walk() to be faster than os.walk() as it only does readdir() without
doing stat() on each file.

TODO:
Retry with all performance xlators disabled.

--- Additional comment from Niels de Vos on 2014-12-23 07:38:50 EST ---

We need to verify if the 'struct dirent'->d_type is retrieved correctly over
the fuse filesystem. In case it is not, this would be a bug in fuse.

--- Additional comment from Prashanth Pai on 2014-12-23 09:54:04 EST ---

I did check that (using following script) on latest master branch code. It does
fill that.

#!/usr/bin/env python

# Return status indicates if d_type returned

import ctypes
import sys

(DT_UNKNOWN, DT_DIR,) = (0, 4,)

class dirent(ctypes.Structure):
  _fields_ = [
    ("d_ino", ctypes.c_long),
    ("d_off", ctypes.c_long),
    ("d_reclen", ctypes.c_ushort),
    ("d_type", ctypes.c_ubyte),
    ("d_name", ctypes.c_char*256)]

direntp = ctypes.POINTER(dirent)

libc = ctypes.cdll.LoadLibrary("libc.so.6")
libc.readdir.restype = direntp

dirp = libc.opendir(".")
if dirp:
  ep = libc.readdir(dirp)
else:
  sys.exit(1)

print ep.contents.d_type

--- Additional comment from Prashanth Pai on 2016-04-27 07:28:51 EDT ---

I was wrong. The above script failed to detect it because d_type is always set
correctly for "." and ".." entries. GlusterFS correctly propagates d_type from
posix xlator up the stack till FUSE.

It turns out that XFS does't fill correct d_type until recently (Linux>=3.15
and xfsprogs>=3.2.0). If one formats his/her filesystem with XFS's newer
version 5 on-disk format, d_type is rightly set.

Example: mkfs.xfs -m crc=1 /srv/disk1

However, GlusterFS can support filling the right d_type in readdirp() responses
even if XFS doesn't using the pre-fetched stat information.

--- Additional comment from Vijay Bellur on 2016-04-27 10:14:05 EDT ---

REVIEW: http://review.gluster.org/14095 (posix: Set correct d_type for
readdirp() calls) posted (#1) for review on master by Prashanth Pai
(ppai at redhat.com)

--- Additional comment from Prashanth Pai on 2016-04-27 10:32:35 EDT ---

Created a nested fs tree of depth = 4 on glusterfs mountpoint.

In the below example: ls command from coreutils is capable of avoiding
additional lstat() if it finds d_type to be set correctly.

BEFORE http://review.gluster.org/14095:

root# strace -fc -e getdents,lstat ls -fR /mnt/gluster-object/gsmetadata >>
/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.95    0.672307          30     22226           getdents
 44.05    0.529388          24     22224           lstat
------ ----------- ----------- --------- --------- ----------------
100.00    1.201695                 44450           total


AFTER http://review.gluster.org/14095:
root# strace -fc -e getdents,lstat ls -fR /mnt/gluster-object/gsmetadata >>
/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.595680          27     22226           getdents
------ ----------- ----------- --------- --------- ----------------
100.00    0.595680                 22226           total

--- Additional comment from Vijay Bellur on 2016-05-02 01:32:25 EDT ---

REVIEW: http://review.gluster.org/14095 (posix: Set correct d_type for
readdirp() calls) posted (#2) for review on master by Prashanth Pai
(ppai at redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-02 07:48:49 EDT ---

COMMIT: http://review.gluster.org/14095 committed in master by Jeff Darcy
(jdarcy at redhat.com) 
------
commit 77def44d497d090ef3f393b6d9403c1a29dcf993
Author: Prashanth Pai <ppai at redhat.com>
Date:   Wed Apr 27 13:37:07 2016 +0530

    posix: Set correct d_type for readdirp() calls

    dirent.d_type can contain the type of the directory entry. The 'd_type'
    struct member in dirent is present in Linux and many BSD flavours.
    However, filling d_type with correct value requires support from the
    underlying filesystem. If not, d_type is set to DT_UNKNOWN. XFS added
    support for d_type as part of their newer version 5 on-disk format.
    However, this requires Linux >= 3.15, xfsprogs >= 3.2.0 and the bricks
    to be formatted using the new format.

    This patch enables posix xlator to set d_type to the right value even
    when the underlying filesystem does not support it. d_type can be set
    using information previously fetched by stat() on the dir entry.
    This will aid FUSE applications to leverage d_type to avoid the expense
    of calling lstat() if further actions depend on the type of the file.

    Refer `man 3 readdir` and `man 2 getdents`

    BUG: 1175711
    Change-Id: Ic5a262fe4c64122726b4fae2d1bea375c559ca04
    Signed-off-by: Prashanth Pai <ppai at redhat.com>
    Reviewed-on: http://review.gluster.org/14095
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1175711
[Bug 1175711] posix: Set correct d_type for readdirp() calls
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list