[Bugs] [Bug 1175711] New: os.walk() vs scandir.walk() performance

bugzilla at redhat.com
Thu Dec 18 12:49:33 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1175711

            Bug ID: 1175711
           Summary: os.walk() vs scandir.walk() performance
           Product: GlusterFS
           Version: mainline
         Component: core
          Assignee: bugs at gluster.org
          Reporter: ppai at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
os.walk() in Python recursively walks the directory tree under the path given
to it. Internally it calls stat() on each entry to determine whether it is a
file or a directory. This additional stat() is not required to make that
determination. An alternative implementation, "scandir.walk()", is at least
2-3 times faster because it reads the d_type member of the dirent structure
returned by readdir() instead. The GlusterFS posix xlator does properly
populate the d_type member, so applications can access and consume it.

https://github.com/benhoyt/scandir
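
To illustrate the difference described above, here is a minimal sketch of the
two ways of classifying directory entries. It is not the code of os.walk() or
of the scandir module. The function names are hypothetical, and it assumes
Python 3.6 or later, where os.scandir() is part of the standard library and
can be used as a context manager.

```python
import os


def classify_with_stat(path):
    """Classify entries with one stat() call per entry via os.path.isdir()."""
    dirs, files = [], []
    for name in os.listdir(path):
        # os.path.isdir() issues a stat() on every entry.
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files)


def classify_with_d_type(path):
    """Classify entries from the type information returned with each entry."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() can usually be answered from d_type without a stat();
            # a stat() is still needed for symlinks or when the filesystem
            # reports an unknown type.
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)
```

Both functions should return the same result for a given directory; only the
number of stat() calls issued against the filesystem differs.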

Version-Release number of selected component (if applicable):
GlusterFS master branch

How reproducible:
Run the benchmark script on a GlusterFS mount point and on an XFS mount point.
https://github.com/benhoyt/scandir/blob/master/benchmark.py
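
For a quick check without the full benchmark script, a timing harness along
the following lines could be used. This is a rough sketch and not a
replacement for benchmark.py. The names time_walk and compare are
hypothetical, and taking the best of several runs after one cache-priming
pass is a choice made here.

```python
import os
import time


def time_walk(walk, root, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` full walks."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Consume the generator so the whole tree is actually traversed.
        for _dirpath, _dirnames, _filenames in walk(root):
            pass
        best = min(best, time.perf_counter() - start)
    return best


def compare(root, baseline_walk, candidate_walk, repeats=3):
    """Time two walk implementations on the same tree and return the ratio."""
    time_walk(baseline_walk, root, repeats=1)  # prime the system's cache
    baseline = time_walk(baseline_walk, root, repeats)
    candidate = time_walk(candidate_walk, root, repeats)
    # Guard against a zero reading on very small trees.
    ratio = baseline / candidate if candidate > 0 else float("inf")
    return baseline, candidate, ratio
```

With the scandir package installed, a call such as
compare("/mnt/glusterfs/benchtree", os.walk, scandir.walk) would time both
walks (the mount path here is only an example). Note that from Python 3.5
onwards os.walk() is itself built on os.scandir(), so the comparison is only
meaningful on older interpreters.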


Actual results:
On XFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.035s, scandir.walk took 0.019s -- 1.9x as fast

On GlusterFS:
# python benchmark.py 
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.845s, scandir.walk took 0.864s -- 1.0x as fast


Expected results:
scandir.walk() should be faster than os.walk() on GlusterFS as well, since it
only calls readdir() and does not stat() each entry.

TODO:
Retry with all performance xlators disabled.


