[Bugs] [Bug 1175711] New: os.walk() vs scandir.walk() performance
bugzilla at redhat.com
Thu Dec 18 12:49:33 UTC 2014
https://bugzilla.redhat.com/show_bug.cgi?id=1175711
Bug ID: 1175711
Summary: os.walk() vs scandir.walk() performance
Product: GlusterFS
Version: mainline
Component: core
Assignee: bugs at gluster.org
Reporter: ppai at redhat.com
CC: bugs at gluster.org, gluster-bugs at redhat.com
Description of problem:
os.walk() in Python recursively walks the directory tree rooted at the path
given to it. Internally it does a stat() on every entry to determine whether
the entry is a file or a directory. This additional stat() is not required for
that purpose. An alternative implementation called "scandir.walk()" exists
which is at least 2-3 times faster, because it reads the d_type member of the
dirent structure returned by readdir() instead. The GlusterFS posix xlator does
populate the d_type member properly, so applications can access and consume it.
https://github.com/benhoyt/scandir
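
To make the difference concrete, here is a minimal sketch of the two ways to
split one directory into subdirectories and files. It is not the code of
os.walk() or of the scandir package. The function names are hypothetical, and
it assumes Python 3.6 or later, where the standard library's os.scandir()
exposes the same readdir()-based entry type that scandir.walk() relies on.

```python
import os
import stat


def classify_with_stat(path):
    """Split the entries of `path` into (dirs, files), one lstat() per entry."""
    dirs, files = [], []
    for name in os.listdir(path):
        # One extra system call per entry: the cost described in this report.
        mode = os.lstat(os.path.join(path, name)).st_mode
        (dirs if stat.S_ISDIR(mode) else files).append(name)
    return sorted(dirs), sorted(files)


def classify_with_dirent(path):
    """Split the entries of `path` into (dirs, files) from the readdir() type."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() answers from d_type when the filesystem fills it in and
            # falls back to a stat call only when the type is unknown.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)
```

Both functions should return the same result for a given directory. Only the
number of system calls needed to get there differs, and that difference only
shows up if the filesystem actually returns a usable d_type.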
Version-Release number of selected component (if applicable):
GlusterFS master branch
How reproducible:
Run the benchmark script on a GlusterFS mount point and on an XFS mount point:
https://github.com/benhoyt/scandir/blob/master/benchmark.py
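
For a quick check without the full benchmark script, a walk can be timed with
a few lines of Python. This is only a rough sketch and does not reproduce
benchmark.py: the helper name time_walk and the choice of reporting the best
of three runs are assumptions made here for illustration.

```python
import os
import sys
import time


def time_walk(walk, root, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` walks of `root`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _dirpath, _dirnames, _filenames in walk(root):
            pass  # consume the generator so every directory is actually read
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Tree to walk: first command-line argument, or the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print("os.walk best of 3: %.3fs" % time_walk(os.walk, root))
```

Any callable with the os.walk() interface can be passed as the first argument,
so scandir.walk from the package above can be timed the same way. Note that
from Python 3.5 onward os.walk() itself is built on os.scandir(), so the
comparison is only meaningful on older interpreters.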
Actual results:
On XFS:
# python benchmark.py
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.035s, scandir.walk took 0.019s -- 1.9x as fast
On GlusterFS:
# python benchmark.py
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.845s, scandir.walk took 0.864s -- 1.0x as fast
Expected results:
scandir.walk() should be faster than os.walk() on GlusterFS as well, since it
only does readdir() and does not stat() each file.
TODO:
Retry with all performance xlators disabled.