[Gluster-users] Very slow Samba Directory Listing when many files or sub-directories.

Vivek Agarwal vagarwal at redhat.com
Wed Feb 26 13:46:30 UTC 2014


On 02/26/2014 01:09 AM, Jeff Byers wrote:
>
> Hello,
>
> I have a problem with very slow Windows Explorer browsing
> when there are a large number of directories/files.
>
> In this case, the top-level folder has almost 6000 directories,
> admittedly a lot, but listing it was almost instantaneous when a
> Windows Server share was being used.
> After migrating to a Samba/GlusterFS share, there is an almost
> 20-second delay while the Explorer window populates the list.
> This leaves a bad impression of the storage performance. The
> systems are otherwise idle.
> To isolate the cause, I've eliminated everything else (networking,
> Windows, etc.) and have narrowed it down to GlusterFS being the
> source of most of the directory lag.
> I was optimistic about using the GlusterFS Samba VFS module
> (libgfapi) instead of FUSE, and it does help performance
> dramatically in some cases, but for directory listings it does
> not help (and sometimes hurts) compared to the CIFS FUSE mount.
>
> NFS seems to be better for directory listings and small I/Os,
> but I cannot use NFS: I need CIFS for the Windows clients, ACLs,
> Active Directory integration, etc.
>
> Versions:
>     CentOS release 6.5 (Final)
>     # glusterd -V
>     glusterfs 3.4.2 built on Jan  6 2014 14:31:51
>     # smbd -V
>     Version 4.1.4
>
> For testing, I've got a single GlusterFS volume, with a
> single ext4 brick, being accessed locally:
>
> # gluster volume info nas-cbs-0005
> Volume Name: nas-cbs-0005
> Type: Distribute
> Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
> Options Reconfigured:
> server.allow-insecure: on
> nfs.rpc-auth-allow: *
> nfs.disable: off
> nfs.addr-namelookup: off
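>
> (For reference, a single-brick distribute volume like this can be
> created and started with commands along the following lines; this
> is a reconstruction from the volume info above, not the exact
> commands that were used:)
>
> # gluster volume create nas-cbs-0005 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
> # gluster volume start nas-cbs-0005
> # gluster volume set nas-cbs-0005 server.allow-insecure on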
>
> The Samba share options are:
>
> [nas-cbs-0005]
>     path = /samba/nas-cbs-0005/cifs_share
>     admin users = "localadmin"
>     valid users = "localadmin"
>     invalid users =
>     read list =
>     write list = "localadmin"
>     guest ok = yes
>     read only = no
>     hide unreadable = yes
>     hide dot files = yes
>     available = yes
>
> [nas-cbs-0005-vfs]
>     path = /
>     vfs objects = glusterfs
>     glusterfs:volume = nas-cbs-0005
>     kernel share modes = No
>     use sendfile = false
>     admin users = "localadmin"
>     valid users = "localadmin"
>     invalid users =
>     read list =
>     write list = "localadmin"
>     guest ok = yes
>     read only = no
>     hide unreadable = yes
>     hide dot files = yes
>     available = yes
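>
> (A quick way to sanity-check these share definitions is to dump
> the parsed configuration with testparm; this is only a
> verification step, shown for completeness:)
>
> # testparm -s /etc/samba/smb.conf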
>
> I've locally mounted the volume three ways, with NFS, Samba
> CIFS through a GlusterFS FUSE mount, and VFS libgfapi mount:
>
> # mount
> /dev/sdr on /exports/nas-segment-0004 type ext4 (rw,noatime,auto_da_alloc,barrier,nodelalloc,journal_checksum,acl,user_xattr)
> /var/lib/glusterd/vols/nas-cbs-0005/nas-cbs-0005-fuse.vol on /samba/nas-cbs-0005 type fuse.glusterfs (rw,allow_other,max_read=131072)
> //10.10.200.181/nas-cbs-0005 on /mnt/nas-cbs-0005-cifs type cifs (rw,username=localadmin,password=localadmin)
> 10.10.200.181:/nas-cbs-0005 on /mnt/nas-cbs-0005 type nfs (rw,addr=10.10.200.181)
> //10.10.200.181/nas-cbs-0005-vfs on /mnt/nas-cbs-0005-cifs-vfs type cifs (rw,username=localadmin,password=localadmin)
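>
> (The mounts above were created with commands roughly equivalent to
> the following; they are reconstructed from the mount output, so
> treat them as illustrative rather than the exact invocations:)
>
> # glusterfs --volfile=/var/lib/glusterd/vols/nas-cbs-0005/nas-cbs-0005-fuse.vol /samba/nas-cbs-0005
> # mount -t nfs 10.10.200.181:/nas-cbs-0005 /mnt/nas-cbs-0005
> # mount -t cifs //10.10.200.181/nas-cbs-0005 /mnt/nas-cbs-0005-cifs -o username=localadmin,password=localadmin
> # mount -t cifs //10.10.200.181/nas-cbs-0005-vfs /mnt/nas-cbs-0005-cifs-vfs -o username=localadmin,password=localadmin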
>
> Directory listing 6000 empty directories benchmark results:
>
>     Directory listing the ext4 mount directly is almost
>     instantaneous of course.
>
>     Directory listing the NFS mount is also very fast, less than a second.
>
>     Directory listing the CIFS FUSE mount is so slow, almost 16
>     seconds!
>
>     Directory listing the CIFS VFS libgfapi mount is about twice
>     as fast as FUSE, but still slow at 8 seconds.
>
> Unfortunately, due to:
>
>     Bug 1004327 - New files are not inheriting ACL from parent
>                   directory unless "stat-prefetch" is off for
>                   the respective gluster volume
> https://bugzilla.redhat.com/show_bug.cgi?id=1004327
>
> I need to have 'stat-prefetch' off. Retesting with this
> setting.
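>
> (Toggling that option is just a volume set/reset, shown here for
> completeness:)
>
> # gluster volume set nas-cbs-0005 performance.stat-prefetch off
> # gluster volume reset nas-cbs-0005 performance.stat-prefetch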
>
> Directory listing 6000 empty directories benchmark results
> ('stat-prefetch' is off):
>
>     Accessing the ext4 mount directly is almost
>     instantaneous of course.
>
>     Accessing the NFS mount is still very fast, less than a second.
>
>     Accessing the CIFS FUSE mount is slow, almost 14
>     seconds, but slightly faster than when 'stat-prefetch' was
>     on?
>
>     Accessing the CIFS VFS libgfapi mount is now about twice
>     as slow as FUSE, at almost 26 seconds, I guess due to
>     'stat-prefetch' being off!
>
> To see whether the directory listing problem was due to
> file-system metadata handling or to small I/Os, I ran some simple
> small-block file I/O benchmarks with the same configuration.
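>
> (The benchmark tool was sgp_dd, run with 4 threads and synchronous
> (dsync) I/O flags; the full commands and raw output are in the
> data below, but the general form, with <mount> standing in for
> each mount point and bs=64k or bs=4k, was:)
>
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=<mount>/testfile count=20k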
>
>     64KB Sequential Writes:
>
>     NFS small block writes seem slow at about 50 MB/sec.
>
>     CIFS FUSE small block writes are more than twice as fast as
>     NFS, at about 118 MB/sec.
>
>     CIFS VFS libgfapi small block writes are very fast, about
>     twice as fast as CIFS FUSE, at about 232 MB/sec.
>
>     64KB Sequential Reads:
>
>     NFS small block reads are very fast, at about 334 MB/sec.
>
>     CIFS FUSE small block reads are half of NFS, at about 124
>     MB/sec.
>
>     CIFS VFS libgfapi small block reads are about the same as
>     CIFS FUSE, at about 127 MB/sec.
>
>     4KB Sequential Writes:
>
>     NFS very small block writes are very slow at about 4 MB/sec.
>
>     CIFS FUSE very small block writes are faster, at about 11
>     MB/sec.
>
>     CIFS VFS libgfapi very small block writes are twice as fast
>     as CIFS FUSE, at about 22 MB/sec.
>
>     4KB Sequential Reads:
>
>     NFS very small block reads are very fast at about 346
>     MB/sec.
>
>     CIFS FUSE very small block reads are less than half as fast
>     as NFS, at about 143 MB/sec.
>
>     CIFS VFS libgfapi very small block reads are slightly slower
>     than CIFS FUSE, at about 137 MB/sec.
>
> I'm not quite sure how to interpret these results. Write
> caching is playing a part for sure, but I would think it should
> apply equally to both NFS and CIFS. With small file I/Os, NFS is
> better at reading than CIFS, and CIFS VFS is twice as good at
> writing as CIFS FUSE. Sadly, CIFS VFS is about the same as CIFS
> FUSE at reading.
>
> Regarding the directory listing lag problem, I've tried most of
> the GlusterFS volume options that seemed like they might help,
> but nothing really did.
>
> Having 'stat-prefetch' on helps, but it has to be off because of
> the bug.
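>
> (For the record, the tuning I experimented with was along these
> lines; this is only an illustrative sample, not a complete or
> recommended list, and none of it made a noticeable difference:)
>
> # gluster volume set nas-cbs-0005 performance.io-cache on
> # gluster volume set nas-cbs-0005 performance.quick-read on
> # gluster volume set nas-cbs-0005 performance.io-thread-count 32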
>
> BTW: I've repeated some tests with empty files instead of
> directories, and the results were similar. The issue is not
> specific to directories.
> I know that small-file reads and file-system metadata handling
> are not GlusterFS's strong suit, but is there *anything* that can
> be done to help it out? Any ideas? Should I expect GlusterFS
> 3.5.x to improve this at all?
>
> Raw data is below.
>
> Any advice is appreciated. Thanks.
>
> ~ Jeff Byers ~
>
> ##########################
>
> Directory listing of 6000 empty directories ('stat-prefetch'
> is on):
>
> Directory listing the ext4 mount directly is almost
> instantaneous of course.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m41.235s (Throw away first time for ext4 FS cache population?)
> # time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.110s
> # time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.109s
>
> Directory listing the NFS mount is also very fast.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m44.352s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.471s
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.114s
>
> Directory listing the CIFS FUSE mount is so slow, almost 16
> seconds!
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real    0m56.573s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real    0m16.101s
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real    0m15.986s
>
> Directory listing the CIFS VFS libgfapi mount is about twice
> as fast as FUSE, but still slow at 8 seconds.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real    0m48.839s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real    0m8.197s
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real    0m8.450s
>
> ####################
>
> Retesting the directory listing with default Gluster volume
> settings plus 'stat-prefetch' off, due to:
>
>     Bug 1004327 - New files are not inheriting ACL from parent directory
>                   unless "stat-prefetch" is off for the respective gluster
>                   volume
> https://bugzilla.redhat.com/show_bug.cgi?id=1004327
>
> # gluster volume info nas-cbs-0005
>
> Volume Name: nas-cbs-0005
> Type: Distribute
> Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
> Options Reconfigured:
> performance.stat-prefetch: off
> server.allow-insecure: on
> nfs.rpc-auth-allow: *
> nfs.disable: off
> nfs.addr-namelookup: off
>
> Directory listing of 6000 empty directories ('stat-prefetch'
> is off):
>
> Accessing the ext4 mount directly is almost instantaneous of
> course.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m39.483s (Throw away first time for ext4 FS cache population?)
> # time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.136s
> # time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.109s
>
> Accessing the NFS mount is also very fast.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m43.819s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.342s
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real    0m0.200s
>
> Accessing the CIFS FUSE mount is slow, almost 14 seconds!
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real    0m55.759s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real    0m13.458s
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real    0m13.665s
>
> Accessing the CIFS VFS libgfapi mount is now about twice as
> slow as FUSE, at almost 26 seconds due to 'stat-prefetch'
> being off!
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real    1m2.821s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real    0m25.563s
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real    0m26.949s
>
> ####################
>
> 64KB Writes:
>
> NFS small block writes seem slow at about 50 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 27.249756 secs, 49.25 MB/sec
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 25.893526 secs, 51.83 MB/sec
>
> CIFS FUSE small block writes are more than twice as fast as NFS, at 
> about 118 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 11.509077 secs, 116.62 MB/sec
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 11.223902 secs, 119.58 MB/sec
>
> CIFS VFS libgfapi small block writes are very fast, about
> twice as fast as CIFS FUSE, at about 232 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 5.704753 secs, 235.27 MB/sec
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 5.862486 secs, 228.94 MB/sec
>
> 64KB Reads:
>
> NFS small block reads are very fast, at about 334 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 3.972426 secs, 337.87 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 4.066978 secs, 330.02 MB/sec
>
> CIFS FUSE small block reads are half of NFS, at about 124
> MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 10.837072 secs, 123.85 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 10.716980 secs, 125.24 MB/sec
>
> CIFS VFS libgfapi small block reads are about the same as
> CIFS FUSE, at about 127 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 10.397888 secs, 129.08 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 10.696802 secs, 125.47 MB/sec
>
> 4KB Writes:
>
> NFS very small block writes are very slow at about 4 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 20.450521 secs, 4.10 MB/sec
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 19.669923 secs, 4.26 MB/sec
>
> CIFS FUSE very small block writes are faster, at about 11
> MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 7.247578 secs, 11.57 MB/sec
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 7.422002 secs, 11.30 MB/sec
>
> CIFS VFS libgfapi very small block writes are twice as fast
> as CIFS FUSE, at about 22 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 3.766179 secs, 22.27 MB/sec
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 3.761176 secs, 22.30 MB/sec
>
> 4KB Reads:
>
> NFS very small block reads are very fast at about 346
> MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 0.244960 secs, 342.45 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 0.240472 secs, 348.84 MB/sec
>
> CIFS FUSE very small block reads are less than half as fast
> as NFS, at about 143 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 0.606534 secs, 138.30 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 0.576185 secs, 145.59 MB/sec
>
> CIFS VFS libgfapi very small block reads are slightly slower
> than CIFS FUSE, at about 137 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 0.611328 secs, 137.22 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 0.615834 secs, 136.22 MB/sec
>
> EOM
Hi Jeff,

Can you open an upstream Bugzilla for this and put all the relevant
information into it? That will give us a single place to track and
work on the issue.

Regards,
Vivek
