[Gluster-users] Very slow Samba Directory Listing when many files or sub-directories.
Vivek Agarwal
vagarwal at redhat.com
Wed Feb 26 13:46:30 UTC 2014
On 02/26/2014 01:09 AM, Jeff Byers wrote:
>
> Hello,
>
> I have a problem with very slow Windows Explorer browsing
> when there are a large number of directories/files.
>
> In this case, the top-level folder has almost 6000 directories,
> admittedly a lot, but listing it was almost instantaneous when a
> Windows Server share was being used.
> After migrating to a Samba/GlusterFS share, there is an almost
> 20-second delay while the Explorer window populates the list.
> This leaves a bad impression of the storage performance. The
> systems are otherwise idle.
> To isolate the cause, I've ruled out everything else, including
> networking and Windows, and have narrowed in on GlusterFS as
> the source of most of the directory lag.
> I was optimistic about using the GlusterFS VFS module (libgfapi)
> instead of FUSE with Samba, and it does help performance
> dramatically in some cases, but it does not help (and
> sometimes hurts) compared to the CIFS FUSE mount for
> directory listings.
>
> NFS seems better for directory listings and small I/Os, but I
> cannot use NFS: I need CIFS for the Windows clients, ACLs,
> Active Directory integration, etc.
>
> Versions:
> CentOS release 6.5 (Final)
> # glusterd -V
> glusterfs 3.4.2 built on Jan 6 2014 14:31:51
> # smbd -V
> Version 4.1.4
>
> For testing, I've got a single GlusterFS volume, with a
> single ext4 brick, being accessed locally:
>
> # gluster volume info nas-cbs-0005
> Volume Name: nas-cbs-0005
> Type: Distribute
> Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
> Options Reconfigured:
> server.allow-insecure: on
> nfs.rpc-auth-allow: *
> nfs.disable: off
> nfs.addr-namelookup: off
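>
> For anyone reproducing this, the reconfigured options above were
> applied with the usual "gluster volume set" syntax, e.g.:
>
> # gluster volume set nas-cbs-0005 server.allow-insecure on
> # gluster volume set nas-cbs-0005 nfs.rpc-auth-allow '*'
> # gluster volume set nas-cbs-0005 nfs.disable off
> # gluster volume set nas-cbs-0005 nfs.addr-namelookup off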
>
> The Samba share options are:
>
> [nas-cbs-0005]
> path = /samba/nas-cbs-0005/cifs_share
> admin users = "localadmin"
> valid users = "localadmin"
> invalid users =
> read list =
> write list = "localadmin"
> guest ok = yes
> read only = no
> hide unreadable = yes
> hide dot files = yes
> available = yes
>
> [nas-cbs-0005-vfs]
> path = /
> vfs objects = glusterfs
> glusterfs:volume = nas-cbs-0005
> kernel share modes = No
> use sendfile = false
> admin users = "localadmin"
> valid users = "localadmin"
> invalid users =
> read list =
> write list = "localadmin"
> guest ok = yes
> read only = no
> hide unreadable = yes
> hide dot files = yes
> available = yes
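>
> For clarity, the first share goes through the FUSE mount at
> /samba/nas-cbs-0005, while the second uses Samba's glusterfs VFS
> module, where (as I understand it) "path" is interpreted relative
> to the root of the Gluster volume. The relevant lines again,
> annotated:
>
> [nas-cbs-0005-vfs]
>     ; "/" is the volume root, so cifs_share is a subdirectory of it
>     path = /
>     vfs objects = glusterfs
>     glusterfs:volume = nas-cbs-0005
>     ; kernel share modes generally need to be disabled with vfs_glusterfs
>     kernel share modes = No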
>
> I've locally mounted the volume three ways, with NFS, Samba
> CIFS through a GlusterFS FUSE mount, and VFS libgfapi mount:
>
> # mount
> /dev/sdr on /exports/nas-segment-0004 type ext4
> (rw,noatime,auto_da_alloc,barrier,nodelalloc,journal_checksum,acl,user_xattr)
> /var/lib/glusterd/vols/nas-cbs-0005/nas-cbs-0005-fuse.vol on
> /samba/nas-cbs-0005 type fuse.glusterfs (rw,allow_other,max_read=131072)
> //10.10.200.181/nas-cbs-0005 on /mnt/nas-cbs-0005-cifs type cifs
> (rw,username=localadmin,password=localadmin)
> 10.10.200.181:/nas-cbs-0005 on /mnt/nas-cbs-0005 type nfs
> (rw,addr=10.10.200.181)
> //10.10.200.181/nas-cbs-0005-vfs on /mnt/nas-cbs-0005-cifs-vfs type cifs
> (rw,username=localadmin,password=localadmin)
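>
> For reference, roughly the commands behind those mount entries (the
> FUSE mount was evidently done from the local volfile; mounting
> server:/volume is equivalent, and Gluster's built-in NFS server is
> NFSv3-only, hence vers=3):
>
> # mount -t glusterfs 10.10.200.181:/nas-cbs-0005 /samba/nas-cbs-0005
> # mount -t nfs -o vers=3 10.10.200.181:/nas-cbs-0005 /mnt/nas-cbs-0005
> # mount -t cifs //10.10.200.181/nas-cbs-0005 /mnt/nas-cbs-0005-cifs \
>     -o username=localadmin,password=localadmin
> # mount -t cifs //10.10.200.181/nas-cbs-0005-vfs /mnt/nas-cbs-0005-cifs-vfs \
>     -o username=localadmin,password=localadmin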
>
> Directory listing 6000 empty directories benchmark results:
>
> Directory listing the ext4 mount directly is almost
> instantaneous of course.
>
> Directory listing the NFS mount is also very fast, less than a second.
>
> Directory listing the CIFS FUSE mount is so slow, almost 16
> seconds!
>
> Directory listing the CIFS VFS libgfapi mount is about twice
> as fast as FUSE, but still slow at 8 seconds.
>
> Unfortunately, due to:
>
> Bug 1004327 - New files are not inheriting ACL from parent
> directory unless "stat-prefetch" is off for
> the respective gluster volume
> https://bugzilla.redhat.com/show_bug.cgi?id=1004327
>
> I need to have 'stat-prefetch' off. Retesting with this
> setting.
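>
> For reference, that just means the retests below were run after:
>
> # gluster volume set nas-cbs-0005 performance.stat-prefetch off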
>
> Directory listing 6000 empty directories benchmark results
> ('stat-prefetch' is off):
>
> Accessing the ext4 mount directly is almost
> instantaneous of course.
>
> Accessing the NFS mount is still very fast, less than a second.
>
> Accessing the CIFS FUSE mount is slow, almost 14
> seconds, but slightly faster than when 'stat-prefetch' was
> on?
>
> Accessing the CIFS VFS libgfapi mount is now about twice
> as slow as FUSE, at almost 26 seconds, I guess due
> to 'stat-prefetch' being off!
>
> To see whether the directory listing problem was due to file
> system metadata handling or to small I/Os, I did some simple
> small-block file I/O benchmarks with the same configuration.
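>
> The I/O pattern (full raw output at the bottom) was sgp_dd from
> sg3_utils with dsync on both ends, along the lines of:
>
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync \
>     if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k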
>
> 64KB Sequential Writes:
>
> NFS small block writes seem slow at about 50 MB/sec.
>
> CIFS FUSE small block writes are more than twice as fast as
> NFS, at about 118 MB/sec.
>
> CIFS VFS libgfapi small block writes are very fast, about
> twice as fast as CIFS FUSE, at about 232 MB/sec.
>
> 64KB Sequential Reads:
>
> NFS small block reads are very fast, at about 334 MB/sec.
>
> CIFS FUSE small block reads are about a third of NFS, at about
> 124 MB/sec.
>
> CIFS VFS libgfapi small block reads are about the same as
> CIFS FUSE, at about 127 MB/sec.
>
> 4KB Sequential Writes:
>
> NFS very small block writes are very slow at about 4 MB/sec.
>
> CIFS FUSE very small block writes are faster, at about 11
> MB/sec.
>
> CIFS VFS libgfapi very small block writes are twice as fast
> as CIFS FUSE, at about 22 MB/sec.
>
> 4KB Sequential Reads:
>
> NFS very small block reads are very fast at about 346
> MB/sec.
>
> CIFS FUSE very small block reads are less than half as fast
> as NFS, at about 143 MB/sec.
>
> CIFS VFS libgfapi very small block reads are slightly slower
> than CIFS FUSE, at about 137 MB/sec.
>
> I'm not quite sure how to interpret these results. Write
> caching is playing a part for sure, but I would think it should
> apply equally to both NFS and CIFS. With small file
> I/Os, NFS is better at reading than CIFS, and CIFS VFS is
> twice as good at writing as CIFS FUSE. Sadly, CIFS VFS is
> about the same as CIFS FUSE at reading.
>
> Regarding the directory listing lag problem, I've tried most
> of the GlusterFS volume options that seemed like they
> might help, but nothing really did.
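>
> For concreteness, the sort of knobs I mean (standard 3.4 option
> names), e.g.:
>
> # gluster volume set nas-cbs-0005 performance.io-cache on
> # gluster volume set nas-cbs-0005 performance.read-ahead on
> # gluster volume set nas-cbs-0005 performance.quick-read on
> # gluster volume set nas-cbs-0005 performance.cache-refresh-timeout 10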
>
> Gluster having 'stat-prefetch' on helps, but has to be off
> for the bug.
>
> BTW: I've repeated some tests with empty files instead of
> directories, and the results were similar, so the issue is not
> specific to directories.
> I know that small-file reads and file-system metadata
> handling are not GlusterFS's strong suit, but is there
> *anything* that can be done to help it out? Any ideas?
> Should I hope/expect GlusterFS 3.5.x to improve this
> at all?
>
> Raw data is below.
>
> Any advice is appreciated. Thanks.
>
> ~ Jeff Byers ~
>
> ##########################
>
> Directory listing of 6000 empty directories ('stat-prefetch'
> is on):
>
> Directory listing the ext4 mount directly is almost
> instantaneous of course.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l
> /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m41.235s (Throw away first time for ext4 FS cache population?)
> # time ls -l
> /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.110s
> # time ls -l
> /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.109s
>
> Directory listing the NFS mount is also very fast.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m44.352s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.471s
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.114s
>
> Directory listing the CIFS FUSE mount is so slow, almost 16
> seconds!
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real 0m56.573s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real 0m16.101s
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real 0m15.986s
>
> Directory listing the CIFS VFS libgfapi mount is about twice
> as fast as FUSE, but still slow at 8 seconds.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real 0m48.839s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real 0m8.197s
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real 0m8.450s
>
> ####################
>
> Retesting directory list with Gluster default settings,
> including 'stat-prefetch' off, due to:
>
> Bug 1004327 - New files are not inheriting ACL from parent directory
> unless "stat-prefetch" is off for the respective gluster
> volume
> https://bugzilla.redhat.com/show_bug.cgi?id=1004327
>
> # gluster volume info nas-cbs-0005
>
> Volume Name: nas-cbs-0005
> Type: Distribute
> Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
> Options Reconfigured:
> performance.stat-prefetch: off
> server.allow-insecure: on
> nfs.rpc-auth-allow: *
> nfs.disable: off
> nfs.addr-namelookup: off
>
> Directory listing of 6000 empty directories ('stat-prefetch'
> is off):
>
> Accessing the ext4 mount directly is almost instantaneous of
> course.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l
> /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m39.483s (Throw away first time for ext4 FS cache population?)
> # time ls -l
> /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.136s
> # time ls -l
> /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.109s
>
> Accessing the NFS mount is also very fast.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m43.819s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.342s
> # time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
> real 0m0.200s
>
> Accessing the CIFS FUSE mount is slow, almost 14 seconds!
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real 0m55.759s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real 0m13.458s
> # time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
> real 0m13.665s
>
> Accessing the CIFS VFS libgfapi mount is now about twice as
> slow as FUSE, at almost 26 seconds due to 'stat-prefetch'
> being off!
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real 1m2.821s (Throw away first time for ext4 FS cache population?)
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real 0m25.563s
> # time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
> real 0m26.949s
>
> ####################
>
> 64KB Writes:
>
> NFS small block writes seem slow at about 50 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 27.249756 secs, 49.25 MB/sec
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 25.893526 secs, 51.83 MB/sec
>
> CIFS FUSE small block writes are more than twice as fast as NFS, at
> about 118 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 11.509077 secs, 116.62 MB/sec
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 11.223902 secs, 119.58 MB/sec
>
> CIFS VFS libgfapi small block writes are very fast, about
> twice as fast as CIFS FUSE, at about 232 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 5.704753 secs, 235.27 MB/sec
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 5.862486 secs, 228.94 MB/sec
>
> 64KB Reads:
>
> NFS small block reads are very fast, at about 334 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 3.972426 secs, 337.87 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 4.066978 secs, 330.02 MB/sec
>
> CIFS FUSE small block reads are about a third of NFS, at about
> 124 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 10.837072 secs, 123.85 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 10.716980 secs, 125.24 MB/sec
>
> CIFS VFS libgfapi small block reads are about the same as
> CIFS FUSE, at about 127 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 10.397888 secs, 129.08 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync
> of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 10.696802 secs, 125.47 MB/sec
>
> 4KB Writes:
>
> NFS very small block writes are very slow at about 4 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero
> of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 20.450521 secs, 4.10 MB/sec
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero
> of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 19.669923 secs, 4.26 MB/sec
>
> CIFS FUSE very small block writes are faster, at about 11
> MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero
> of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 7.247578 secs, 11.57 MB/sec
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero
> of=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 7.422002 secs, 11.30 MB/sec
>
> CIFS VFS libgfapi very small block writes are twice as fast
> as CIFS FUSE, at about 22 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero
> of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 3.766179 secs, 22.27 MB/sec
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero
> of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 3.761176 secs, 22.30 MB/sec
>
> 4KB Reads:
>
> NFS very small block reads are very fast at about 346
> MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null
> if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 0.244960 secs, 342.45 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null
> if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
> time to transfer data was 0.240472 secs, 348.84 MB/sec
>
> CIFS FUSE very small block reads are less than half as fast
> as NFS, at about 143 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null
> if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 0.606534 secs, 138.30 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null
> if=/mnt/nas-cbs-0005-cifs/testfile count=20k
> time to transfer data was 0.576185 secs, 145.59 MB/sec
>
> CIFS VFS libgfapi very small block reads are slightly slower
> than CIFS FUSE, at about 137 MB/sec.
>
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null
> if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 0.611328 secs, 137.22 MB/sec
> # sync;sync; echo '3' > /proc/sys/vm/drop_caches
> # sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null
> if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
> time to transfer data was 0.615834 secs, 136.22 MB/sec
>
> EOM
Hi Jeff,
Can you open a bugzilla upstream for this and put all the relevant
information into it? That will give us a single place to
track and work on the issue.
Regards,
Vivek
>
>