[Gluster-users] small files performance

Sun Oct 15 16:23:51 UTC 2017

I get well over 2k IOPS on my OLD 12-disk RAID 6 HW in the lab (4 nodes, 2x2 volume):

https://access.redhat.com/sites/default/files/attachments/rhgs3_1_perfbrief_portalv1.pdf

That data is from 3.1; things have improved a lot since then (I think closer to 3.2k IOPS on the same HW?). I have a total of 48 disks, though (20 data, 8 parity, 20 redundancy); I'm not sure what you have. I can extract a kernel in anywhere from 4 minutes down to 1 min 30 secs, depending on tunables and on whether I use the multithreaded tar tools developed by Ben England. If you don't have access to the RH paywall you will just have to trust me, since the perf brief requires a sub. The key to getting smallfile perf out of gluster is to use multiple threads and multiple clients.
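To illustrate the "multiple threads" point, here is a minimal Python sketch that creates small files from several threads at once, each thread in its own directory. It is not Ben England's tooling and not a tuned benchmark; the function names, thread count, and file counts are all made up for illustration, and the target directory is a local temp dir that you would swap for a path on your own Gluster mount.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def create_files(directory, count, size=1024):
    """Create `count` files of `size` bytes each in `directory`."""
    os.makedirs(directory, exist_ok=True)
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(directory, f"file_{i:06d}"), "wb") as f:
            f.write(payload)
    return count


def run_benchmark(root, threads=8, files_per_thread=500, size=1024):
    """Run one worker per thread, each writing into its own subdirectory.

    Returns (total_files_created, elapsed_seconds).
    """
    dirs = [os.path.join(root, f"thread_{t:02d}") for t in range(threads)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        totals = list(
            pool.map(lambda d: create_files(d, files_per_thread, size), dirs)
        )
    elapsed = time.perf_counter() - start
    return sum(totals), elapsed


if __name__ == "__main__":
    # Assumption: a throwaway local directory stands in for the Gluster mount.
    with tempfile.TemporaryDirectory() as root:
        total, elapsed = run_benchmark(root)
        print(f"{total} files in {elapsed:.2f}s ({total / elapsed:.0f} creates/s)")
```

Running the same script from several client machines at the same time, each against its own subtree, would cover the "multiple clients" half. Comparing threads=1 against threads=8 on the same mount should show whether a single stream or the back end is the limit.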

What is your back end like?
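For comparing back ends, a naive spindle-count estimate can be a starting point. The sketch below uses only numbers from this thread (about 75 IOPS per SATA disk, 48 disks total, 20 of them holding unique data). It deliberately ignores controller write cache, the RAID 6 write penalty, and replication overhead, so treat it as rough arithmetic and not as a prediction of Gluster throughput.

```python
import math

PER_DISK_IOPS = 75   # per-disk SATA figure quoted in this thread
TOTAL_DISKS = 48     # 4 nodes x 12 disks
DATA_DISKS = 20      # disks left for unique data after parity and replica


def aggregate_iops(disks, per_disk=PER_DISK_IOPS):
    """Naive aggregate: every disk contributes its full per-disk IOPS."""
    return disks * per_disk


def disks_needed(target_iops, per_disk=PER_DISK_IOPS):
    """Smallest number of disks whose naive aggregate reaches the target."""
    return math.ceil(target_iops / per_disk)


if __name__ == "__main__":
    print(aggregate_iops(TOTAL_DISKS))  # 3600
    print(aggregate_iops(DATA_DISKS))   # 1500
    print(disks_needed(2000))           # 27
```

So a handful of 75 IOPS disks cannot get near 2000, while a few dozen spindles can on paper, which is why the size and layout of the back end matters.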

-b

----- Original Message -----
> From: "Gandalf Corvotempesta" <gandalf.corvotempesta at gmail.com>
> To: "Szymon Miotk" <szymon.miotk at gmail.com>, "gluster-users" <Gluster-users at gluster.org>
> Sent: Friday, October 13, 2017 3:56:14 AM
> Subject: Re: [Gluster-users] small files performance
> 
> Where did you read 2k IOPS?
> 
> Each disk can do about 75 IOPS, as I'm using SATA disks; getting anywhere
> close to 2000 is impossible
> 
> On 13 Oct 2017 9:42 AM, "Szymon Miotk" < szymon.miotk at gmail.com > wrote:
> 
> 
> Depends what you need.
> 2K iops for small file writes is not a bad result.
> In my case I had a system that was just poorly written and it was
> using 300-1000 iops for constant operations and was choking on
> cleanup.
> 
> 
> On Thu, Oct 12, 2017 at 6:23 PM, Gandalf Corvotempesta
> < gandalf.corvotempesta at gmail.com > wrote:
> > So, even with the latest version, gluster is still unusable with small files?
> > 
> > 2017-10-12 10:51 GMT+02:00 Szymon Miotk < szymon.miotk at gmail.com >:
> >> I analyzed small-file performance a few months ago, because I had
> >> huge performance problems with small-file writes on Gluster.
> >> The read performance has been improved in many ways in recent releases
> >> (md-cache, parallel-readdir, hot-tier).
> >> But write performance is more or less the same and you cannot go above
> >> 10K small-file creates, even with SSD or Optane drives.
> >> Even ramdisk is not helping much here, because the bottleneck is not
> >> in the storage performance.
> >> Key problems I've noticed:
> >> - LOOKUPs are expensive, because there is a separate query for every
> >> depth level of the destination directory (md-cache helps here a bit,
> >> unless you are creating a lot of directories). So the deeper the
> >> directory structure, the worse.
> >> - for every file created, Gluster creates another file in .glusterfs
> >> directory, doubling the required IO and network latency. What's worse,
> >> XFS, the recommended filesystem, doesn't like a flat directory structure
> >> with thousands of files in each directory. But that's exactly how Gluster
> >> stores its metadata in .glusterfs, so the performance decreases by
> >> 40-50% after 10M files.
> >> - the complete directory structure is created on each of the bricks. So
> >> every mkdir results in IO on every brick you have in the volume.
> >> - hot-tier may be great for improving reads, but for small-file
> >> writes it actually kills performance even more.
> >> - the FUSE driver requires a context switch between userspace and kernel
> >> each time you create a file, so with small files the context switches
> >> are also taking their toll
> >> 
> >> The best results I got were:
> >> - create a big file on Gluster, mount it as XFS over a loopback device -
> >> 13.5K smallfile writes. Drawback - you can use it only on one
> >> server, as XFS will crash when two servers write to it.
> >> - use libgfapi - 20K smallfile writes performance. Drawback - no nice
> >> POSIX filesystem, huge CPU usage on Gluster server.
> >> 
> >> I was testing with 1KB files, so really small.
> >> 
> >> Best regards,
> >> Szymon Miotk
> >> 
> >> On Fri, Oct 6, 2017 at 4:43 PM, Gandalf Corvotempesta
> >> < gandalf.corvotempesta at gmail.com > wrote:
> >>> Any update about this?
> >>> I've seen some work on optimizing performance for small files; is
> >>> gluster now "usable" for storing, for example, Maildirs or git sources?
> >>> 
> >>> At least in 3.7 (or 3.8, I don't remember exactly), extracting kernel
> >>> sources took about 4-5 minutes.
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users at gluster.org
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
> 