[Gluster-users] Tuning for small files

Fri Oct 2 17:29:46 UTC 2015

Right, so here is what I did:
- on one node (gluster 3.7.3), ran 'gluster volume profile shared start'
- on the client mount, ran the test
- on the node, ran 'gluster volume profile shared info' (and saved the
output)
- finally, ran 'gluster volume profile shared stop'
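The profiling steps above can be sketched as a small POSIX shell helper. This is only a sketch under assumptions: the helper names `run` and `profile_run`, the output file argument and the `DRY_RUN` switch are hypothetical, and the workload is passed as arguments, so it assumes the test can be started from the same shell as the gluster commands (in the runs above, profiling was driven from a server node while the test ran on a client mount).

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: profile one volume around a single workload.
# Usage: profile_run VOLUME OUTFILE WORKLOAD [ARGS...]
profile_run() {
  vol=$1
  out=$2
  shift 2
  run gluster volume profile "$vol" start
  run "$@"                                          # the test, e.g. an svn checkout on the mount
  status=$?                                         # remember the workload exit status
  run gluster volume profile "$vol" info > "$out"   # keep the per-brick stats
  run gluster volume profile "$vol" stop            # always stop profiling afterwards
  return "$status"
}
```

With `DRY_RUN=1` set, `profile_run shared profile.txt true` prints the start/stop commands and writes the info command into `profile.txt`, which is a cheap way to check the order of operations before touching the cluster.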

I repeated this for two different tests (a simple rm followed by an svn
checkout, and a more complete build test), on both an NFS mount and a Fuse
mount.

To my surprise, the svn checkout is actually a lot faster (3x) on the Fuse
mount than on NFS.
However, the build test is a lot slower on the Fuse mount (+50% run time,
which is a lot considering the compilation is CPU-intensive, not just I/O!).

Ben, I will send you the profile outputs separately now...
On 29 Sep 2015 9:40 pm, "Ben Turner" <bturner at redhat.com> wrote:

> ----- Original Message -----
> > From: "Thibault Godouet" <tibo92 at godouet.net>
> > To: "Ben Turner" <bturner at redhat.com>
> > Cc: hmlth at t-hamel.fr, gluster-users at gluster.org
> > Sent: Tuesday, September 29, 2015 1:36:20 PM
> > Subject: Re: [Gluster-users] Tuning for small files
> >
> > Ben,
> >
> > I suspect meta-data / 'ls -l' performance is very important for my svn
> > use-case.
> >
> > Having said that, what do you mean by small file performance? I thought
> > what people meant by this was really the overhead of meta-data, with an
> > 'ls -l' being a sort of extreme case (pure meta-data).
> > Obviously if you also have to read and write actual data (albeit not much
> > at all per file), then the effect of meta-data overhead would get diluted
> > to a degree, but would potentially still be very present.
>
> Where you run into problems with small files on gluster is the latency of
> sending data over the wire.  Every small-file create requires a number of
> different file operations.  For example, we have to do at least 1 lookup
> per brick to make sure that the file doesn't exist anywhere before we
> create it.  We got it down to 1 per brick with lookup optimize on; it's 2
> IIRC (maybe more?) with it disabled.  The time we spend waiting for those
> lookups to complete adds latency, which lowers the number of files that can
> be created in a given period of time.  Lookup optimize was implemented in
> 3.7 and, like I said, it's now at the optimal 1 lookup per brick on creates.
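As a sketch, assuming GlusterFS 3.7, where the "lookup optimize" setting mentioned here is exposed as the volume option `cluster.lookup-optimize`, the helper below prints, rather than runs, the commands to enable it and read it back, so they can be reviewed before being piped to `sh` on a server node. The function name `lookup_optimize_cmds` is hypothetical.

```shell
# Hypothetical helper: print (not run) the commands that enable DHT
# lookup-optimize on a volume and then read the setting back.
lookup_optimize_cmds() {
  vol=$1
  printf 'gluster volume set %s cluster.lookup-optimize on\n' "$vol"
  printf 'gluster volume get %s cluster.lookup-optimize\n' "$vol"
}
```

For the volume in this thread, `lookup_optimize_cmds shared | sh` would apply the setting and show its new value.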
>
> The other problem with small files that we had in 3.6 is that we were
> using a single-threaded event listener (epoll is what we call it).  This
> single thread would spike a CPU to 100% (called a hot thread) and glusterfs
> would become CPU bound.  The solution here was to make the event listener
> multi-threaded so that we could spread the epoll load across CPUs, thereby
> eliminating the CPU bottleneck and allowing us to process more events in a
> given time.  FYI, epoll defaults to 2 threads in 3.7, but I have seen cases
> in my environments where I was still CPU bound with fewer than 4 threads,
> so I usually use 4.  This was implemented in upstream 3.7 but was
> backported to RHGS 3.0.4 if you have a Red Hat based version.
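A similar sketch for the epoll thread counts, assuming the 3.7 volume option names `client.event-threads` and `server.event-threads`. The function name is hypothetical, and the default of 4 is simply the value suggested above, not a universal recommendation.

```shell
# Hypothetical helper: print (not run) the commands that set the number of
# epoll threads on both the client and the server (brick) side of a volume.
# Usage: event_threads_cmds VOLUME [THREADS]   (THREADS defaults to 4)
event_threads_cmds() {
  vol=$1
  n=${2:-4}
  for side in client server; do
    printf 'gluster volume set %s %s.event-threads %s\n' "$vol" "$side" "$n"
  done
}
```

Printing instead of executing keeps the helper side-effect free; `event_threads_cmds shared | sh` on a server node would apply both settings.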
>
> Fixing these two issues led to the performance gains I was talking about
> with small-file creates.  You are probably thinking from the perspective of
> a distributed FS with a metadata server (MDS), where the MDS is the
> bottleneck for small files.  Since gluster doesn't have an MDS, that load is
> transferred to the clients / servers, and this led to a CPU bottleneck when
> epoll was single threaded.  I think this is the piece you may have been
> missing.
>
> >
> > Would there be an easy way to tell how much time is spent on meta-data
> > vs. data in a profile output?
>
> Yep!  Can you gather some profiling info and send it to me?
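One rough way to approach the metadata vs. data question from saved `gluster volume profile VOLNAME info` output is to multiply each fop's average latency by its call count and bucket the totals. This is a sketch under assumptions: it assumes the per-brick table layout seen in 3.7 (%-latency, Avg-latency, Min-Latency, Max-Latency, No. of calls, Fop, with latencies in "us"), it reads only the "Cumulative Stats" sections so interval tables are not double-counted, and it treats READ and WRITE as data fops and every other fop as metadata, which is a simplification. The function name `split_profile` is hypothetical.

```shell
# Hypothetical helper: read "gluster volume profile VOL info" output on stdin
# and print the total time (in microseconds) spent in metadata vs. data fops.
split_profile() {
  awk '
    /Cumulative Stats/      { cum = 1; next }   # start of a per-brick cumulative table
    /Interval [0-9]+ Stats/ { cum = 0; next }   # skip interval tables (avoid double counting)
    # Stats rows look like: %lat avg us min us max us calls FOP
    cum && NF == 9 && $3 == "us" {
      t = $2 * $8                               # avg latency x number of calls
      if ($9 == "READ" || $9 == "WRITE") data += t
      else meta += t
    }
    END { printf "metadata_us=%.0f data_us=%.0f\n", meta, data }
  '
}
```

Totals are summed over all bricks listed in the output, so on a replica 2 volume each replicated fop is counted once per brick; the ratio between the two numbers is more meaningful than their absolute values.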
>
> >
> > One thing I wonder: do your comments apply to both native Fuse and NFS
> > mounts?
> >
> > Finally, all this brings me back to my initial question really: are there
> > any configuration tuning recommendations for my requirement (small file
> > reads/writes on a pair of nodes with replication) beyond the thread
> > counts and lookup optimize?
> > Or are those by far the most important in this scenario?
>
> For creating a bunch of small files, those are the only two that I know of
> that will have a large impact; maybe others on the list can give some
> input on anything else we can do here.
>
> -b
>
> >
> > Thx,
> > Thibault.
> > ----- Original Message -----
> > > From: hmlth at t-hamel.fr
> > > To: abauer at magix.net
> > > Cc: gluster-users at gluster.org
> > > Sent: Monday, September 28, 2015 7:40:52 AM
> > > Subject: Re: [Gluster-users] Tuning for small files
> > >
> > > I'm also quite interested in small-file performance optimization, but
> > > I'm a bit confused about the best option between 3.6 and 3.7.
> > >
> > > Ben Turner was saying that 3.6 might give the best performance:
> > >
> > > http://www.gluster.org/pipermail/gluster-users/2015-September/023733.html
> > >
> > > What kind of gain is expected (with consistent-metadata) if this
> > > regression is solved?
> >
> > Just to be clear, the issue I am talking about is metadata only (think
> > 'ls -l' or file browsing).  It doesn't affect small file perf (well, not
> > that much; I'm sure a little, but I have never quantified it).  With server
> > and client event threads set to 4 + lookup optimize, I see between a
> > 200-300% gain on my systems on 3.7 vs 3.6 builds.  If I needed fast
> > metadata I would go with 3.6; if I needed fast small-file performance I
> > would go with 3.7.  If I needed both I would pick the lesser of the two
> > evils, go with that one, and upgrade when the fix is released.
> >
> > -b
> >
> >
> > >
> > > I tried 3.6.5 (the latest version for Debian Jessie), and it's a bit
> > > better than 3.7.4, but not by much (10-15%).
> > >
> > > I was also wondering if there are recommendations for the underlying file
> > > system of the bricks (xfs, ext4, tuning...).
> > >
> > >
> > > Regards
> > >
> > > Thomas HAMEL
> > >
> > > On 2015-09-28 12:04, André Bauer wrote:
> > > > If you're not already on GlusterFS 3.7.x, I would recommend an update
> > > > first.
> > > >
> > > > On 25.09.2015 at 17:49, Thibault Godouet wrote:
> > > >> Hi,
> > > >>
> > > >> There are quite a few tuning parameters for Gluster (as seen in
> > > >> 'gluster volume get XYZ all'), but I didn't find much documentation
> > > >> on those. Some people do seem to set at least some of them, so the
> > > >> knowledge must be somewhere...
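Assuming GlusterFS 3.7, where `gluster volume get VOLNAME all` lists every volume option with its current value, a hypothetical filter like the one below keeps only the options discussed in this thread (lookup-optimize and the client/server event threads). It assumes each output line contains the option name followed by whitespace and the value; the function name `smallfile_opts` is illustrative.

```shell
# Hypothetical filter: read "gluster volume get VOL all" output on stdin and
# keep only the small-file options discussed in this thread.
smallfile_opts() {
  grep -E '(cluster\.lookup-optimize|client\.event-threads|server\.event-threads)([[:space:]]|$)'
}
```

For example, `gluster volume get shared all | smallfile_opts` would show whether those settings are still at their defaults on the volume.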
> > > >>
> > > >> Is there a good source of information to understand what they mean,
> > > >> and recommendations on how to set them to get good small-file
> > > >> performance?
> > > >>
> > > >> Basically what I'm trying to optimize is svn operations (e.g. svn
> > > >> checkout, or svn branch) on a replicated 2 x 1 volume (hosted on 2
> > > >> VMs, 16 GB RAM, 4 cores each, 10Gb/s network tested at full speed),
> > > >> using an NFS mount, which appears much faster than Fuse in this case
> > > >> (but still much slower than when served by a normal NFS server).
> > > >> Any recommendation for such a setup?
> > > >>
> > > >> Thanks,
> > > >> Thibault.
> > > >>
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> Gluster-users mailing list
> > > >> Gluster-users at gluster.org
> > > >> http://www.gluster.org/mailman/listinfo/gluster-users
> > > >>
> > > >
> > > >
> > > > --
> > > > Kind regards
> > > > André Bauer
> > > >
> > > > MAGIX Software GmbH
> > > > André Bauer
> > > > Administrator
> > > > August-Bebel-Straße 48
> > > > 01219 Dresden
> > > > GERMANY
> > > >
> > > > tel.: 0351 41884875
> > > > e-mail: abauer at magix.net
> > > > www.magix.com <http://www.magix.com/>
> > > >
> > > >
> > > > Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Michael
> Keith
> > > > Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205
> > > >
> >
>