[Gluster-users] performance - what can I expect

Thu May 2 08:30:46 UTC 2019

On Thu, May 2, 2019 at 1:21 PM Pascal Suter <pascal.suter at dalco.ch> wrote:

> Hi Amar
>
> thanks for rolling this back up. Actually i have done some more
> benchmarking and fiddled with the config to finally reach a performance
> figure i could live with. I now can squeeze about 3GB/s out of that server
> which seems to be close to what i can get out of its network uplink (using
> IP over Omni-Path). The system is now set up and in production so i can't
> run any benchmarks on it anymore but i will get back to benchmarking in the
> near future to test some storage related hardware, and i will try it with
> gluster on top again.
>
> embarrassingly the biggest performance issue was that the default
> installation of the server was running the "performance" profile of tuned.
> once i switched it to "throughput-performance" performance increased
> dramatically.
>
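
For anyone hitting the same thing, here is a minimal sketch of checking the
active tuned profile and switching it only when needed. It assumes the
`tuned-adm` CLI is installed; the helper name `ensure_tuned_profile` and the
`TUNED_ADM` override variable are made up for this example.

```shell
#!/bin/sh
# Sketch: switch tuned to a given profile only if it is not already active.
# TUNED_ADM is a hypothetical override, e.g. to point at a stub for a dry run.
TUNED_ADM=${TUNED_ADM:-tuned-adm}

ensure_tuned_profile() {
    wanted=$1
    # "tuned-adm active" prints a line such as
    # "Current active profile: throughput-performance".
    current=$("$TUNED_ADM" active | sed -n 's/^Current active profile: //p')
    if [ "$current" = "$wanted" ]; then
        echo "tuned profile already '$wanted'"
    else
        echo "switching tuned profile from '$current' to '$wanted'"
        "$TUNED_ADM" profile "$wanted"
    fi
}
```

Running `ensure_tuned_profile throughput-performance` as root would apply the
profile mentioned above. It is worth re-running the benchmark afterwards,
since the gain will depend on the hardware.
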
> the volume info now looks pretty unspectacular:
>
> Volume Name: storage
> Type: Distribute
> Volume ID: c81c7e46-add5-4d88-9945-24cf7947ef8c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: themis01:/data/brick1/brick
> Brick2: themis01:/data/brick2/brick
> Brick3: themis01:/data/brick3/brick
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> thanks for pointing out gluster volume profile, i'll have a go with it
> during my next benchmarking session. so far i was using iostat to track
> brick-level io performance during my benchmarks.
>
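
A rough sketch of how a benchmark run could be wrapped in
`gluster volume profile`, so that the per-brick statistics cover exactly the
test window. The function name `profile_workload` and the `GLUSTER` override
variable are made up for this example; the volume name `storage` is the one
from the volume info above.

```shell
#!/bin/sh
# Sketch: wrap a benchmark command in "gluster volume profile".
# GLUSTER is a hypothetical override, e.g. to point at a stub for a dry run.
GLUSTER=${GLUSTER:-gluster}

profile_workload() {
    vol=$1
    shift
    # Start collecting per-brick latency and I/O statistics.
    "$GLUSTER" volume profile "$vol" start || return 1
    # Run the workload passed as the remaining arguments.
    "$@"
    status=$?
    # Print the collected statistics, then stop profiling again.
    "$GLUSTER" volume profile "$vol" info
    "$GLUSTER" volume profile "$vol" stop
    return "$status"
}
```

For example, `profile_workload storage ./iozone -i 0 -t 1 -F
/mnt/gluster/storage/thread1 -+n -c -C -e -I -w -+S 0 -s 200G -r 16384k`
would profile the write test quoted further down. Running iostat on the
bricks in parallel still gives the disk-level view.
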
> the main question i wanted to ask was, if there is a general rule of
> thumb, how much throughput of the original bare brick throughput would be
> expected to be left over once gluster is added on top of it. to give you an
> example: when I use a parallel filesystem like Lustre or BeeGFS i usually
> expect to get at least about 85% of the raw storage target throughput as
> aggregated bandwidth over a multi-node test out of my Lustre or BeeGFS
> setup. I consider any numbers below that to be too low and therefore will
> have to dig into performance tuning to find the bottleneck.
>
> i was hoping someone could give me a rule-of-thumb number for a simple
> distributed gluster setup, like that 85% number i've established for a
> parallel file system.
>
> so at the moment my takeaway is, in a simple distributed volume across 3
> bricks with an aggregated bandwidth of 6GB/s i can expect to get about
> 3GB/s aggregated bandwidth out of the gluster mount, given there are no
> bottlenecks in the network. the 3GB/s is a number measured under ideal
> circumstances, meaning, i primed the storage to make sure i could run a
> benchmark using three nodes, with each node running a single thread
> writing to a single file, and each file was located on a different brick.
> this yielded the maximum performance as this was pure streaming IO without
> any overlapping file writes to the bricks other than the overhead created
> by gluster's own internal mechanisms.
>
> Interestingly, the performance didn't drop much when i added nodes and
> threads and introduced more random-ish io by having several processes write
> to the same brick. So I assume, what "eats" up the 50% performance in the
> end is probably Gluster writing all these additional hidden files which I
> assume is some sort of Metadata. This causes additional IO on the disk that
> i'm streaming my one file to and therefore turns my streaming IO into a
> random io load for the raid controller and underlying hard disks, which on
> spinning disks would have about the performance impact i was seeing in my
> benchmarks.
>

Thanks for all these details.
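
To compare numbers across setups, the "efficiency" described above is simply
the throughput measured through gluster divided by the raw brick throughput.
A minimal sketch using awk, with the figures from this thread; the function
name `efficiency_pct` is made up for this example.

```shell
#!/bin/sh
# Sketch: express a measured throughput as a percentage of the raw one.
# Both arguments must use the same unit (e.g. GB/s).
efficiency_pct() {
    awk -v measured="$1" -v raw="$2" 'BEGIN {
        if (raw <= 0) { print "raw throughput must be > 0"; exit 1 }
        printf "%.1f\n", 100 * measured / raw
    }'
}

# Figures from this thread: 3GB/s through gluster vs. 6GB/s on the bricks.
efficiency_pct 3 6      # prints 50.0
# Earlier run from the thread: 1.5GB/s through the fuse mount vs. 6GB/s.
efficiency_pct 1.5 6    # prints 25.0
```

Against the 85% rule of thumb quoted for Lustre or BeeGFS, the same 6GB/s of
raw bandwidth would correspond to a target of about 5.1GB/s.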

> I have yet to try gluster on a Flash based brick and test its performance
> there.. i would expect to see a better "efficiency" than the 50% i've
> measured on this system here as random io vs. streaming io should not make
> such a difference (or actually almost no difference at all) on a flash
> based storage. but that's me guessing now.
>
> so for the moment i'm fine but i would still be interested in hearing
> ball-park figure "efficiency" numbers from others using gluster in a
> similar setup.
>

We don't have a single number for this yet, for several reasons:
* Each Gluster volume type behaves differently performance-wise.
* The network plays a more significant role than disk performance, and
latency matters more than throughput.
* Different workloads (create-heavy vs. read/write, sequential vs. random
read/write) need different options, and these are currently not
auto-tuned.
* With good network and disk speed, even the backend filesystem
configuration matters a bit (because of the layout we use, with gfid
etc.).

The best approach at present is to understand the workload first and then
tune for it.

cheers
>
> Pascal
> On 01.05.19 14:55, Amar Tumballi Suryanarayan wrote:
>
> Hi Pascal,
>
> Sorry for the long delay on this one, and thanks for testing out
> different scenarios. A few questions before others can have a look and
> advise you.
>
> 1. What is the volume info output?
>
> 2. Do you see any concerning messages in the glusterfs log files?
>
> 3. Please use `gluster volume profile` while running the tests; it
> gives a lot of information.
>
> 4. Considering you are using glusterfs-6.0, please take a statedump of the
> client process (on any node) before and after the test, so we can analyze
> the latency information of each translator.
>
> With this information, I hope we will be in a better position to answer
> the questions.
>
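
A small sketch of the before/after statedump from point 4. It relies on
gluster processes writing a statedump when they receive SIGUSR1; the function
names `client_statedump` and `statedump_around` are made up for this example,
and the PID of the glusterfs mount process has to be looked up separately.

```shell
#!/bin/sh
# Sketch: trigger a statedump of a glusterfs client (fuse mount) process.
# Sending SIGUSR1 to a gluster process asks it to write a statedump.
client_statedump() {
    pid=$1
    kill -USR1 "$pid"
}

# Take one dump before and one after a test command.
statedump_around() {
    pid=$1
    shift
    client_statedump "$pid" || return 1
    # Run the test passed as the remaining arguments.
    "$@"
    status=$?
    client_statedump "$pid"
    return "$status"
}
```

Called as `statedump_around <pid> ./iozone ...`, this should leave two dump
files to compare. The dumps usually end up under /var/run/gluster, but treat
that path as an assumption and check the local configuration.
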
>
> On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter <pascal.suter at dalco.ch>
> wrote:
>
>> i continued my testing with 5 clients, all attached over 100Gbit/s
>> omni-path via IP over IB. when i run the same iozone benchmark across
>> all 5 clients where gluster is mounted using the glusterfs client, i get
>> an aggregated write throughput of only about 400MB/s and an aggregated
>> read throughput of 1.5GB/s. Each node was writing a single 200GB file in
>> 16MB chunks and the files were distributed across all three bricks on
>> the server.
>>
>> the connection was established over Omnipath for sure, as there is no
>> other link between the nodes and server.
>>
>> i have no clue what i'm doing wrong here. i can't believe that this is a
>> normal performance people would expect to see from gluster. i guess
>> nobody would be using it if it was this slow.
>>
>> again, when written directly to the xfs filesystem on the bricks, i get
>> over 6GB/s read and write throughput using the same benchmark.
>>
>> any advice is appreciated
>>
>> cheers
>>
>> Pascal
>>
>> On 04.04.19 12:03, Pascal Suter wrote:
>> > I just noticed i left the most important parameters out :)
>> >
>> > here's the write command with filesize and recordsize in it as well :)
>> >
>> > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w
>> > -+S 0 -s 200G -r 16384k
>> >
>> > also i ran the benchmark without direct_io which resulted in an even
>> > worse performance.
>> >
>> > i also tried to mount the gluster volume via nfs-ganesha which further
>> > reduced throughput down to about 450MB/s
>> >
>> > if i run the iozone benchmark with 3 threads writing to all three
>> > bricks directly (from the xfs filesystem) i get throughputs of around
>> > 6GB/s .. if I run the same benchmark through gluster mounted locally
>> > using the fuse client and with enough threads so that each brick gets
>> > at least one file written to it, i end up seeing throughputs around
>> > 1.5GB/s .. that's a 4x decrease in performance. and it actually is the
>> > same if i run the benchmark with fewer threads and files only get
>> > written to two out of three bricks.
>> >
>> > cpu load on the server is around 25% by the way, nicely distributed
>> > across all available cores.
>> >
>> > i can't believe that gluster should really be so slow and everybody is
>> > just happily using it. any hints on what i'm doing wrong are very
>> > welcome.
>> >
>> > i'm using gluster 6.0 by the way.
>> >
>> > regards
>> >
>> > Pascal
>> >
>> > On 03.04.19 12:28, Pascal Suter wrote:
>> >> Hi all
>> >>
>> >> I am currently testing gluster on a single server. I have three
>> >> bricks, each a hardware RAID6 volume with thin provisioned LVM that
>> >> was aligned to the RAID and then formatted with xfs.
>> >>
>> >> i've created a distributed volume so that entire files get
>> >> distributed across my three bricks.
>> >>
>> >> first I ran an iozone benchmark across each brick testing the read and
>> >> write performance of a single large file per brick
>> >>
>> >> i then mounted my gluster volume locally and ran another iozone run
>> >> with the same parameters writing a single file. the file went to
>> >> brick 1 which, when used directly, would write with 2.3GB/s and read
>> >> with 1.5GB/s. however, through gluster i got only 800MB/s read and
>> >> 750MB/s write throughput
>> >>
>> >> another run with two processes each writing a file, where one file
>> >> went to the first brick and the other file to the second brick (which
>> >> by itself when directly accessed wrote at 2.8GB/s and read at
>> >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also aggregated
>> >> read throughput.
>> >>
>> >> Is this a normal performance i can expect out of a glusterfs or is it
>> >> worth tuning in order to really get closer to the actual brick
>> >> filesystem performance?
>> >>
>> >> here are the iozone commands i use for writing and reading.. note
>> >> that i am using directIO in order to make sure i don't get fooled by
>> >> cache :)
>> >>
>> >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0
>> >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt
>> >>
>> >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0
>> >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt
>> >>
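
The two commands above leave `$b`, `$filesize` and `$recordsize` undefined. A
possible wrapper is sketched below, using the 200G file size and 16384k
record size given in the follow-up mail as defaults. The function name
`bench_brick` and the `IOZONE` override variable are made up for this
example, and the /mnt/brick${b} paths are taken as-is from the commands
above.

```shell
#!/bin/sh
# Sketch: run the iozone write test, then the read test, against one brick.
# IOZONE is a hypothetical override, e.g. to point at a stub for a dry run.
IOZONE=${IOZONE:-./iozone}
filesize=${filesize:-200G}
recordsize=${recordsize:-16384k}

bench_brick() {
    b=$1
    for mode in write read; do
        # iozone test 0 is write/rewrite, test 1 is read/reread.
        case $mode in
            write) test_id=0 ;;
            read)  test_id=1 ;;
        esac
        "$IOZONE" -i "$test_id" -t 1 -F "/mnt/brick${b}/thread1" \
            -+n -c -C -e -I -w -+S 0 -s "$filesize" -r "$recordsize" \
            > "iozone-brick${b}-${mode}.txt"
    done
}
```

A loop such as `for b in 1 2 3; do bench_brick "$b"; done` would then cover
all three bricks one after another. The write test has to run first because
`-w` keeps the file around for the read test.
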
>> >> cheers
>> >>
>> >> Pascal
>> >>
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users at gluster.org
>> >> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>
> --
> Amar Tumballi (amarts)
>
>

-- 
Amar Tumballi (amarts)