[Gluster-devel] I/O performance

Xavi Hernandez xhernandez at redhat.com
Thu Feb 14 07:35:16 UTC 2019


Here are the results of the last run:
https://docs.google.com/spreadsheets/d/19JqvuFKZxKifgrhLF-5-bgemYj8XKldUox1QwsmGj2k/edit?usp=sharing

Each test has been run with a rough approximation of the best configuration
I've found (in terms of the number of client and brick threads), but I
haven't done an exhaustive search for the best configuration in each case.

The "fio rand write" test seems to have a big regression. An initial check
of the data shows that 2 of the 5 runs have taken > 50% more time. I'll try
to check why.

Many of the tests show very high disk utilization, so comparisons may not
be accurate. In any case, it's clear that we need a method to automatically
adjust the number of worker threads to the current load for this to be
useful. Without that, it's virtually impossible to find a fixed number of
threads that works well in all cases. I'm currently working on this.
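
The kind of heuristic I have in mind is something like this (just a rough
sketch of the general idea, with made-up names, not the actual io-threads
code): periodically compare the number of queued requests with the number
of active workers, and grow or shrink the pool between a minimum and a
maximum:

    /* Sketch of a load-based resize heuristic (made-up names, not the
     * real io-threads code). A monitor would call target_threads()
     * periodically and adjust the pool accordingly. */
    #define MIN_THREADS 4
    #define MAX_THREADS 64

    struct pool_stats {
        int threads;   /* current number of worker threads */
        int active;    /* workers currently executing a fop */
        int pending;   /* fops waiting in the queue */
    };

    static int
    target_threads(const struct pool_stats *s)
    {
        int target = s->threads;

        if (s->pending > s->active)           /* queue building up */
            target = s->threads + (s->pending - s->active + 1) / 2;
        else if (s->active < s->threads / 2)  /* mostly idle */
            target = s->threads - (s->threads - s->active) / 2;

        if (target < MIN_THREADS)
            target = MIN_THREADS;
        if (target > MAX_THREADS)
            target = MAX_THREADS;

        return target;
    }

The hard part will be choosing the sampling interval and the limits so
that the pool reacts quickly enough without oscillating.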

Xavi

On Wed, Feb 13, 2019 at 11:34 AM Xavi Hernandez <xhernandez at redhat.com>
wrote:

> On Tue, Feb 12, 2019 at 1:30 AM Vijay Bellur <vbellur at redhat.com> wrote:
>
>>
>>
>> On Tue, Feb 5, 2019 at 10:57 PM Xavi Hernandez <xhernandez at redhat.com>
>> wrote:
>>
>>> On Wed, Feb 6, 2019 at 7:00 AM Poornima Gurusiddaiah <
>>> pgurusid at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Feb 5, 2019, 10:53 PM Xavi Hernandez <xhernandez at redhat.com
>>>> wrote:
>>>>
>>>>> On Fri, Feb 1, 2019 at 1:51 PM Xavi Hernandez <xhernandez at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> On Fri, Feb 1, 2019 at 1:25 PM Poornima Gurusiddaiah <
>>>>>> pgurusid at redhat.com> wrote:
>>>>>>
>>>>>>> Can the threads be categorised to do certain kinds of fops?
>>>>>>>
>>>>>>
>>>>>> Could be, but creating multiple thread groups for different tasks is
>>>>>> generally bad because you often end up with lots of idle threads that
>>>>>> waste resources and can increase contention. I think we should only
>>>>>> differentiate threads if it's absolutely necessary.
>>>>>>
>>>>>>
>>>>>>> Read/write fops could be affinitised to one set of threads, and the
>>>>>>> other metadata fops to another set. So we limit the read/write threads
>>>>>>> and not the metadata threads? Also, if AIO is enabled in the backend,
>>>>>>> the threads will not be blocked on disk I/O, right?
>>>>>>>
>>>>>>
>>>>>> If we don't block the thread but we don't prevent more requests from
>>>>>> going to the disk, then we'll probably have the same problem. Anyway,
>>>>>> I'll try to run some tests with AIO to see if anything changes.
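>>>>>>
>>>>>> Just to be clear about what I mean by AIO here: instead of the worker
>>>>>> thread blocking on the write, the request is submitted to the kernel
>>>>>> and the completion is collected later. A minimal libaio sketch of that
>>>>>> pattern (not Gluster code, error handling mostly omitted, build with
>>>>>> -laio):
>>>>>>
>>>>>>     /* submit a single direct write without blocking on the disk */
>>>>>>     #define _GNU_SOURCE
>>>>>>     #include <fcntl.h>
>>>>>>     #include <libaio.h>
>>>>>>     #include <stdlib.h>
>>>>>>     #include <string.h>
>>>>>>     #include <unistd.h>
>>>>>>
>>>>>>     int main(void)
>>>>>>     {
>>>>>>         io_context_t ctx = 0;
>>>>>>         struct iocb cb, *cbs[1] = { &cb };
>>>>>>         struct io_event ev;
>>>>>>         void *buf;
>>>>>>         int fd = open("aio-test", O_WRONLY | O_CREAT | O_DIRECT, 0644);
>>>>>>
>>>>>>         if (fd < 0 || io_setup(32, &ctx) < 0 ||
>>>>>>             posix_memalign(&buf, 4096, 4096) != 0)
>>>>>>             return 1;
>>>>>>         memset(buf, 'x', 4096);
>>>>>>
>>>>>>         io_prep_pwrite(&cb, fd, buf, 4096, 0);
>>>>>>         if (io_submit(ctx, 1, cbs) != 1)   /* returns without waiting */
>>>>>>             return 1;
>>>>>>
>>>>>>         /* completions can be reaped later, from any thread */
>>>>>>         io_getevents(ctx, 1, 1, &ev, NULL);
>>>>>>
>>>>>>         io_destroy(ctx);
>>>>>>         free(buf);
>>>>>>         close(fd);
>>>>>>         return 0;
>>>>>>     }
>>>>>>
>>>>>> The submitting thread is free as soon as io_submit() returns, but the
>>>>>> requests still queue up at the disk, which is why I don't expect this
>>>>>> alone to solve the problem.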
>>>>>>
>>>>>
>>>>> I've run some simple tests with AIO enabled and the results are not
>>>>> good. A simple dd takes >25% more time. Multiple parallel dd runs take
>>>>> 35% more time to complete.
>>>>>
>>>>
>>>>
>>>> Thank you. That is strange! I had a few questions: what tests are you
>>>> running to measure the io-threads performance (not particularly AIO)? Is
>>>> it dd from multiple clients?
>>>>
>>>
>>> Yes, it's a bit strange. What I see is that many threads from the thread
>>> pool are active but using very little CPU. I also see an AIO thread for
>>> each brick, but its CPU usage is not high either. Wait time is always 0 (I
>>> think this is a side effect of AIO activity). However, the system load
>>> grows very high: I've seen around 50, while on the normal test without AIO
>>> it stays around 20-25.
>>>
>>> Right now I'm running the tests on a single machine (no real network
>>> communication) using an NVMe disk as storage. I use a single mount point.
>>> The tests I'm running are these:
>>>
>>>    - Single dd, 128 GiB, blocks of 1MiB
>>>    - 16 parallel dd, 8 GiB per dd, blocks of 1MiB
>>>    - fio in sequential write mode, direct I/O, blocks of 128k, 16
>>>    threads, 8GiB per file
>>>    - fio in sequential read mode, direct I/O, blocks of 128k, 16
>>>    threads, 8GiB per file
>>>    - fio in random write mode, direct I/O, blocks of 128k, 16 threads,
>>>    8GiB per file
>>>    - fio in random read mode, direct I/O, blocks of 128k, 16 threads,
>>>    8GiB per file
>>>    - smallfile create, 16 threads, 256 files per thread, 32 MiB per
>>>    file (with one brick down, for the following test)
>>>    - self-heal of an entire brick (from the previous smallfile test)
>>>    - pgbench init phase with scale 100
>>>
>>> I run all these tests for a replica 3 volume and a disperse 4+2 volume.
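>>>
>>> For reference, the parallel write tests are roughly equivalent to the
>>> following pattern (16 threads, direct I/O, 128 KiB blocks, one file per
>>> thread). This is only a sketch to make the workload concrete; the real
>>> runs use fio and dd, and the file size here is reduced:
>>>
>>>     /* build with: gcc -pthread -o writers writers.c */
>>>     #define _GNU_SOURCE
>>>     #include <fcntl.h>
>>>     #include <pthread.h>
>>>     #include <stdint.h>
>>>     #include <stdio.h>
>>>     #include <stdlib.h>
>>>     #include <string.h>
>>>     #include <unistd.h>
>>>
>>>     #define NTHREADS   16
>>>     #define BLOCK_SIZE (128 * 1024)
>>>     #define NBLOCKS    1024        /* 128 MiB per file in this sketch */
>>>
>>>     static void *writer(void *arg)
>>>     {
>>>         char path[64];
>>>         void *buf = NULL;
>>>         int i, fd;
>>>
>>>         snprintf(path, sizeof(path), "file.%ld", (long)(intptr_t)arg);
>>>         fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
>>>         if (fd < 0 || posix_memalign(&buf, 4096, BLOCK_SIZE) != 0)
>>>             return NULL;
>>>         memset(buf, 'x', BLOCK_SIZE); /* O_DIRECT needs aligned buffers */
>>>         for (i = 0; i < NBLOCKS; i++)
>>>             if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE)
>>>                 break;
>>>         free(buf);
>>>         close(fd);
>>>         return NULL;
>>>     }
>>>
>>>     int main(void)
>>>     {
>>>         pthread_t t[NTHREADS];
>>>         long i;
>>>
>>>         for (i = 0; i < NTHREADS; i++)
>>>             pthread_create(&t[i], NULL, writer, (void *)(intptr_t)i);
>>>         for (i = 0; i < NTHREADS; i++)
>>>             pthread_join(t[i], NULL);
>>>         return 0;
>>>     }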
>>>
>>
>>
>> Are these performance results available somewhere? I am quite curious to
>> understand the performance gains on NVMe!
>>
>
> I'm updating test results with the latest build. I'll report it here once
> it's complete.
>
> Xavi
>
>>