[Gluster-users] Performance is falling rapidly when updating from v5.5 to v7.0

David Spisla spisla80 at gmail.com
Wed Nov 6 15:34:37 UTC 2019


I did another test with an inode size of 1024 bytes on the XFS bricks, but it
also had no effect. Here is the measurement:

(All values in MiB/s)
64KiB    1MiB     10MiB
0.16     2.52     76.58
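
For reference, the XFS inode size can only be chosen at mkfs time, so the
brick file system had to be recreated for this test; a minimal sketch,
assuming a hypothetical brick device /dev/sdb1:

mkfs.xfs -f -i size=1024 /dev/sdb1
mount /dev/sdb1 /gluster/brick1
xfs_info /gluster/brick1 | grep isize   # should report isize=1024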

Besides that, I was not able to set the xattr trusted.io-stats-dump. I am
wondering why it is not working.
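
If I understand the docs correctly, the dump has to be triggered on a
glusterfs client mount point, and trusted.io-stats-dump seems to be a virtual
xattr that the io-stats translator consumes rather than stores, so getfattr
would never list it. A sketch of what I would expect to work, assuming the
volume is fuse-mounted at /mnt/fuse (on newer releases the dump file may also
end up under /var/run/gluster/ instead of the literal path, an assumption I
still need to check):

gluster volume profile archive1 start
setfattr -n trusted.io-stats-dump -v /tmp/iostat.log /mnt/fuse
ls -l /tmp/iostat.log /var/run/gluster/   # look for the dump file, not the xattr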

Regards
David Spisla

On Wed, Nov 6, 2019 at 11:16 AM RAFI KC <rkavunga at redhat.com> wrote:

>
> On 11/6/19 3:42 PM, David Spisla wrote:
>
> Hello Rafi,
>
> I tried to set the xattr via
>
> setfattr -n trusted.io-stats-dump -v '/tmp/iostat.log'
> /gluster/repositories/repo1/
>
> but it had no effect. There is no such xattr visible via getfattr and no
> logfile. The command setxattr is not available. What am I doing wrong?
>
>
> I will check it out and get back to you.
>
>
> By the way, do you mean to increase the inode size of the XFS layer from
> 512 bytes to 1024KB(!)? I think it should be 1024 bytes, because 2048 bytes
> is the maximum.
>
> It was a typo, I meant 1024 bytes, sorry for that.
>
>
> Regards
> David
>
> On Wed, Nov 6, 2019 at 4:10 AM RAFI KC <rkavunga at redhat.com> wrote:
>
>> I will take a look at the profile info shared. Since there is a huge
>> difference in the performance numbers between FUSE and Samba, it would be
>> great if we could get the profile info of FUSE (on v7). This will help to
>> compare the number of calls for each fop. There are probably some fops that
>> Samba repeats, and we can find them by comparing with FUSE.
>>
>> Also, if possible, can you please get the client profile info from the
>> fuse mount using the command `setxattr -n trusted.io-stats-dump -v
>> <logfile, e.g. /tmp/iostat.log> <fuse mount point, e.g. /mnt/fuse>`.
>>
>>
>> Regards
>>
>> Rafi KC
>>
>> On 11/5/19 11:05 PM, David Spisla wrote:
>>
>> I did the test with Gluster 7.0 and ctime disabled, but it had no effect:
>> (All values in MiB/s)
>> 64KiB    1MiB     10MiB
>> 0.16     2.60     54.74
>>
>> Attached is now the complete profile file, also with the results from the
>> last test. I will not repeat it with a higher inode size because I don't
>> think this will have an effect.
>> There must be another cause for the low performance.
>>
>>
>> Yes. No need to try with a higher inode size.
>>
>>
>>
>> Regards
>> David Spisla
>>
>> On Tue, Nov 5, 2019 at 4:25 PM David Spisla <spisla80 at gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Nov 5, 2019 at 12:06 PM RAFI KC <rkavunga at redhat.com> wrote:
>>>
>>>>
>>>> On 11/4/19 8:46 PM, David Spisla wrote:
>>>>
>>>> Dear Gluster Community,
>>>>
>>>> I also have an issue concerning performance. In the last days I updated
>>>> our test cluster from GlusterFS v5.5 to v7.0. The setup in general:
>>>>
>>>> 2 HP DL380 servers with 10 Gbit NICs, 1 Distributed-Replicate 2 volume
>>>> with 2 replica pairs. The client is SMB Samba (access via vfs_glusterfs).
>>>> I did several tests to ensure that Samba doesn't cause the drop.
>>>> The setup is completely the same except for the Gluster version.
>>>> Here are my results:
>>>> (All values in MiB/s)
>>>> Filesize:         64KiB    1MiB     10MiB
>>>> GlusterFS v5.5:   3.49     47.41    300.50
>>>> GlusterFS v7.0:   0.16     2.61     76.63
>>>>
>>>>
>>>> Can you please share the profile information [1] for both versions?
>>>> Also, it would be really helpful if you could mention the I/O patterns
>>>> that were used for these tests.
>>>>
>>>> [1] :
>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Monitoring%20Workload/
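>>>>
>>>> In case it helps, the brick-side profile is gathered roughly like this
>>>> (volume name taken from your setup below):
>>>>
>>>> gluster volume profile archive1 start
>>>> # run the fio workload, then:
>>>> gluster volume profile archive1 info > /tmp/profile_v7.txt
>>>> gluster volume profile archive1 stop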
>>>>
>>> Hello Rafi,
>>> thank you for your help.
>>>
>>> * First, more information about the I/O patterns: as a client we use a
>>> DL360 Windows Server 2017 machine with a 10 Gbit NIC connected to the
>>> storage machines. The share is mounted via SMB and the tests write with
>>> fio. We use these job files (see attachment). Each job file is executed
>>> separately, and there is a sleep of about 60 s between test runs to calm
>>> down the system before starting a new test.
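>>>
>>> For illustration only (the real job files are in the attachment), the
>>> 64KiB case would look roughly like the following fio job; the engine,
>>> paths and sizes here are assumptions:
>>>
>>> ; hypothetical job file for the 64KiB test
>>> [write-64k]
>>> ioengine=windowsaio
>>> rw=write
>>> bs=64k
>>> size=1g
>>> direct=1
>>> numjobs=1
>>> ; SMB share mounted as drive Z: (fio requires the colon to be escaped)
>>> directory=Z\:\fio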
>>>
>>> * Attached below you will find the profile output from the tests with
>>> v5.5 (ctime enabled) and v7.0 (ctime enabled).
>>>
>>> * Besides the tests with Samba, I also did some fio tests directly on
>>> the FUSE mounts (locally on one of the storage nodes). The results show
>>> that there is only a small decrease in performance between v5.5 and v7.0.
>>> (All values in MiB/s)
>>> 64KiB     1MiB      10MiB
>>> 50.09     679.96    1023.02   (v5.5)
>>> 47.00     656.46    977.60    (v7.0)
>>>
>>> It seems that the combination of Samba + Gluster 7.0 has a lot of
>>> problems, doesn't it?
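>>>
>>> For reference, such a local FUSE check boils down to something like this
>>> (mount point and fio parameters are examples, server/volume names from
>>> the setup below):
>>>
>>> mount -t glusterfs fs-dl380-c1-n1:/archive1 /mnt/fuse
>>> fio --name=write-64k --directory=/mnt/fuse --rw=write --bs=64k --size=1g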
>>>
>>>
>>>>
>>>> We use this volume options (GlusterFS 7.0):
>>>>
>>>> Volume Name: archive1
>>>> Type: Distributed-Replicate
>>>> Volume ID: 44c17844-0bd4-4ca2-98d8-a1474add790c
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: fs-dl380-c1-n1:/gluster/brick1/glusterbrick
>>>> Brick2: fs-dl380-c1-n2:/gluster/brick1/glusterbrick
>>>> Brick3: fs-dl380-c1-n1:/gluster/brick2/glusterbrick
>>>> Brick4: fs-dl380-c1-n2:/gluster/brick2/glusterbrick
>>>> Options Reconfigured:
>>>> performance.client-io-threads: off
>>>> nfs.disable: on
>>>> storage.fips-mode-rchecksum: on
>>>> transport.address-family: inet
>>>> user.smb: disable
>>>> features.read-only: off
>>>> features.worm: off
>>>> features.worm-file-level: on
>>>> features.retention-mode: enterprise
>>>> features.default-retention-period: 120
>>>> network.ping-timeout: 10
>>>> features.cache-invalidation: on
>>>> features.cache-invalidation-timeout: 600
>>>> performance.nl-cache: on
>>>> performance.nl-cache-timeout: 600
>>>> client.event-threads: 32
>>>> server.event-threads: 32
>>>> cluster.lookup-optimize: on
>>>> performance.stat-prefetch: on
>>>> performance.cache-invalidation: on
>>>> performance.md-cache-timeout: 600
>>>> performance.cache-samba-metadata: on
>>>> performance.cache-ima-xattrs: on
>>>> performance.io-thread-count: 64
>>>> cluster.use-compound-fops: on
>>>> performance.cache-size: 512MB
>>>> performance.cache-refresh-timeout: 10
>>>> performance.read-ahead: off
>>>> performance.write-behind-window-size: 4MB
>>>> performance.write-behind: on
>>>> storage.build-pgfid: on
>>>> features.ctime: on
>>>> cluster.quorum-type: fixed
>>>> cluster.quorum-count: 1
>>>> features.bitrot: on
>>>> features.scrub: Active
>>>> features.scrub-freq: daily
>>>>
>>>> For GlusterFS 5.5 it is nearly the same, except for the fact that there
>>>> were 2 options to enable the ctime feature.
>>>>
>>>>
>>>>
>>>> Ctime stores additional metadata information as an extended attribute,
>>>> which sometimes exceeds the default inode size. In such scenarios the
>>>> additional xattrs won't fit into the default inode size. This results in
>>>> additional blocks being used to store the xattrs outside the inode, which
>>>> will affect the latency. This is purely dependent on the I/O operations
>>>> and the total xattr size stored in the inode.
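>>>>
>>>> One way to see how much xattr data a file actually carries is to list the
>>>> xattrs directly on a brick; a sketch with an example path:
>>>>
>>>> # run on a storage node against a file inside a brick
>>>> getfattr -d -m . -e hex /gluster/brick1/glusterbrick/path/to/file
>>>> # with ctime enabled, trusted.glusterfs.mdata is stored in addition to
>>>> # trusted.gfid, the afr/bitrot xattrs and (with build-pgfid on) trusted.pgfid.*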
>>>>
>>>> Is it possible for you to repeat the test after disabling ctime or
>>>> increasing the inode size to a higher value, say 1024KB?
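>>>>
>>>> Disabling ctime is a one-liner (volume name assumed from your setup; on
>>>> 5.x the second ctime-related option would need the same treatment):
>>>>
>>>> gluster volume set archive1 features.ctime off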
>>>>
>>> I will do so, but I could not finish the tests with ctime disabled (or a
>>> higher inode size) today, because they take a lot of time with v7.0 due to
>>> the low performance; I will run them tomorrow. As soon as possible I will
>>> give you the results.
>>> By the way: do you really mean an inode size of 1024KB on the XFS layer?
>>> Or do you mean 1024 bytes? We use 512 bytes by default, because this has
>>> been the recommended size until now. But it seems that there is a need for
>>> a new recommendation when using the ctime feature by default. I cannot
>>> imagine that this is the real cause of the low performance, because in
>>> v5.5 we also use the ctime feature with an inode size of 512 bytes.
>>>
>>> Regards
>>> David
>>>
>>>>
>>>> Our optimization for Samba looks like this (for every version):
>>>>
>>>> [global]
>>>> workgroup = SAMBA
>>>> netbios name = CLUSTER
>>>> kernel share modes = no
>>>> aio read size = 1
>>>> aio write size = 1
>>>> kernel oplocks = no
>>>> max open files = 100000
>>>> nt acl support = no
>>>> security = user
>>>> server min protocol = SMB2
>>>> store dos attributes = no
>>>> strict locking = no
>>>> full_audit:failure = pwrite_send pwrite_recv pwrite offload_write_send
>>>> offload_write_recv create_file open unlink connect disconnect rename chown
>>>> fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
>>>> full_audit:success = pwrite_send pwrite_recv pwrite offload_write_send
>>>> offload_write_recv create_file open unlink connect disconnect rename chown
>>>> fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
>>>> full_audit:facility = local5
>>>> durable handles = yes
>>>> posix locking = no
>>>> log level = 2
>>>> max log size = 100000
>>>> debug pid = yes
>>>>
>>>> What could be the cause of this rapid drop in performance for small
>>>> files? Are some of our volume options not recommended anymore?
>>>> There were some patches concerning small-file performance in v6.0
>>>> and v7.0:
>>>>
>>>> #1670031 <https://bugzilla.redhat.com/1670031>: performance regression
>>>> seen with smallfile workload tests
>>>>
>>>> #1659327 <https://bugzilla.redhat.com/1659327>: 43% regression in
>>>> small-file sequential read performance
>>>>
>>>> And one patch for the io-cache:
>>>>
>>>> #1659869 <https://bugzilla.redhat.com/1659869>: improvements to
>>>> io-cache
>>>>
>>>> Regards
>>>>
>>>> David Spisla
>>>>
>>>>
>>>>
>>>>