[Gluster-users] Extremely low performance - am I doing something wrong?
Vladimir Melnik
v.melnik at tucha.ua
Wed Jul 3 20:15:08 UTC 2019
Indeed, I wouldn't be surprised if I had around 80-100 MB/s, but 10-15
MB/s is really low. :-(
Even when I mount a filesystem on the same GlusterFS node, I have the
following result:
10485760 bytes (10 MB) copied, 0.409856 s, 25.6 MB/s
10485760 bytes (10 MB) copied, 0.38967 s, 26.9 MB/s
10485760 bytes (10 MB) copied, 0.466758 s, 22.5 MB/s
10485760 bytes (10 MB) copied, 0.412075 s, 25.4 MB/s
10485760 bytes (10 MB) copied, 0.381626 s, 27.5 MB/s
At the same time on the same node when I'm writing directly to the disk:
10485760 bytes (10 MB) copied, 0.0326612 s, 321 MB/s
10485760 bytes (10 MB) copied, 0.0302878 s, 346 MB/s
10485760 bytes (10 MB) copied, 0.0352449 s, 298 MB/s
10485760 bytes (10 MB) copied, 0.0316872 s, 331 MB/s
10485760 bytes (10 MB) copied, 0.0333189 s, 315 MB/s
I can't explain it. Are replica-3 volumes really so slow?
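One thing I'm starting to suspect is the oflag=sync in my test: with bs=1M
every single 1 MB write has to be acknowledged before dd continues, so I may
be measuring per-write latency rather than throughput. I'm going to re-run it
roughly like this (just a sketch, same mount point as above), comparing a
single sync at the end against per-write sync:

$ # sync only once, when the whole file has been written
$ dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=100 conv=fdatasync
$ rm -f /mnt/glusterfs1/test.tmp
$ # and the original per-write sync variant for comparison
$ dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=100 oflag=sync
$ rm -f /mnt/glusterfs1/test.tmp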
On Wed, Jul 03, 2019 at 03:16:45PM -0400, Dmitry Filonov wrote:
> Well, if your network is limited to 100MB/s then it doesn't matter if
> storage is capable of doing 300+MB/s.
> But 15 MB/s is still way less than 100 MB/s
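> A quick way to sanity-check the raw link speed between the peers (assuming
> iperf3 is installed on both ends) is something like:
>
> # on one of the gluster nodes
> iperf3 -s
> # on the client (or another node)
> iperf3 -c <gluster-node-address> -t 10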
>
> P.S. I just tried this on my gluster and found out that I am getting ~15 MB/s
> on a replica 3 volume on SSDs and... 2 MB/s on a replica 3 volume on HDDs.
> Something to look at next week.
>
>
>
> --
> Dmitry Filonov
> Linux Administrator
> SBGrid Core | Harvard Medical School
> 250 Longwood Ave, SGM-114
> Boston, MA 02115
>
>
> On Wed, Jul 3, 2019 at 12:18 PM Vladimir Melnik <v.melnik at tucha.ua> wrote:
>
> > Thank you, it helped a little:
> >
> > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M
> > count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done 2>&1 | grep
> > copied
> > 10485760 bytes (10 MB) copied, 0.738968 s, 14.2 MB/s
> > 10485760 bytes (10 MB) copied, 0.725296 s, 14.5 MB/s
> > 10485760 bytes (10 MB) copied, 0.681508 s, 15.4 MB/s
> > 10485760 bytes (10 MB) copied, 0.85566 s, 12.3 MB/s
> > 10485760 bytes (10 MB) copied, 0.661457 s, 15.9 MB/s
> >
> > But 14-15 MB/s is still quite far from the actual storage's performance
> > (200-300 MB/s). :-(
> >
> > Here's full configuration dump (just in case):
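> > (For reference, the dump below is from something like
> > "gluster volume get storage1 all".)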
> >
> > Option Value
> > ------ -----
> > cluster.lookup-unhashed on
> > cluster.lookup-optimize on
> > cluster.min-free-disk 10%
> > cluster.min-free-inodes 5%
> > cluster.rebalance-stats off
> > cluster.subvols-per-directory (null)
> > cluster.readdir-optimize off
> > cluster.rsync-hash-regex (null)
> > cluster.extra-hash-regex (null)
> > cluster.dht-xattr-name trusted.glusterfs.dht
> > cluster.randomize-hash-range-by-gfid off
> > cluster.rebal-throttle normal
> > cluster.lock-migration off
> > cluster.force-migration off
> > cluster.local-volume-name (null)
> > cluster.weighted-rebalance on
> > cluster.switch-pattern (null)
> > cluster.entry-change-log on
> > cluster.read-subvolume (null)
> > cluster.read-subvolume-index -1
> > cluster.read-hash-mode 1
> > cluster.background-self-heal-count 8
> > cluster.metadata-self-heal off
> > cluster.data-self-heal off
> > cluster.entry-self-heal off
> > cluster.self-heal-daemon on
> > cluster.heal-timeout 600
> > cluster.self-heal-window-size 1
> > cluster.data-change-log on
> > cluster.metadata-change-log on
> > cluster.data-self-heal-algorithm full
> > cluster.eager-lock enable
> > disperse.eager-lock on
> > disperse.other-eager-lock on
> > disperse.eager-lock-timeout 1
> > disperse.other-eager-lock-timeout 1
> > cluster.quorum-type auto
> > cluster.quorum-count (null)
> > cluster.choose-local off
> > cluster.self-heal-readdir-size 1KB
> > cluster.post-op-delay-secs 1
> > cluster.ensure-durability on
> > cluster.consistent-metadata no
> > cluster.heal-wait-queue-length 128
> > cluster.favorite-child-policy none
> > cluster.full-lock yes
> > diagnostics.latency-measurement off
> > diagnostics.dump-fd-stats off
> > diagnostics.count-fop-hits off
> > diagnostics.brick-log-level INFO
> > diagnostics.client-log-level INFO
> > diagnostics.brick-sys-log-level CRITICAL
> > diagnostics.client-sys-log-level CRITICAL
> > diagnostics.brick-logger (null)
> > diagnostics.client-logger (null)
> > diagnostics.brick-log-format (null)
> > diagnostics.client-log-format (null)
> > diagnostics.brick-log-buf-size 5
> > diagnostics.client-log-buf-size 5
> > diagnostics.brick-log-flush-timeout 120
> > diagnostics.client-log-flush-timeout 120
> > diagnostics.stats-dump-interval 0
> > diagnostics.fop-sample-interval 0
> > diagnostics.stats-dump-format json
> > diagnostics.fop-sample-buf-size 65535
> > diagnostics.stats-dnscache-ttl-sec 86400
> > performance.cache-max-file-size 0
> > performance.cache-min-file-size 0
> > performance.cache-refresh-timeout 1
> > performance.cache-priority
> > performance.cache-size 32MB
> > performance.io-thread-count 16
> > performance.high-prio-threads 16
> > performance.normal-prio-threads 16
> > performance.low-prio-threads 32
> > performance.least-prio-threads 1
> > performance.enable-least-priority on
> > performance.iot-watchdog-secs (null)
> > performance.iot-cleanup-disconnected-reqs off
> > performance.iot-pass-through false
> > performance.io-cache-pass-through false
> > performance.cache-size 128MB
> > performance.qr-cache-timeout 1
> > performance.cache-invalidation false
> > performance.ctime-invalidation false
> > performance.flush-behind on
> > performance.nfs.flush-behind on
> > performance.write-behind-window-size 1MB
> > performance.resync-failed-syncs-after-fsync off
> > performance.nfs.write-behind-window-size 1MB
> > performance.strict-o-direct off
> > performance.nfs.strict-o-direct off
> > performance.strict-write-ordering off
> > performance.nfs.strict-write-ordering off
> > performance.write-behind-trickling-writes on
> > performance.aggregate-size 128KB
> > performance.nfs.write-behind-trickling-writes on
> > performance.lazy-open yes
> > performance.read-after-open yes
> > performance.open-behind-pass-through false
> > performance.read-ahead-page-count 4
> > performance.read-ahead-pass-through false
> > performance.readdir-ahead-pass-through false
> > performance.md-cache-pass-through false
> > performance.md-cache-timeout 1
> > performance.cache-swift-metadata true
> > performance.cache-samba-metadata false
> > performance.cache-capability-xattrs true
> > performance.cache-ima-xattrs true
> > performance.md-cache-statfs off
> > performance.xattr-cache-list
> > performance.nl-cache-pass-through false
> > features.encryption off
> > network.frame-timeout 1800
> > network.ping-timeout 42
> > network.tcp-window-size (null)
> > client.ssl off
> > network.remote-dio enable
> > client.event-threads 4
> > client.tcp-user-timeout 0
> > client.keepalive-time 20
> > client.keepalive-interval 2
> > client.keepalive-count 9
> > network.tcp-window-size (null)
> > network.inode-lru-limit 16384
> > auth.allow *
> > auth.reject (null)
> > transport.keepalive 1
> > server.allow-insecure on
> > server.root-squash off
> > server.all-squash off
> > server.anonuid 65534
> > server.anongid 65534
> > server.statedump-path /var/run/gluster
> > server.outstanding-rpc-limit 64
> > server.ssl off
> > auth.ssl-allow *
> > server.manage-gids off
> > server.dynamic-auth on
> > client.send-gids on
> > server.gid-timeout 300
> > server.own-thread (null)
> > server.event-threads 4
> > server.tcp-user-timeout 42
> > server.keepalive-time 20
> > server.keepalive-interval 2
> > server.keepalive-count 9
> > transport.listen-backlog 1024
> > transport.address-family inet
> > performance.write-behind on
> > performance.read-ahead off
> > performance.readdir-ahead on
> > performance.io-cache off
> > performance.open-behind on
> > performance.quick-read off
> > performance.nl-cache off
> > performance.stat-prefetch on
> > performance.client-io-threads on
> > performance.nfs.write-behind on
> > performance.nfs.read-ahead off
> > performance.nfs.io-cache off
> > performance.nfs.quick-read off
> > performance.nfs.stat-prefetch off
> > performance.nfs.io-threads off
> > performance.force-readdirp true
> > performance.cache-invalidation false
> > performance.global-cache-invalidation true
> > features.uss off
> > features.snapshot-directory .snaps
> > features.show-snapshot-directory off
> > features.tag-namespaces off
> > network.compression off
> > network.compression.window-size -15
> > network.compression.mem-level 8
> > network.compression.min-size 0
> > network.compression.compression-level -1
> > network.compression.debug false
> > features.default-soft-limit 80%
> > features.soft-timeout 60
> > features.hard-timeout 5
> > features.alert-time 86400
> > features.quota-deem-statfs off
> > geo-replication.indexing off
> > geo-replication.indexing off
> > geo-replication.ignore-pid-check off
> > geo-replication.ignore-pid-check off
> > features.quota off
> > features.inode-quota off
> > features.bitrot disable
> > debug.trace off
> > debug.log-history no
> > debug.log-file no
> > debug.exclude-ops (null)
> > debug.include-ops (null)
> > debug.error-gen off
> > debug.error-failure (null)
> > debug.error-number (null)
> > debug.random-failure off
> > debug.error-fops (null)
> > nfs.disable on
> > features.read-only off
> > features.worm off
> > features.worm-file-level off
> > features.worm-files-deletable on
> > features.default-retention-period 120
> > features.retention-mode relax
> > features.auto-commit-period 180
> > storage.linux-aio off
> > storage.batch-fsync-mode reverse-fsync
> > storage.batch-fsync-delay-usec 0
> > storage.owner-uid -1
> > storage.owner-gid -1
> > storage.node-uuid-pathinfo off
> > storage.health-check-interval 30
> > storage.build-pgfid off
> > storage.gfid2path on
> > storage.gfid2path-separator :
> > storage.reserve 1
> > storage.health-check-timeout 10
> > storage.fips-mode-rchecksum off
> > storage.force-create-mode 0000
> > storage.force-directory-mode 0000
> > storage.create-mask 0777
> > storage.create-directory-mask 0777
> > storage.max-hardlinks 100
> > features.ctime on
> > config.gfproxyd off
> > cluster.server-quorum-type server
> > cluster.server-quorum-ratio 0
> > changelog.changelog off
> > changelog.changelog-dir                 {{ brick.path }}/.glusterfs/changelogs
> > changelog.encoding ascii
> > changelog.rollover-time 15
> > changelog.fsync-interval 5
> > changelog.changelog-barrier-timeout 120
> > changelog.capture-del-path off
> > features.barrier disable
> > features.barrier-timeout 120
> > features.trash off
> > features.trash-dir .trashcan
> > features.trash-eliminate-path (null)
> > features.trash-max-filesize 5MB
> > features.trash-internal-op off
> > cluster.enable-shared-storage disable
> > locks.trace off
> > locks.mandatory-locking off
> > cluster.disperse-self-heal-daemon enable
> > cluster.quorum-reads no
> > client.bind-insecure (null)
> > features.shard on
> > features.shard-block-size 64MB
> > features.shard-lru-limit 16384
> > features.shard-deletion-rate 100
> > features.scrub-throttle lazy
> > features.scrub-freq biweekly
> > features.scrub false
> > features.expiry-time 120
> > features.cache-invalidation off
> > features.cache-invalidation-timeout 60
> > features.leases off
> > features.lease-lock-recall-timeout 60
> > disperse.background-heals 8
> > disperse.heal-wait-qlength 128
> > cluster.heal-timeout 600
> > dht.force-readdirp on
> > disperse.read-policy gfid-hash
> > cluster.shd-max-threads 8
> > cluster.shd-wait-qlength 10000
> > cluster.shd-wait-qlength 10000
> > cluster.locking-scheme granular
> > cluster.granular-entry-heal no
> > features.locks-revocation-secs 0
> > features.locks-revocation-clear-all false
> > features.locks-revocation-max-blocked 0
> > features.locks-monkey-unlocking false
> > features.locks-notify-contention no
> > features.locks-notify-contention-delay 5
> > disperse.shd-max-threads 1
> > disperse.shd-wait-qlength 1024
> > disperse.cpu-extensions auto
> > disperse.self-heal-window-size 1
> > cluster.use-compound-fops off
> > performance.parallel-readdir off
> > performance.rda-request-size 131072
> > performance.rda-low-wmark 4096
> > performance.rda-high-wmark 128KB
> > performance.rda-cache-limit 10MB
> > performance.nl-cache-positive-entry false
> > performance.nl-cache-limit 10MB
> > performance.nl-cache-timeout 60
> > cluster.brick-multiplex off
> > cluster.max-bricks-per-process 250
> > disperse.optimistic-change-log on
> > disperse.stripe-cache 4
> > cluster.halo-enabled False
> > cluster.halo-shd-max-latency 99999
> > cluster.halo-nfsd-max-latency 5
> > cluster.halo-max-latency 5
> > cluster.halo-max-replicas 99999
> > cluster.halo-min-replicas 2
> > features.selinux on
> > cluster.daemon-log-level INFO
> > debug.delay-gen off
> > delay-gen.delay-percentage 10%
> > delay-gen.delay-duration 100000
> > delay-gen.enable
> > disperse.parallel-writes on
> > features.sdfs off
> > features.cloudsync off
> > features.ctime on
> > ctime.noatime on
> > feature.cloudsync-storetype (null)
> > features.enforce-mandatory-lock off
> >
> > What do you think, are there any other knobs worth turning?
> >
> > Thanks!
> >
> > On Wed, Jul 03, 2019 at 06:55:09PM +0300, Strahil wrote:
> > > Check the following link (4.1) for the optimal gluster volume settings.
> > > They are quite safe.
> > >
> > > Gluster provides a group called virt (/var/lib/glusterd/groups/virt),
> > > which can be applied via 'gluster volume set VOLNAME group virt'.
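> > >
> > > For example, a minimal sketch with the volume name from this thread:
> > >
> > > gluster volume set storage1 group virt
> > >
> > > Afterwards 'gluster volume info storage1' should list the changed options
> > > under "Options Reconfigured".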
> > >
> > > Then try again.
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Jul 3, 2019 11:39, Vladimir Melnik <v.melnik at tucha.ua> wrote:
> > > >
> > > > Dear colleagues,
> > > >
> > > > I have a lab with a bunch of virtual machines (the virtualization is
> > > > provided by KVM) running on the same physical host. 4 of these VMs are
> > > > working as a GlusterFS cluster and one more VM works as a client. I'll
> > > > specify all the packages' versions at the end of this message.
> > > >
> > > > I created 2 volumes: one has the type "Distributed-Replicate" and the
> > > > other is "Distribute". The problem is that both volumes show really poor
> > > > performance.
> > > >
> > > > Here's what I see on the client:
> > > > $ mount | grep gluster
> > > > 10.13.1.16:storage1 on /mnt/glusterfs1 type fuse.glusterfs
> > > > (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> > > > 10.13.1.16:storage2 on /mnt/glusterfs2 type fuse.glusterfs
> > > > (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> > > >
> > > > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp
> > > > bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 1.47936 s, 7.1 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 1.62546 s, 6.5 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 1.71229 s, 6.1 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 1.68607 s, 6.2 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 1.82204 s, 5.8 MB/s
> > > >
> > > > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs2/test.tmp
> > > > bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs2/test.tmp; } done
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 1.15739 s, 9.1 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.978528 s, 10.7 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.910642 s, 11.5 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.998249 s, 10.5 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 1.03377 s, 10.1 MB/s
> > > >
> > > > The distributed one shows a bit better performance than the
> > > > distributed-replicated one, but it's still poor. :-(
> > > >
> > > > The disk storage itself is OK, here's what I see on each of the 4
> > > > GlusterFS servers:
> > > > for i in {1..5}; do { dd if=/dev/zero of=/mnt/storage1/test.tmp bs=1M
> > > > count=10 oflag=sync; rm -f /mnt/storage1/test.tmp; } done
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.0656698 s, 160 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.0476927 s, 220 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.036526 s, 287 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.0329145 s, 319 MB/s
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10 MB) copied, 0.0403988 s, 260 MB/s
> > > >
> > > > The network between all 5 VMs is OK; they all run on the same
> > > > physical host.
> > > >
> > > > I can't understand what I'm doing wrong. :-(
> > > >
> > > > Here's the detailed info about the volumes:
> > > > Volume Name: storage1
> > > > Type: Distributed-Replicate
> > > > Volume ID: a42e2554-99e5-4331-bcc4-0900d002ae32
> > > > Status: Started
> > > > Snapshot Count: 0
> > > > Number of Bricks: 2 x (2 + 1) = 6
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick1
> > > > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage1/brick2
> > > > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> > > > Brick4: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick3
> > > > Brick5: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage1/brick4
> > > > Brick6: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> > > > Options Reconfigured:
> > > > transport.address-family: inet
> > > > nfs.disable: on
> > > > performance.client-io-threads: off
> > > >
> > > > Volume Name: storage2
> > > > Type: Distribute
> > > > Volume ID: df4d8096-ad03-493e-9e0e-586ce21fb067
> > > > Status: Started
> > > > Snapshot Count: 0
> > > > Number of Bricks: 4
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > > Brick4: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > > Options Reconfigured:
> > > > transport.address-family: inet
> > > > nfs.disable: on
> > > >
> > > > The OS is CentOS Linux release 7.6.1810. The packages I'm using are:
> > > > glusterfs-6.3-1.el7.x86_64
> > > > glusterfs-api-6.3-1.el7.x86_64
> > > > glusterfs-cli-6.3-1.el7.x86_64
> > > > glusterfs-client-xlators-6.3-1.el7.x86_64
> > > > glusterfs-fuse-6.3-1.el7.x86_64
> > > > glusterfs-libs-6.3-1.el7.x86_64
> > > > glusterfs-server-6.3-1.el7.x86_64
> > > > kernel-3.10.0-327.el7.x86_64
> > > > kernel-3.10.0-514.2.2.el7.x86_64
> > > > kernel-3.10.0-957.12.1.el7.x86_64
> > > > kernel-3.10.0-957.12.2.el7.x86_64
> > > > kernel-3.10.0-957.21.3.el7.x86_64
> > > > kernel-tools-3.10.0-957.21.3.el7.x86_64
> > > > kernel-tools-libs-3.10.0-957.21.3.el7.x86_6
> > > >
> > > > Please be so kind as to help me understand: did I do something wrong, or
> > > > is this quite normal performance for GlusterFS?
> > > >
> > > > Thanks in advance!
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > Gluster-users at gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > --
> > V.Melnik
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> >
--
V.Melnik