[Gluster-users] Extremely low performance - am I doing something wrong?
Dmitry Filonov
filonov at hkl.hms.harvard.edu
Wed Jul 3 19:16:45 UTC 2019
Well, if your network is limited to 100 MB/s, then it doesn't matter that the
storage is capable of doing 300+ MB/s. But 15 MB/s is still way less than
100 MB/s.
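Before tuning anything on the Gluster side, it's worth confirming what the
link between the client and the brick nodes can actually do, e.g. with
iperf3 (the hostname below is just a placeholder):

# on one of the gluster servers
$ iperf3 -s

# on the client VM
$ iperf3 -c gluster1 -t 10

Whatever iperf3 reports is the ceiling for the FUSE client, no matter how
the volume is tuned.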
P.S. I just tried this on my own Gluster setup and found out that I'm getting
~15 MB/s on a replica 3 volume on SSDs and... 2 MB/s on a replica 3 volume on
HDDs. Something to look at next week.
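In the meantime, a couple of write-path knobs from your dump are often worth
experimenting with (the values below are only examples to test, not a
recommendation):

$ gluster volume set storage1 performance.write-behind-window-size 4MB
$ gluster volume set storage1 client.event-threads 8
$ gluster volume set storage1 server.event-threads 8

Re-run the same dd loop after each change so you can see which one, if any,
actually moves the numbers.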
--
Dmitry Filonov
Linux Administrator
SBGrid Core | Harvard Medical School
250 Longwood Ave, SGM-114
Boston, MA 02115
On Wed, Jul 3, 2019 at 12:18 PM Vladimir Melnik <v.melnik at tucha.ua> wrote:
> Thank you, it helped a little:
>
> $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M
> count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done 2>&1 | grep
> copied
> 10485760 bytes (10 MB) copied, 0.738968 s, 14.2 MB/s
> 10485760 bytes (10 MB) copied, 0.725296 s, 14.5 MB/s
> 10485760 bytes (10 MB) copied, 0.681508 s, 15.4 MB/s
> 10485760 bytes (10 MB) copied, 0.85566 s, 12.3 MB/s
> 10485760 bytes (10 MB) copied, 0.661457 s, 15.9 MB/s
>
> But 14-15 MB/s is still quite far from the underlying storage's performance
> (200-300 MB/s). :-(
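>
> One more comparison that might be telling: a run with oflag=sync against one
> that only fsyncs at the end, e.g.:
>
> $ dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=100 oflag=sync
> $ dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=100 conv=fsync
>
> If the conv=fsync variant is dramatically faster, the cost is in the
> per-write round trips (replication plus a sync on every 1 MB block), not in
> the available bandwidth.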
>
> Here's the full configuration dump (just in case):
>
> Option Value
> ------ -----
> cluster.lookup-unhashed on
> cluster.lookup-optimize on
> cluster.min-free-disk 10%
> cluster.min-free-inodes 5%
> cluster.rebalance-stats off
> cluster.subvols-per-directory (null)
> cluster.readdir-optimize off
> cluster.rsync-hash-regex (null)
> cluster.extra-hash-regex (null)
> cluster.dht-xattr-name trusted.glusterfs.dht
> cluster.randomize-hash-range-by-gfid off
> cluster.rebal-throttle normal
> cluster.lock-migration off
> cluster.force-migration off
> cluster.local-volume-name (null)
> cluster.weighted-rebalance on
> cluster.switch-pattern (null)
> cluster.entry-change-log on
> cluster.read-subvolume (null)
> cluster.read-subvolume-index -1
> cluster.read-hash-mode 1
> cluster.background-self-heal-count 8
> cluster.metadata-self-heal off
> cluster.data-self-heal off
> cluster.entry-self-heal off
> cluster.self-heal-daemon on
> cluster.heal-timeout 600
> cluster.self-heal-window-size 1
> cluster.data-change-log on
> cluster.metadata-change-log on
> cluster.data-self-heal-algorithm full
> cluster.eager-lock enable
> disperse.eager-lock on
> disperse.other-eager-lock on
> disperse.eager-lock-timeout 1
> disperse.other-eager-lock-timeout 1
> cluster.quorum-type auto
> cluster.quorum-count (null)
> cluster.choose-local off
> cluster.self-heal-readdir-size 1KB
> cluster.post-op-delay-secs 1
> cluster.ensure-durability on
> cluster.consistent-metadata no
> cluster.heal-wait-queue-length 128
> cluster.favorite-child-policy none
> cluster.full-lock yes
> diagnostics.latency-measurement off
> diagnostics.dump-fd-stats off
> diagnostics.count-fop-hits off
> diagnostics.brick-log-level INFO
> diagnostics.client-log-level INFO
> diagnostics.brick-sys-log-level CRITICAL
> diagnostics.client-sys-log-level CRITICAL
> diagnostics.brick-logger (null)
> diagnostics.client-logger (null)
> diagnostics.brick-log-format (null)
> diagnostics.client-log-format (null)
> diagnostics.brick-log-buf-size 5
> diagnostics.client-log-buf-size 5
> diagnostics.brick-log-flush-timeout 120
> diagnostics.client-log-flush-timeout 120
> diagnostics.stats-dump-interval 0
> diagnostics.fop-sample-interval 0
> diagnostics.stats-dump-format json
> diagnostics.fop-sample-buf-size 65535
> diagnostics.stats-dnscache-ttl-sec 86400
> performance.cache-max-file-size 0
> performance.cache-min-file-size 0
> performance.cache-refresh-timeout 1
> performance.cache-priority
> performance.cache-size 32MB
> performance.io-thread-count 16
> performance.high-prio-threads 16
> performance.normal-prio-threads 16
> performance.low-prio-threads 32
> performance.least-prio-threads 1
> performance.enable-least-priority on
> performance.iot-watchdog-secs (null)
> performance.iot-cleanup-disconnected-reqs off
> performance.iot-pass-through false
> performance.io-cache-pass-through false
> performance.cache-size 128MB
> performance.qr-cache-timeout 1
> performance.cache-invalidation false
> performance.ctime-invalidation false
> performance.flush-behind on
> performance.nfs.flush-behind on
> performance.write-behind-window-size 1MB
> performance.resync-failed-syncs-after-fsync off
> performance.nfs.write-behind-window-size 1MB
> performance.strict-o-direct off
> performance.nfs.strict-o-direct off
> performance.strict-write-ordering off
> performance.nfs.strict-write-ordering off
> performance.write-behind-trickling-writes on
> performance.aggregate-size 128KB
> performance.nfs.write-behind-trickling-writes on
> performance.lazy-open yes
> performance.read-after-open yes
> performance.open-behind-pass-through false
> performance.read-ahead-page-count 4
> performance.read-ahead-pass-through false
> performance.readdir-ahead-pass-through false
> performance.md-cache-pass-through false
> performance.md-cache-timeout 1
> performance.cache-swift-metadata true
> performance.cache-samba-metadata false
> performance.cache-capability-xattrs true
> performance.cache-ima-xattrs true
> performance.md-cache-statfs off
> performance.xattr-cache-list
> performance.nl-cache-pass-through false
> features.encryption off
> network.frame-timeout 1800
> network.ping-timeout 42
> network.tcp-window-size (null)
> client.ssl off
> network.remote-dio enable
> client.event-threads 4
> client.tcp-user-timeout 0
> client.keepalive-time 20
> client.keepalive-interval 2
> client.keepalive-count 9
> network.tcp-window-size (null)
> network.inode-lru-limit 16384
> auth.allow *
> auth.reject (null)
> transport.keepalive 1
> server.allow-insecure on
> server.root-squash off
> server.all-squash off
> server.anonuid 65534
> server.anongid 65534
> server.statedump-path /var/run/gluster
> server.outstanding-rpc-limit 64
> server.ssl off
> auth.ssl-allow *
> server.manage-gids off
> server.dynamic-auth on
> client.send-gids on
> server.gid-timeout 300
> server.own-thread (null)
> server.event-threads 4
> server.tcp-user-timeout 42
> server.keepalive-time 20
> server.keepalive-interval 2
> server.keepalive-count 9
> transport.listen-backlog 1024
> transport.address-family inet
> performance.write-behind on
> performance.read-ahead off
> performance.readdir-ahead on
> performance.io-cache off
> performance.open-behind on
> performance.quick-read off
> performance.nl-cache off
> performance.stat-prefetch on
> performance.client-io-threads on
> performance.nfs.write-behind on
> performance.nfs.read-ahead off
> performance.nfs.io-cache off
> performance.nfs.quick-read off
> performance.nfs.stat-prefetch off
> performance.nfs.io-threads off
> performance.force-readdirp true
> performance.cache-invalidation false
> performance.global-cache-invalidation true
> features.uss off
> features.snapshot-directory .snaps
> features.show-snapshot-directory off
> features.tag-namespaces off
> network.compression off
> network.compression.window-size -15
> network.compression.mem-level 8
> network.compression.min-size 0
> network.compression.compression-level -1
> network.compression.debug false
> features.default-soft-limit 80%
> features.soft-timeout 60
> features.hard-timeout 5
> features.alert-time 86400
> features.quota-deem-statfs off
> geo-replication.indexing off
> geo-replication.indexing off
> geo-replication.ignore-pid-check off
> geo-replication.ignore-pid-check off
> features.quota off
> features.inode-quota off
> features.bitrot disable
> debug.trace off
> debug.log-history no
> debug.log-file no
> debug.exclude-ops (null)
> debug.include-ops (null)
> debug.error-gen off
> debug.error-failure (null)
> debug.error-number (null)
> debug.random-failure off
> debug.error-fops (null)
> nfs.disable on
> features.read-only off
> features.worm off
> features.worm-file-level off
> features.worm-files-deletable on
> features.default-retention-period 120
> features.retention-mode relax
> features.auto-commit-period 180
> storage.linux-aio off
> storage.batch-fsync-mode reverse-fsync
> storage.batch-fsync-delay-usec 0
> storage.owner-uid -1
> storage.owner-gid -1
> storage.node-uuid-pathinfo off
> storage.health-check-interval 30
> storage.build-pgfid off
> storage.gfid2path on
> storage.gfid2path-separator :
> storage.reserve 1
> storage.health-check-timeout 10
> storage.fips-mode-rchecksum off
> storage.force-create-mode 0000
> storage.force-directory-mode 0000
> storage.create-mask 0777
> storage.create-directory-mask 0777
> storage.max-hardlinks 100
> features.ctime on
> config.gfproxyd off
> cluster.server-quorum-type server
> cluster.server-quorum-ratio 0
> changelog.changelog off
> changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
> changelog.encoding ascii
> changelog.rollover-time 15
> changelog.fsync-interval 5
> changelog.changelog-barrier-timeout 120
> changelog.capture-del-path off
> features.barrier disable
> features.barrier-timeout 120
> features.trash off
> features.trash-dir .trashcan
> features.trash-eliminate-path (null)
> features.trash-max-filesize 5MB
> features.trash-internal-op off
> cluster.enable-shared-storage disable
> locks.trace off
> locks.mandatory-locking off
> cluster.disperse-self-heal-daemon enable
> cluster.quorum-reads no
> client.bind-insecure (null)
> features.shard on
> features.shard-block-size 64MB
> features.shard-lru-limit 16384
> features.shard-deletion-rate 100
> features.scrub-throttle lazy
> features.scrub-freq biweekly
> features.scrub false
> features.expiry-time 120
> features.cache-invalidation off
> features.cache-invalidation-timeout 60
> features.leases off
> features.lease-lock-recall-timeout 60
> disperse.background-heals 8
> disperse.heal-wait-qlength 128
> cluster.heal-timeout 600
> dht.force-readdirp on
> disperse.read-policy gfid-hash
> cluster.shd-max-threads 8
> cluster.shd-wait-qlength 10000
> cluster.shd-wait-qlength 10000
> cluster.locking-scheme granular
> cluster.granular-entry-heal no
> features.locks-revocation-secs 0
> features.locks-revocation-clear-all false
> features.locks-revocation-max-blocked 0
> features.locks-monkey-unlocking false
> features.locks-notify-contention no
> features.locks-notify-contention-delay 5
> disperse.shd-max-threads 1
> disperse.shd-wait-qlength 1024
> disperse.cpu-extensions auto
> disperse.self-heal-window-size 1
> cluster.use-compound-fops off
> performance.parallel-readdir off
> performance.rda-request-size 131072
> performance.rda-low-wmark 4096
> performance.rda-high-wmark 128KB
> performance.rda-cache-limit 10MB
> performance.nl-cache-positive-entry false
> performance.nl-cache-limit 10MB
> performance.nl-cache-timeout 60
> cluster.brick-multiplex off
> cluster.max-bricks-per-process 250
> disperse.optimistic-change-log on
> disperse.stripe-cache 4
> cluster.halo-enabled False
> cluster.halo-shd-max-latency 99999
> cluster.halo-nfsd-max-latency 5
> cluster.halo-max-latency 5
> cluster.halo-max-replicas 99999
> cluster.halo-min-replicas 2
> features.selinux on
> cluster.daemon-log-level INFO
> debug.delay-gen off
> delay-gen.delay-percentage 10%
> delay-gen.delay-duration 100000
> delay-gen.enable
> disperse.parallel-writes on
> features.sdfs off
> features.cloudsync off
> features.ctime on
> ctime.noatime on
> feature.cloudsync-storetype (null)
> features.enforce-mandatory-lock off
>
> What do you think, are there any other knobs worth turning?
>
> Thanks!
>
> On Wed, Jul 03, 2019 at 06:55:09PM +0300, Strahil wrote:
> > Check the following link (4.1) for the optimal gluster volume settings.
> > They are quite safe.
> >
> > Gluster provides a settings group called virt (/var/lib/glusterd/groups/virt),
> > which can be applied via 'gluster volume set VOLNAME group virt'.
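> >
> > For example, with the volume names from this thread it would be:
> >
> > gluster volume set storage1 group virt
> > gluster volume set storage2 group virt
> >
> > The group file is just a list of predefined volume options that get
> > applied in one go.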
> >
> > Then try again.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Jul 3, 2019 11:39, Vladimir Melnik <v.melnik at tucha.ua> wrote:
> > >
> > > Dear colleagues,
> > >
> > > I have a lab with a bunch of virtual machines (the virtualization is
> > > provided by KVM) running on the same physical host. 4 of these VMs are
> > > working as a GlusterFS cluster, and there's one more VM that works as a
> > > client. I'll specify all the package versions at the end of this
> > > message.
> > >
> > > I created 2 volumes - one of type "Distributed-Replicate" and the other
> > > of type "Distribute". The problem is that both volumes are showing really
> > > poor performance.
> > >
> > > Here's what I see on the client:
> > > $ mount | grep gluster
> > > 10.13.1.16:storage1 on /mnt/glusterfs1 type fuse.glusterfs
> > > (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> > > 10.13.1.16:storage2 on /mnt/glusterfs2 type fuse.glusterfs
> > > (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> > >
> > > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 1.47936 s, 7.1 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 1.62546 s, 6.5 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 1.71229 s, 6.1 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 1.68607 s, 6.2 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 1.82204 s, 5.8 MB/s
> > >
> > > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs2/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs2/test.tmp; } done
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 1.15739 s, 9.1 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.978528 s, 10.7 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.910642 s, 11.5 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.998249 s, 10.5 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 1.03377 s, 10.1 MB/s
> > >
> > > The distributed one shows a bit better performance than the
> > > distributed-replicated one, but it's still poor. :-(
> > >
> > > The disk storage itself is OK; here's what I see on each of the 4
> > > GlusterFS servers:
> > > for i in {1..5}; do { dd if=/dev/zero of=/mnt/storage1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/storage1/test.tmp; } done
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.0656698 s, 160 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.0476927 s, 220 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.036526 s, 287 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.0329145 s, 319 MB/s
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10 MB) copied, 0.0403988 s, 260 MB/s
> > >
> > > The network between all 5 VMs is OK; they are all running on the same
> > > physical host.
> > >
> > > I can't understand what I'm doing wrong. :-(
> > >
> > > Here's the detailed info about the volumes:
> > > Volume Name: storage1
> > > Type: Distributed-Replicate
> > > Volume ID: a42e2554-99e5-4331-bcc4-0900d002ae32
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 2 x (2 + 1) = 6
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick1
> > > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage1/brick2
> > > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> > > Brick4: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick3
> > > Brick5: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage1/brick4
> > > Brick6: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> > > Options Reconfigured:
> > > transport.address-family: inet
> > > nfs.disable: on
> > > performance.client-io-threads: off
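> > >
> > > For reference, the volume was created roughly like this (I'm
> > > reconstructing the command from the brick list above, so treat it as a
> > > sketch rather than the exact command I ran):
> > >
> > > gluster volume create storage1 replica 3 arbiter 1 \
> > >     gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick1 \
> > >     gluster2.k8s.maitre-d.tucha.ua:/mnt/storage1/brick2 \
> > >     gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter \
> > >     gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick3 \
> > >     gluster4.k8s.maitre-d.tucha.ua:/mnt/storage1/brick4 \
> > >     gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter
> > >
> > > Each (2 + 1) set is two data bricks plus one arbiter, which is why the
> > > layout shows 2 x (2 + 1) = 6.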
> > >
> > > Volume Name: storage2
> > > Type: Distribute
> > > Volume ID: df4d8096-ad03-493e-9e0e-586ce21fb067
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 4
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > Brick4: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2
> > > Options Reconfigured:
> > > transport.address-family: inet
> > > nfs.disable: on
> > >
> > > The OS is CentOS Linux release 7.6.1810. The packages I'm using are:
> > > glusterfs-6.3-1.el7.x86_64
> > > glusterfs-api-6.3-1.el7.x86_64
> > > glusterfs-cli-6.3-1.el7.x86_64
> > > glusterfs-client-xlators-6.3-1.el7.x86_64
> > > glusterfs-fuse-6.3-1.el7.x86_64
> > > glusterfs-libs-6.3-1.el7.x86_64
> > > glusterfs-server-6.3-1.el7.x86_64
> > > kernel-3.10.0-327.el7.x86_64
> > > kernel-3.10.0-514.2.2.el7.x86_64
> > > kernel-3.10.0-957.12.1.el7.x86_64
> > > kernel-3.10.0-957.12.2.el7.x86_64
> > > kernel-3.10.0-957.21.3.el7.x86_64
> > > kernel-tools-3.10.0-957.21.3.el7.x86_64
> > > kernel-tools-libs-3.10.0-957.21.3.el7.x86_64
> > >
> > > Please be so kind as to help me understand: did I do something wrong, or
> > > is this quite normal performance for GlusterFS?
> > >
> > > Thanks in advance!
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> V.Melnik
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>