<div dir="ltr">Well, if your network is limited to 100MB/s then it doesn't matter if storage is capable of doing 300+MB/s. <div>But 15 MB/s is still way less than 100 MB/s</div><div><br></div><div>P.S. just tried on my gluster and found out that am getting ~15MB/s on replica 3 volume on SSDs and... 2MB/s on replica 3 volume on HDDs. Something to look at next week. </div><div><br></div><div><br clear="all"><div><div dir="ltr" class="m_-2668514580062433158gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br></div><div dir="ltr">--</div><div dir="ltr">Dmitry Filonov<div>Linux Administrator</div><div>SBGrid Core | <span style="font-size:12.8px">Harvard Medical School</span></div><div>250 Longwood Ave, SGM-114</div><div>Boston, MA 02115</div></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 3, 2019 at 12:18 PM Vladimir Melnik <<a href="mailto:v.melnik@tucha.ua" target="_blank">v.melnik@tucha.ua</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Thank you, it helped a little:<br>

$ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done 2>&1 | grep copied
10485760 bytes (10 MB) copied, 0.738968 s, 14.2 MB/s
10485760 bytes (10 MB) copied, 0.725296 s, 14.5 MB/s
10485760 bytes (10 MB) copied, 0.681508 s, 15.4 MB/s
10485760 bytes (10 MB) copied, 0.85566 s, 12.3 MB/s
10485760 bytes (10 MB) copied, 0.661457 s, 15.9 MB/s

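(Worth noting: with oflag=sync, dd waits for each 1 MiB block to be acknowledged by the replicas before issuing the next one, so this test is dominated by round-trip latency rather than bandwidth. A quick way to separate the two effects, as a sketch on the same mount point, using a 100 MB run to smooth out noise; conv=fdatasync flushes once at the end instead of after every write:

$ dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=100 oflag=sync
$ dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=100 conv=fdatasync
$ rm -f /mnt/glusterfs1/test.tmp

If the conv=fdatasync figure is much higher, the bottleneck is per-operation round trips, not raw throughput.)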
But 14-15 MB/s is still quite far from the actual storage's performance (200-300 MB/s). :-(

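Since all five VMs sit on the same physical host, raw bandwidth is unlikely to be the limit; per-operation round trips are the usual suspect. A quick sanity check, as a sketch (assuming iperf3 is installed, with 'iperf3 -s' already running on the brick host; 10.13.1.16 is the address from the mount output below):

$ ping -c 10 10.13.1.16
$ iperf3 -c 10.13.1.16 -t 10

Even sub-millisecond latency adds up when every write is a synchronous round trip to three bricks.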
Here's the full configuration dump (just in case):

Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm full
cluster.eager-lock enable
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local off
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 32
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqs off
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.cache-invalidation false
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writes on
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
client.ssl off
network.remote-dio enable
client.event-threads 4
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 4
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead on
performance.io-cache off
performance.open-behind on
performance.quick-read off
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.disable on
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 10
storage.fips-mode-rchecksum off
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
features.ctime on
config.gfproxyd off
cluster.server-quorum-type server
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard on
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 8
cluster.shd-wait-qlength 10000
cluster.shd-wait-qlength 10000
cluster.locking-scheme granular
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention no
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux on
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
features.sdfs off
features.cloudsync off
features.ctime on
ctime.noatime on
feature.cloudsync-storetype (null)
features.enforce-mandatory-lock off
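Most of these are still at their defaults. For small synchronous writes, the options that usually matter are the write-behind, eager-lock and o-direct families; a sketch for pulling just those out of the dump (assuming the volume is named storage1, as in the volume info below):

$ gluster volume get storage1 all | grep -E 'write-behind|eager-lock|o-direct|remote-dio|shard'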

What do you think, are there any other knobs worth turning?

Thanks!

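One concrete step, following Strahil's suggestion below: the predefined virt profile bundles the settings recommended for VM-style workloads and can be applied in a single command, after which the same dd test can be re-run (a sketch, using storage1 as the volume name):

$ gluster volume set storage1 group virt
$ gluster volume info storage1

The applied options then show up under "Options Reconfigured".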
On Wed, Jul 03, 2019 at 06:55:09PM +0300, Strahil wrote:
> Check the following link (4.1) for the optimal gluster volume settings.
> They are quite safe.
>
> Gluster provides a group called virt (/var/lib/glusterd/groups/virt), which can be applied via 'gluster volume set VOLNAME group virt'.
>
> Then try again.
>
> Best Regards,
> Strahil Nikolov
>
> On Jul 3, 2019 11:39, Vladimir Melnik <v.melnik@tucha.ua> wrote:
> >
> > Dear colleagues,
> >
> > I have a lab with a bunch of virtual machines (the virtualization is
> > provided by KVM) running on the same physical host. 4 of these VMs are
> > working as a GlusterFS cluster, and there's one more VM that works as a
> > client. I'll specify all the packages' versions at the end of this
> > message.
> >
> > I created 2 volumes: one of type "Distributed-Replicate" and another
> > of type "Distribute". The problem is that both volumes are showing
> > really poor performance.
> >
> > Here's what I see on the client:
> > $ mount | grep gluster
> > 10.13.1.16:storage1 on /mnt/glusterfs1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> > 10.13.1.16:storage2 on /mnt/glusterfs2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> >
> > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 1.47936 s, 7.1 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 1.62546 s, 6.5 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 1.71229 s, 6.1 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 1.68607 s, 6.2 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 1.82204 s, 5.8 MB/s
> >
> > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs2/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs2/test.tmp; } done
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 1.15739 s, 9.1 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.978528 s, 10.7 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.910642 s, 11.5 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.998249 s, 10.5 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 1.03377 s, 10.1 MB/s
> >
> > The distributed one shows a bit better performance than the
> > distributed-replicated one, but it's still poor. :-(
> >
> > The disk storage itself is OK; here's what I see on each of the 4
> > GlusterFS servers:
> > $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/storage1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/storage1/test.tmp; } done
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.0656698 s, 160 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.0476927 s, 220 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.036526 s, 287 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.0329145 s, 319 MB/s
> > 10+0 records in
> > 10+0 records out
> > 10485760 bytes (10 MB) copied, 0.0403988 s, 260 MB/s
> >
> > The network between all 5 VMs is OK; they are all running on the same
> > physical host.
> >
> > I can't understand what I'm doing wrong. :-(
> >
> > Here's the detailed info about the volumes:
> > Volume Name: storage1
> > Type: Distributed-Replicate
> > Volume ID: a42e2554-99e5-4331-bcc4-0900d002ae32
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 2 x (2 + 1) = 6
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick1
> > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage1/brick2
> > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> > Brick4: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick3
> > Brick5: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage1/brick4
> > Brick6: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> > Options Reconfigured:
> > transport.address-family: inet
> > nfs.disable: on
> > performance.client-io-threads: off
> >
> > Volume Name: storage2
> > Type: Distribute
> > Volume ID: df4d8096-ad03-493e-9e0e-586ce21fb067
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 4
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage2
> > Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage2
> > Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2
> > Brick4: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2
> > Options Reconfigured:
> > transport.address-family: inet
> > nfs.disable: on
> >
> ><br>
> > The OS is CentOS Linux release 7.6.1810. The packages I'm using are: <br>
> > glusterfs-6.3-1.el7.x86_64 <br>
> > glusterfs-api-6.3-1.el7.x86_64 <br>
> > glusterfs-cli-6.3-1.el7.x86_64 <br>
> > glusterfs-client-xlators-6.3-1.el7.x86_64 <br>
> > glusterfs-fuse-6.3-1.el7.x86_64 <br>
> > glusterfs-libs-6.3-1.el7.x86_64 <br>
> > glusterfs-server-6.3-1.el7.x86_64 <br>
> > kernel-3.10.0-327.el7.x86_64 <br>
> > kernel-3.10.0-514.2.2.el7.x86_64 <br>
> > kernel-3.10.0-957.12.1.el7.x86_64 <br>
> > kernel-3.10.0-957.12.2.el7.x86_64 <br>
> > kernel-3.10.0-957.21.3.el7.x86_64 <br>
> > kernel-tools-3.10.0-957.21.3.el7.x86_64 <br>
> > kernel-tools-libs-3.10.0-957.21.3.el7.x86_6 <br>
> ><br>
> > Please, be so kind as to help me to understand, did I do it wrong or <br>
> > that's quite normal performance of GlusterFS? <br>
> ><br>
> > Thanks in advance! <br>
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users

--
V.Melnik
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users