[Gluster-users] GFS performance under heavy traffic

Tue Jan 7 23:03:47 UTC 2020

Hi Strahil,

Thanks for that. The queue/scheduler file for the relevant disk reports
"noop [deadline] cfq", so deadline is being used. It is using ext4, and
I've verified that the MTU is 1500.

We could change the filesystem from ext4 to XFS, but in this case we're not
looking to tinker around the edges for a small performance improvement;
we need a very large improvement on the 114 MBps of network traffic to
make it usable.

I think what we really need to do first is to reproduce the problem in
testing, and then come back to possible solutions.

On Tue, 7 Jan 2020 at 22:15, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> To find the scheduler, find all PVs of the LV that is providing your storage:
>
> [root@ovirt1 ~]# df -Th /gluster_bricks/data_fast
> Filesystem                                       Type  Size  Used Avail Use% Mounted on
> /dev/mapper/gluster_vg_nvme-gluster_lv_data_fast xfs   100G   39G   62G  39% /gluster_bricks/data_fast
>
>
> [root@ovirt1 ~]# pvs | grep gluster_vg_nvme
>   /dev/mapper/vdo_nvme gluster_vg_nvme lvm2 a--  <1024.00g    0
>
> [root@ovirt1 ~]# cat /etc/vdoconf.yml
> ####################################################################
> # THIS FILE IS MACHINE GENERATED. DO NOT EDIT THIS FILE BY HAND.
> ####################################################################
> config: !Configuration
>   vdos:
>    vdo_nvme: !VDOService
>       device: /dev/disk/by-id/nvme-ADATA_SX8200PNP_2J1120011596
>
>
> [root@ovirt1 ~]# ll /dev/disk/by-id/nvme-ADATA_SX8200PNP_2J1120011596
> lrwxrwxrwx. 1 root root 13 Dec 17 20:21 /dev/disk/by-id/nvme-ADATA_SX8200PNP_2J1120011596 -> ../../nvme0n1
> [root@ovirt1 ~]# cat /sys/block/nvme0n1/queue/scheduler
> [none] mq-deadline kyber
>
> Note: If the device is under multipath, you need to check all paths (you can
> get them from the 'multipath -ll' command).
> The only scheduler you should avoid is "cfq", which was the default for RHEL 6
> and SLES 11.
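The lookup chain above (LV -> PV -> VDO or multipath device -> kernel block device -> sysfs) can be partly scripted. Below is a minimal bash sketch, assuming the brick sits on a single whole-disk device (not a partition, and not behind multipath or device-mapper, where the underlying disks must be checked as described above). The function names are hypothetical helpers, not part of any Gluster or LVM tool; they rely on the kernel marking the active scheduler with square brackets, as in both scheduler outputs quoted in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking a brick disk's I/O scheduler.

# active_scheduler FILE
# Print the bracketed (active) entry of a queue/scheduler file,
# e.g. "noop [deadline] cfq" -> "deadline".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# scheduler_for_device DEVICE
# Resolve a /dev/disk/by-id/... symlink to its kernel name (e.g. nvme0n1)
# and print that device's active scheduler from sysfs.
scheduler_for_device() {
    local dev name
    dev=$(readlink -f "$1") || return 1
    name=$(basename "$dev")
    active_scheduler "/sys/block/$name/queue/scheduler"
}
```

On the host quoted above, `scheduler_for_device /dev/disk/by-id/nvme-ADATA_SX8200PNP_2J1120011596` would read /sys/block/nvme0n1/queue/scheduler and print the bracketed entry, `none`.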
>
> XFS has better performance than ext-based filesystems.
>
> Another tuning option is to use Red Hat's tuned profiles for Gluster. You can
> extract them from the following source RPM (or a newer one, if you find it):
> ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.4.2.0-1.el7rhgs.src.rpm
>
>
> About MTU: a larger MTU reduces the number of packets that the kernel has to
> process, but it requires the network infrastructure to support it too. You can
> test by setting the MTU on both sides to 9000 and then running
> 'tracepath remote-ip'. Also run a large ping with the do-not-fragment flag set:
> 'ping -M do -s 8900 <destination-ip>'. If the ping comes back, you are good to go.
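The 8900-byte payload in the ping above leaves some headroom. An IPv4 ICMP echo adds a 20-byte IP header (assuming no IP options) and an 8-byte ICMP header, so the largest payload that fits unfragmented in an MTU of M is M - 28: 8972 bytes for MTU 9000 and 1472 for the standard 1500. A small bash sketch with hypothetical helper names that computes that size and runs the do-not-fragment ping (`-M do` is the Linux iputils ping option used above):

```bash
#!/usr/bin/env bash
# max_ping_payload MTU
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
max_ping_payload() {
    local mtu=$1
    echo $(( mtu - 20 - 8 ))
}

# check_jumbo HOST [MTU]
# Send 3 pings with fragmentation prohibited (-M do) at the maximum
# payload for MTU (default 9000); succeeds only if the replies come back.
check_jumbo() {
    local host=$1 mtu=${2:-9000}
    ping -M do -c 3 -s "$(max_ping_payload "$mtu")" "$host"
}
```

For example, after setting MTU 9000 on both ends (and on the switches in between), `check_jumbo gfs2` run on gfs1 should succeed; if it fails with a "message too long" style error, something on the path is still at a smaller MTU.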
>
>
> Best Regards,
> Strahil Nikolov
>
> On Tuesday, 7 January 2020, 3:00:23 GMT-5, David Cunningham <dcunningham at voisonics.com> wrote:
>
>
> Hi Strahil,
>
> I believe we are using the standard MTU of 1500 (would need to check with
> the network people to be sure). Does it make a difference?
>
> I'm afraid I don't know about the scheduler - where do I find that?
>
> Thank you for the suggestions about turning off performance.read-ahead and
> performance.readdir-ahead.
>
>
> On Tue, 7 Jan 2020 at 18:08, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> It's difficult to find anything structured (but that's the same for Linux
> and other tech). I use Red Hat's documentation, online guides (cross-check
> the options with the official documentation) and experience shared on the
> mailing list.
>
> I don't see anything (in /var/lib/glusterd/groups) that matches your
> profile, but I think that you should try setting performance.read-ahead and
> performance.readdir-ahead to 'off'. I have found a bug (I didn't read the
> whole thread) that might be interesting for you:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1601166
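Turning those two translators off is done per volume with `gluster volume set`. A minimal sketch, assuming the volume name gvol0 used in this thread; by default it only prints the commands so they can be reviewed, and it runs them only when APPLY=1 is set. The function name and the APPLY switch are hypothetical conveniences, not Gluster features.

```bash
#!/usr/bin/env bash
# disable_read_ahead VOLUME
# Print (or, with APPLY=1, run) the commands that switch off the
# read-ahead and readdir-ahead translators on VOLUME.
disable_read_ahead() {
    local vol=$1 opt
    for opt in performance.read-ahead performance.readdir-ahead; do
        if [ "${APPLY:-0}" = 1 ]; then
            gluster volume set "$vol" "$opt" off
        else
            echo "gluster volume set $vol $opt off"
        fi
    done
}
```

Run `disable_read_ahead gvol0` to review, then `APPLY=1 disable_read_ahead gvol0` on one server; `gluster volume get gvol0 performance.read-ahead` should then report `off`.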
>
> Also, the arbiter is very important for avoiding split-brain situations
> (although, in my experience, issues can still occur), and the arbiter's brick
> should ideally be on an SSD, as it needs to process metadata as fast as
> possible. With v7, there is an option for the client to have an arbiter even
> in the cloud (a remote arbiter) that is used only when one data brick is down.
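For the 1 x 2 volume shown in the 'gluster volume info' output in this thread, an arbiter brick can be added to the existing replica 2 volume with `gluster volume add-brick ... replica 3 arbiter 1`. A sketch only: the host gfs3 and its brick path are assumptions modeled on the existing bricks, the brick directory should be a fresh, empty one, and, as with the previous helper, the command is printed unless APPLY=1 is set (the function name is hypothetical).

```bash
#!/usr/bin/env bash
# add_arbiter VOLUME HOST BRICK_PATH
# Print (or, with APPLY=1, run) the command that converts a replica 2
# volume into replica 3 with one arbiter brick on HOST:BRICK_PATH.
add_arbiter() {
    local vol=$1 host=$2 path=$3
    local cmd=(gluster volume add-brick "$vol" replica 3 arbiter 1 "$host:$path")
    if [ "${APPLY:-0}" = 1 ]; then
        "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}
```

After adding it, every client must be able to reach gfs3 as well, and `gluster volume heal gvol0 info` shows the arbiter being populated.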
>
> Please report the issue with the cache size; it should not behave like that.
>
> Are you using jumbo frames (MTU 9000)?
> What is your brick's I/O scheduler?
>
> Best Regards,
> Strahil Nikolov
> On Jan 7, 2020 01:34, David Cunningham <dcunningham at voisonics.com> wrote:
>
> Hi Strahil,
>
> We may have had a heal, since the GFS arbiter node wasn't accessible from
> the GFS clients, only from the other GFS servers. Unfortunately, we haven't
> been able to reproduce the problem seen in production while testing, so we
> are unsure whether making the GFS arbiter node directly available to clients
> has fixed the issue.
>
> The load on GFS is mainly:
> 1. There are a small number of files around 5MB in size which are read
> often and change infrequently.
> 2. There are a large number of directories which are opened for reading to
> read the list of contents frequently.
> 3. There are a large number of new files around 5MB in size written
> frequently and read infrequently.
>
> We haven't touched the tuning options, as we don't really feel qualified to
> tell what needs to be changed from the default. Do you know of any suitable
> guides to get started?
>
> For some reason performance.cache-size is reported as both 32MB and 128MB.
> Is it worth reporting even for version 5.6?
>
> Here is the "gluster volume info" taken on the first node. Note that the
> third node (the arbiter) is currently taken out of the cluster:
> Volume Name: gvol0
> Type: Replicate
> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: gfs1:/nodirectwritedata/gluster/gvol0
> Brick2: gfs2:/nodirectwritedata/gluster/gvol0
> Options Reconfigured:
> diagnostics.client-log-level: INFO
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
>
> Thanks for your help and advice.
>
>
> On Sat, 28 Dec 2019 at 17:46, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> It seems that I misread your quorum options, so just ignore that part of
> my previous e-mail.
>
> Best Regards,
> Strahil Nikolov
> On Dec 27, 2019 15:38, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> Gluster supports live rolling upgrades, so there is no need to redeploy at
> all, but the migration notes should be checked, as some features must be
> disabled first.
> Also, the Gluster clients should be remounted in order to bump the Gluster
> op-version.
>
> What kind of workload do you have?
> I'm asking because there are predefined (and recommended) settings located in
> /var/lib/glusterd/groups.
> You can check the options for each group and cross-check each option's
> meaning in the docs before activating a setting.
>
> I still have a vague feeling that, during that peak in network
> bandwidth, there was a heal going on. Have you checked that?
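One way to check is `gluster volume heal <VOLNAME> info`, which lists, per brick, the entries still pending heal followed by a "Number of entries:" line. A small sketch that totals those counts so the number can be sampled during a traffic peak; the helper name is hypothetical and the parsing assumes that line format.

```bash
#!/usr/bin/env bash
# pending_heals
# Read 'gluster volume heal VOL info' output on stdin and print the sum
# of all "Number of entries: N" lines (one per brick); 0 if none found.
pending_heals() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
```

Usage: `gluster volume heal gvol0 info | pending_heals`. A value that stays above zero while traffic is high would support the heal theory.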
>
> Also, sharding is very useful when you work with large files, as a heal
> is then limited to the size of a shard.
>
> N.B.: Once sharding is enabled, DO NOT DISABLE it, or you will lose
> your data.
>
> Using Gluster v7.1 (soon available on CentOS and Debian) gives you the latest
> features and optimizations, and support from the Gluster dev community is
> quite active.
>
> P.S.: I'm wondering how 'performance.cache-size' can be both 32 MB and 128
> MB. Please double-check this (maybe I'm reading it wrong on my smartphone)
> and, if needed, raise a bug on bugzilla.redhat.com.
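The doubled line is easy to confirm from the CLI. A sketch that prints every option name appearing more than once in `gluster volume get <VOLNAME> all` output, skipping the two header rows; the helper name is hypothetical. On the listing quoted in this thread it would flag performance.cache-size, and also entries such as cluster.heal-timeout and network.tcp-window-size.

```bash
#!/usr/bin/env bash
# duplicate_options
# Read 'gluster volume get VOL all' output on stdin and print each option
# name that is listed more than once (header and dashed rows skipped).
duplicate_options() {
    awk 'NR > 1 && NF > 0 && $1 !~ /^-+$/ { print $1 }' | sort | uniq -d
}
```

Usage: `gluster volume get gvol0 all | duplicate_options`, which gives a short list to attach to a bug report.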
>
> P.S.2: Please provide 'gluster volume info', as 'cluster.quorum-type' set to
> 'none' is not normal for replicated volumes (arbiters are used in replica
> volumes).
>
> According to the documentation
> (https://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/):
>
> Note: Enabling the arbiter feature *automatically* configures client-quorum
> to 'auto'. This setting is *not* to be changed.
>
> Here is my output (Hyperconverged Virtualization Cluster -> oVirt):
> # gluster volume info engine |  grep quorum
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
>
> Changing quorum is riskier than changing other options, so you need to take
> the necessary precautions. I think we all know what will happen if the
> cluster is out of quorum and you change the quorum settings to more
> stringent ones :D
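A read-only way to check the client-quorum setting discussed above. The sketch parses `gluster volume get <VOLNAME> cluster.quorum-type` output and warns when it is 'none'; the helper names are hypothetical, and it deliberately only reports, since changing quorum needs the care described above.

```bash
#!/usr/bin/env bash
# quorum_type
# Print the value column of 'gluster volume get VOL cluster.quorum-type'
# output read on stdin.
quorum_type() {
    awk '$1 == "cluster.quorum-type" { print $2 }'
}

# warn_if_no_quorum
# Read the same output on stdin; return non-zero with a warning on stderr
# if client quorum is 'none', otherwise print the current value.
warn_if_no_quorum() {
    local t
    t=$(quorum_type)
    if [ "$t" = "none" ]; then
        echo "warning: cluster.quorum-type is 'none' on a replicated volume" >&2
        return 1
    fi
    echo "cluster.quorum-type: ${t:-unknown}"
}
```

Usage: `gluster volume get gvol0 cluster.quorum-type | warn_if_no_quorum`.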
>
> P.S.3: If you decide to reset your Gluster volume to the defaults, you can
> create a new volume (of the same type as the current one), then get the
> options for that volume, put them in a file and bulk-deploy them via
> 'gluster volume set <Original Volume> group custom-group', where the file is
> located on every Gluster server in the '/var/lib/glusterd/groups' directory.
> Finally, get rid of the sample volume.
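The "put them in a file" step of that procedure can be sketched with awk. This assumes the two-column `gluster volume get <VOLNAME> all` layout shown later in this thread and writes key=value lines, the format used by the group files (on stock packages, under /var/lib/glusterd/groups). It skips the header rows, unset '(null)' or empty values, and templated values such as '{{ brick.path }}/...'. The helper name, the sample volume name and the group name custom-group are hypothetical. Review the file before applying it: some listed options may be rejected by 'volume set', and quorum or sharding settings deserve the caution given above.

```bash
#!/usr/bin/env bash
# options_to_group
# Turn 'gluster volume get VOL all' output (stdin) into key=value lines
# suitable for a Gluster group file.
options_to_group() {
    awk '
        NR <= 2 { next }                  # skip the "Option Value" and dashed header rows
        NF < 2  { next }                  # skip blank lines and options shown without a value
        {
            key = $1
            val = $0
            sub(/^[^ \t]+[ \t]+/, "", val)    # value = everything after the option name
            if (val == "(null)" || index(val, "{{") > 0) next
            print key "=" val
        }'
}
```

Usage: `gluster volume get sample-vol all | options_to_group > custom-group`, copy the file into the groups directory on every server, then run `gluster volume set gvol0 group custom-group`.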
>
> Best Regards,
> Strahil Nikolov
> On Dec 27, 2019 03:22, David Cunningham <dcunningham at voisonics.com> wrote:
>
> Hi Strahil,
>
> Our volume options are as below. Thanks for the suggestion to upgrade to
> version 6 or 7. We could do that by simply removing the current
> installation and installing the new one (since it's not live right now). We
> might have to convince the customer that it's likely to succeed, though, as
> at the moment I think they believe that GFS is not going to work for them.
>
> Option                                  Value
>
> ------                                  -----
>
> cluster.lookup-unhashed                 on
>
> cluster.lookup-optimize                 on
>
> cluster.min-free-disk                   10%
>
> cluster.min-free-inodes                 5%
>
> cluster.rebalance-stats                 off
>
> cluster.subvols-per-directory           (null)
>
> cluster.readdir-optimize                off
>
> cluster.rsync-hash-regex                (null)
>
> cluster.extra-hash-regex                (null)
>
> cluster.dht-xattr-name                  trusted.glusterfs.dht
>
> cluster.randomize-hash-range-by-gfid    off
>
> cluster.rebal-throttle                  normal
>
> cluster.lock-migration                  off
>
> cluster.force-migration                 off
>
> cluster.local-volume-name               (null)
>
> cluster.weighted-rebalance              on
>
> cluster.switch-pattern                  (null)
>
> cluster.entry-change-log                on
>
> cluster.read-subvolume                  (null)
>
> cluster.read-subvolume-index            -1
>
> cluster.read-hash-mode                  1
>
> cluster.background-self-heal-count      8
>
> cluster.metadata-self-heal              on
>
> cluster.data-self-heal                  on
>
> cluster.entry-self-heal                 on
>
> cluster.self-heal-daemon                on
>
> cluster.heal-timeout                    600
>
> cluster.self-heal-window-size           1
>
> cluster.data-change-log                 on
>
> cluster.metadata-change-log             on
>
> cluster.data-self-heal-algorithm        (null)
>
> cluster.eager-lock                      on
>
> disperse.eager-lock                     on
>
> disperse.other-eager-lock               on
>
> disperse.eager-lock-timeout             1
>
> disperse.other-eager-lock-timeout       1
>
> cluster.quorum-type                     none
>
> cluster.quorum-count                    (null)
>
> cluster.choose-local                    true
>
> cluster.self-heal-readdir-size          1KB
>
> cluster.post-op-delay-secs              1
>
> cluster.ensure-durability               on
>
> cluster.consistent-metadata             no
>
> cluster.heal-wait-queue-length          128
>
> cluster.favorite-child-policy           none
>
> cluster.full-lock                       yes
>
> cluster.stripe-block-size               128KB
>
> cluster.stripe-coalesce                 true
>
> diagnostics.latency-measurement         off
>
> diagnostics.dump-fd-stats               off
>
> diagnostics.count-fop-hits              off
>
> diagnostics.brick-log-level             INFO
>
> diagnostics.client-log-level            INFO
>
> diagnostics.brick-sys-log-level         CRITICAL
>
> diagnostics.client-sys-log-level        CRITICAL
>
> diagnostics.brick-logger                (null)
>
> diagnostics.client-logger               (null)
>
> diagnostics.brick-log-format            (null)
>
> diagnostics.client-log-format           (null)
>
> diagnostics.brick-log-buf-size          5
>
> diagnostics.client-log-buf-size         5
>
> diagnostics.brick-log-flush-timeout     120
>
> diagnostics.client-log-flush-timeout    120
>
> diagnostics.stats-dump-interval         0
>
> diagnostics.fop-sample-interval         0
>
> diagnostics.stats-dump-format           json
>
> diagnostics.fop-sample-buf-size         65535
>
> diagnostics.stats-dnscache-ttl-sec      86400
>
> performance.cache-max-file-size         0
>
> performance.cache-min-file-size         0
>
> performance.cache-refresh-timeout       1
>
> performance.cache-priority
>
> performance.cache-size                  32MB
>
> performance.io-thread-count             16
>
> performance.high-prio-threads           16
>
> performance.normal-prio-threads         16
>
> performance.low-prio-threads            16
>
> performance.least-prio-threads          1
>
> performance.enable-least-priority       on
>
> performance.iot-watchdog-secs           (null)
>
> performance.iot-cleanup-disconnected-reqs off
>
> performance.iot-pass-through            false
>
> performance.io-cache-pass-through       false
>
> performance.cache-size                  128MB
>
> performance.qr-cache-timeout            1
>
> performance.cache-invalidation          false
>
> performance.ctime-invalidation          false
>
> performance.flush-behind                on
>
> performance.nfs.flush-behind            on
>
> performance.write-behind-window-size    1MB
>
> performance.resync-failed-syncs-after-fsync off
>
> performance.nfs.write-behind-window-size 1MB
>
> performance.strict-o-direct             off
>
> performance.nfs.strict-o-direct         off
>
> performance.strict-write-ordering       off
>
> performance.nfs.strict-write-ordering   off
>
> performance.write-behind-trickling-writes on
>
> performance.aggregate-size              128KB
>
> performance.nfs.write-behind-trickling-writes on
>
> performance.lazy-open                   yes
>
> performance.read-after-open             yes
>
> performance.open-behind-pass-through    false
>
> performance.read-ahead-page-count       4
>
> performance.read-ahead-pass-through     false
>
> performance.readdir-ahead-pass-through  false
>
> performance.md-cache-pass-through       false
>
> performance.md-cache-timeout            1
>
> performance.cache-swift-metadata        true
>
> performance.cache-samba-metadata        false
>
> performance.cache-capability-xattrs     true
>
> performance.cache-ima-xattrs            true
>
> performance.md-cache-statfs             off
>
> performance.xattr-cache-list
>
> performance.nl-cache-pass-through       false
>
> features.encryption                     off
>
> encryption.master-key                   (null)
>
> encryption.data-key-size                256
>
> encryption.block-size                   4096
>
> network.frame-timeout                   1800
>
> network.ping-timeout                    42
>
> network.tcp-window-size                 (null)
>
> network.remote-dio                      disable
>
> client.event-threads                    2
>
> client.tcp-user-timeout                 0
>
> client.keepalive-time                   20
>
> client.keepalive-interval               2
>
> client.keepalive-count                  9
>
> network.tcp-window-size                 (null)
>
> network.inode-lru-limit                 16384
>
> auth.allow                              *
>
> auth.reject                             (null)
>
> transport.keepalive                     1
>
> server.allow-insecure                   on
>
> server.root-squash                      off
>
> server.anonuid                          65534
>
> server.anongid                          65534
>
> server.statedump-path                   /var/run/gluster
>
> server.outstanding-rpc-limit            64
>
> server.ssl                              (null)
>
> auth.ssl-allow                          *
>
> server.manage-gids                      off
>
> server.dynamic-auth                     on
>
> client.send-gids                        on
>
> server.gid-timeout                      300
>
> server.own-thread                       (null)
>
> server.event-threads                    1
>
> server.tcp-user-timeout                 0
>
> server.keepalive-time                   20
>
> server.keepalive-interval               2
>
> server.keepalive-count                  9
>
> transport.listen-backlog                1024
>
> ssl.own-cert                            (null)
>
> ssl.private-key                         (null)
>
> ssl.ca-list                             (null)
>
> ssl.crl-path                            (null)
>
> ssl.certificate-depth                   (null)
>
> ssl.cipher-list                         (null)
>
> ssl.dh-param                            (null)
>
> ssl.ec-curve                            (null)
>
> transport.address-family                inet
>
> performance.write-behind                on
>
> performance.read-ahead                  on
>
> performance.readdir-ahead               on
>
> performance.io-cache                    on
>
> performance.quick-read                  on
>
> performance.open-behind                 on
>
> performance.nl-cache                    off
>
> performance.stat-prefetch               on
>
> performance.client-io-threads           off
>
> performance.nfs.write-behind            on
>
> performance.nfs.read-ahead              off
>
> performance.nfs.io-cache                off
>
> performance.nfs.quick-read              off
>
> performance.nfs.stat-prefetch           off
>
> performance.nfs.io-threads              off
>
> performance.force-readdirp              true
>
> performance.cache-invalidation          false
>
> features.uss                            off
>
> features.snapshot-directory             .snaps
>
> features.show-snapshot-directory        off
>
> features.tag-namespaces                 off
>
> network.compression                     off
>
> network.compression.window-size         -15
>
> network.compression.mem-level           8
>
> network.compression.min-size            0
>
> network.compression.compression-level   -1
>
> network.compression.debug               false
>
> features.default-soft-limit             80%
>
> features.soft-timeout                   60
>
> features.hard-timeout                   5
>
> features.alert-time                     86400
>
> features.quota-deem-statfs              off
>
> geo-replication.indexing                off
>
> geo-replication.indexing                off
>
> geo-replication.ignore-pid-check        off
>
> geo-replication.ignore-pid-check        off
>
> features.quota                          off
>
> features.inode-quota                    off
>
> features.bitrot                         disable
>
> debug.trace                             off
>
> debug.log-history                       no
>
> debug.log-file                          no
>
> debug.exclude-ops                       (null)
>
> debug.include-ops                       (null)
>
> debug.error-gen                         off
>
> debug.error-failure                     (null)
>
> debug.error-number                      (null)
>
> debug.random-failure                    off
>
> debug.error-fops                        (null)
>
> nfs.disable                             on
>
> features.read-only                      off
>
> features.worm                           off
>
> features.worm-file-level                off
>
> features.worm-files-deletable           on
>
> features.default-retention-period       120
>
> features.retention-mode                 relax
>
> features.auto-commit-period             180
>
> storage.linux-aio                       off
>
> storage.batch-fsync-mode                reverse-fsync
>
> storage.batch-fsync-delay-usec          0
>
> storage.owner-uid                       -1
>
> storage.owner-gid                       -1
>
> storage.node-uuid-pathinfo              off
>
> storage.health-check-interval           30
>
> storage.build-pgfid                     off
>
> storage.gfid2path                       on
>
> storage.gfid2path-separator             :
>
> storage.reserve                         1
>
> storage.health-check-timeout            10
>
> storage.fips-mode-rchecksum             off
>
> storage.force-create-mode               0000
>
> storage.force-directory-mode            0000
>
> storage.create-mask                     0777
>
> storage.create-directory-mask           0777
>
> storage.max-hardlinks                   100
>
> storage.ctime                           off
>
> storage.bd-aio                          off
>
> config.gfproxyd                         off
>
> cluster.server-quorum-type              off
>
> cluster.server-quorum-ratio             0
>
> changelog.changelog                     off
>
> changelog.changelog-dir                 {{ brick.path }}/.glusterfs/changelogs
>
> changelog.encoding                      ascii
>
> changelog.rollover-time                 15
>
> changelog.fsync-interval                5
>
> changelog.changelog-barrier-timeout     120
>
> changelog.capture-del-path              off
>
> features.barrier                        disable
>
> features.barrier-timeout                120
>
> features.trash                          off
>
> features.trash-dir                      .trashcan
>
> features.trash-eliminate-path           (null)
>
> features.trash-max-filesize             5MB
>
> features.trash-internal-op              off
>
> cluster.enable-shared-storage           disable
>
> cluster.write-freq-threshold            0
>
> cluster.read-freq-threshold             0
>
> cluster.tier-pause                      off
>
> cluster.tier-promote-frequency          120
>
> cluster.tier-demote-frequency           3600
>
> cluster.watermark-hi                    90
>
> cluster.watermark-low                   75
>
> cluster.tier-mode                       cache
>
> cluster.tier-max-promote-file-size      0
>
> cluster.tier-max-mb                     4000
>
> cluster.tier-max-files                  10000
>
> cluster.tier-query-limit                100
>
> cluster.tier-compact                    on
>
> cluster.tier-hot-compact-frequency      604800
>
> cluster.tier-cold-compact-frequency     604800
>
> features.ctr-enabled                    off
>
> features.record-counters                off
>
> features.ctr-record-metadata-heat       off
>
> features.ctr_link_consistency           off
>
> features.ctr_lookupheal_link_timeout    300
>
> features.ctr_lookupheal_inode_timeout   300
>
> features.ctr-sql-db-cachesize           12500
>
> features.ctr-sql-db-wal-autocheckpoint  25000
>
> features.selinux                        on
>
> locks.trace                             off
>
> locks.mandatory-locking                 off
>
> cluster.disperse-self-heal-daemon       enable
>
> cluster.quorum-reads                    no
>
> client.bind-insecure                    (null)
>
> features.shard                          off
>
> features.shard-block-size               64MB
>
> features.shard-lru-limit                16384
>
> features.shard-deletion-rate            100
>
> features.scrub-throttle                 lazy
>
> features.scrub-freq                     biweekly
>
> features.scrub                          false
>
> features.expiry-time                    120
>
> features.cache-invalidation             off
>
> features.cache-invalidation-timeout     60
>
> features.leases                         off
>
> features.lease-lock-recall-timeout      60
>
> disperse.background-heals               8
>
> disperse.heal-wait-qlength              128
>
> cluster.heal-timeout                    600
>
> dht.force-readdirp                      on
>
> disperse.read-policy                    gfid-hash
>
> cluster.shd-max-threads                 1
>
> cluster.shd-wait-qlength                1024
>
> cluster.locking-scheme                  full
>
> cluster.granular-entry-heal             no
>
> features.locks-revocation-secs          0
>
> features.locks-revocation-clear-all     false
>
> features.locks-revocation-max-blocked   0
>
> features.locks-monkey-unlocking         false
>
> features.locks-notify-contention        no
>
> features.locks-notify-contention-delay  5
>
> disperse.shd-max-threads                1
>
> disperse.shd-wait-qlength               1024
>
> disperse.cpu-extensions                 auto
>
> disperse.self-heal-window-size          1
>
> cluster.use-compound-fops               off
>
> performance.parallel-readdir            off
>
> performance.rda-request-size            131072
>
> performance.rda-low-wmark               4096
>
> performance.rda-high-wmark              128KB
>
> performance.rda-cache-limit             10MB
>
> performance.nl-cache-positive-entry     false
>
> performance.nl-cache-limit              10MB
>
> performance.nl-cache-timeout            60
>
> cluster.brick-multiplex                 off
>
> cluster.max-bricks-per-process          0
>
> disperse.optimistic-change-log          on
>
> disperse.stripe-cache                   4
>
> cluster.halo-enabled                    False
>
> cluster.halo-shd-max-latency            99999
>
> cluster.halo-nfsd-max-latency           5
>
> cluster.halo-max-latency                5
>
> cluster.halo-max-replicas               99999
>
> cluster.halo-min-replicas               2
>
> cluster.daemon-log-level                INFO
>
> debug.delay-gen                         off
>
> delay-gen.delay-percentage              10%
>
> delay-gen.delay-duration                100000
>
> delay-gen.enable
>
> disperse.parallel-writes                on
>
> features.sdfs                           on
>
> features.cloudsync                      off
>
> features.utime                          off
>
> ctime.noatime                           on
>
> feature.cloudsync-storetype             (null)
>
>
> Thanks again.
>
>
> On Wed, 25 Dec 2019 at 05:51, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> On Dec 24, 2019 02:47, David Cunningham <dcunningham at voisonics.com> wrote:
> >
> > Hello,
> >
> > In testing we found that the GFS client having access to all 3 nodes actually made no difference to performance. Perhaps that's because the 3rd node that wasn't accessible from the client before was the arbiter node?
>
> That makes sense, as no file data is sent to the arbiter; it stores only metadata.
>
> > Presumably we shouldn't have an arbiter node listed under backupvolfile-server when mounting the filesystem? Since it doesn't store all the data, surely it can't be used to serve the data.
>
> I have my arbiter defined as the last backup server and have had no issues so
> far. At least the admin can easily identify the bricks from the mount options.
>
> > We did have direct-io-mode=disable already as well, so that wasn't a factor in the performance problems.
>
> Have you checked that the client version is not too old?
> You can also check the cluster's operating version (op-version):
> # gluster volume get all cluster.max-op-version
> # gluster volume get all cluster.op-version
>
> The cluster's op-version should be equal to max-op-version.
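The two values can be compared and, when needed, the op-version raised with `gluster volume set all cluster.op-version <N>`. A sketch with hypothetical helper names; it parses the two-column output of `gluster volume get all <option>` and only prints the set command for review. The numbers in the test are illustrative (50400 corresponds to 5.4, 70000 to 7.0).

```bash
#!/usr/bin/env bash
# op_value OPTION
# Print the value column for OPTION from 'gluster volume get all OPTION'
# output read on stdin.
op_value() {
    awk -v opt="$1" '$1 == opt { print $2 }'
}

# op_version_advice CURRENT MAX
# Print the command that raises the op-version, or a note if it is current.
op_version_advice() {
    local cur=$1 max=$2
    if [ "$cur" -lt "$max" ]; then
        echo "gluster volume set all cluster.op-version $max"
    else
        echo "cluster.op-version $cur is already at max-op-version"
    fi
}
```

Usage, once every server and client has been upgraded:

    cur=$(gluster volume get all cluster.op-version | op_value cluster.op-version)
    max=$(gluster volume get all cluster.max-op-version | op_value cluster.max-op-version)
    op_version_advice "$cur" "$max"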
>
> Two options come to mind:
> A) Upgrade to the latest Gluster v6 or even v7 (I know it won't be easy) and
> then set the op-version to the highest possible value.
> # gluster volume get all cluster.max-op-version
> # gluster volume get all cluster.op-version
>
> B) Deploy an NFS-Ganesha server and connect the client over NFS v4.2 (and
> control the parallel connections from Ganesha).
>
> Can you provide your Gluster volume's options?
> 'gluster volume get <VOLNAME> all'
>
> > Thanks again for any advice.
> >
> >
> >
> > On Mon, 23 Dec 2019 at 13:09, David Cunningham <
> dcunningham at voisonics.com> wrote:
> >>
> >> Hi Strahil,
> >>
> >> Thanks for that. We do have one backup server specified, but will add
> the second backup as well.
> >>
> >>
> >> On Sat, 21 Dec 2019 at 11:26, Strahil <hunter86_bg at yahoo.com> wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Also consider using the mount option to specify backup servers via 'backup-volfile-servers=server2:server3' (you can define more, but I don't think replica volumes larger than 3 are useful, except maybe in some special cases).
> >>>
> >>> That way, when the primary is lost, your client can reach a backup one without disruption.
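Applied to the fstab entry quoted further down in this thread, the list form of the option would look like this. A sketch, assuming gfs3 is the third node and that the installed mount.glusterfs accepts backup-volfile-servers with a colon-separated list (the older single-host spelling is backupvolfile-server). The helper name is hypothetical; it only assembles the line and keeps the other options from the quoted entry.

```bash
#!/usr/bin/env bash
# fstab_line PRIMARY VOLUME MOUNTPOINT BACKUP...
# Build a GlusterFS fstab entry that lists every other node as a
# volfile backup server, joined with ':'.
fstab_line() {
    local primary=$1 vol=$2 mnt=$3
    shift 3
    local backups
    backups=$(IFS=:; echo "$*")   # "$*" joins the remaining args with ':'
    echo "$primary:/$vol $mnt glusterfs defaults,direct-io-mode=disable,_netdev,backup-volfile-servers=$backups,fetch-attempts=10 0 0"
}
```

Usage: `fstab_line gfs1 gvol0 /mnt/glusterfs gfs2 gfs3` prints the line to paste into /etc/fstab on the client.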
> >>>
> >>> P.S.: The client may 'hang' if the primary server is rebooted ungracefully, as the connection must time out before FUSE moves on to the next server. There is a special script for killing Gluster processes in '/usr/share/glusterfs/scripts', which can be used to set up a systemd service that does this for you on shutdown.
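A sketch of such a service, written as a bash function that prints the unit file. Assumptions: the script name stop-all-gluster-processes.sh and its default path are where stock glusterfs packages usually install it (verify on your servers), and the unit name is hypothetical. The unit does nothing at boot; because it is ordered After=glusterd.service and network-online.target, systemd stops it, and so runs the script, before glusterd and the network are stopped at shutdown.

```bash
#!/usr/bin/env bash
# gluster_stop_unit [SCRIPT]
# Print a oneshot systemd unit whose ExecStop runs SCRIPT at shutdown.
gluster_stop_unit() {
    local script=${1:-/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh}
    cat <<EOF
[Unit]
Description=Stop all Gluster processes before the network goes down
After=network-online.target glusterd.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=$script

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage on each server: `gluster_stop_unit > /etc/systemd/system/gluster-stop.service`, then `systemctl daemon-reload` and `systemctl enable --now gluster-stop.service`.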
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On Dec 20, 2019 23:49, David Cunningham <dcunningham at voisonics.com>
> wrote:
> >>>>
> >>>> Hi Strahil,
> >>>>
> >>>> Ah, that is an important point. One of the nodes is not accessible from the client, and we assumed that it only needed to reach the GFS node that was mounted, so we didn't think anything of it.
> >>>>
> >>>> We will try making all nodes accessible, as well as
> "direct-io-mode=disable".
> >>>>
> >>>> Thank you.
> >>>>
> >>>>
> >>>> On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
> >>>>>
> >>>>> Actually, I didn't make myself clear.
> >>>>> FUSE mounts on the client side connect directly to all the bricks that make up the volume.
> >>>>> If, for some reason (bad routing, a blocking firewall), the client can reach only 2 out of 3 bricks, this can cause healing to happen constantly (as one of the bricks is never updated), which will degrade performance and cause excessive network usage.
> >>>>> As your attachment is from one of the Gluster nodes, this could be the case.
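A quick way to confirm that a client can reach every node is to attempt a TCP connection from the client to the glusterd management port (24007) on each server. Brick processes listen on their own ports (by default 49152 and up, as listed by `gluster volume status`), which can be checked the same way. A minimal bash sketch using bash's /dev/tcp redirection and coreutils `timeout`; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# port_open HOST PORT [SECONDS]
# Succeed if a TCP connection to HOST:PORT opens within SECONDS (default 3).
port_open() {
    local host=$1 port=$2 secs=${3:-3}
    timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# check_nodes PORT HOST...
# Report, for each HOST, whether PORT is reachable; fail if any is not.
check_nodes() {
    local port=$1 host rc=0
    shift
    for host in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port UNREACHABLE"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage, on the client: `check_nodes 24007 gfs1 gfs2 gfs3`, then repeat with each brick port reported by `gluster volume status gvol0`.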
> >>>>>
> >>>>> Best Regards,
> >>>>> Strahil Nikolov
> >>>>>
> >>>>> On Friday, 20 December 2019, 01:49:56 GMT+2, David Cunningham <dcunningham at voisonics.com> wrote:
> >>>>>
> >>>>>
> >>>>> Hi Strahil,
> >>>>>
> >>>>> The chart attached to my original email is taken from the GFS server.
> >>>>>
> >>>>> I'm not sure what you mean by accessing all bricks simultaneously.
> We've mounted it from the client like this:
> >>>>> gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0
> >>>>>
> >>>>> Should we do something different to access all bricks simultaneously?
> >>>>>
> >>>>> Thanks for your help!
> >>>>>
> >>>>>
> >>>>> On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
> >>>>>>
> >>>>>> I'm not sure whether you measured the traffic on the client side (tcpdump on a client machine) or on the server side.
> >>>>>>
> >>>>>> In either case, please verify that the client can access all bricks simultaneously, as failing to do so can cause unnecessary heals.
> >>>>>>
> >>>>>> Have you thought about upgrading to v6? There are some enhancements
> in v6 which could be beneficial.
> >>>>>>
> >>>>>> Yet, it is indeed strange that so much traffic is generated with
> FUSE.
> >>>>>>
> >>>>>> Another approach is to test with NFS-Ganesha, which supports pNFS and can natively speak with Gluster. This can bring you closer to the previous setup and also provide some extra performance.
> >>>>>>
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Strahil Nikolov
> >>>>>>
> >>>>>>
> >>>>>>
> >>
> >>
> >> --
> >> David Cunningham, Voisonics Limited
> >> http://voisonics.com/
> >> USA: +1 213 221 1092
> >> New Zealand: +64 (0)28 2558 3782
> >
>
> Best Regards,
> Strahil Nikolov
>
