[Gluster-users] GFS performance under heavy traffic
David Cunningham
dcunningham at voisonics.com
Tue Jan 7 23:03:47 UTC 2020
Hi Strahil,
Thanks for that. The queue/scheduler file for the relevant disk reports
"noop [deadline] cfq", so deadline is being used. It is using ext4, and
I've verified that the MTU is 1500.
We could change the filesystem from ext4 to xfs, but in this case we're not
looking to tinker around the edges and get a small performance improvement
- we need a very large improvement on the 114MBps of network traffic to
make it usable.
I think what we really need to do first is to reproduce the problem in
testing, and then come back to possible solutions.
On Tue, 7 Jan 2020 at 22:15, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
> To find the scheduler, find all PVs of the LV that is providing your storage:
>
> [root@ovirt1 ~]# df -Th /gluster_bricks/data_fast
> Filesystem                                       Type  Size  Used Avail Use% Mounted on
> /dev/mapper/gluster_vg_nvme-gluster_lv_data_fast xfs   100G   39G   62G  39% /gluster_bricks/data_fast
>
>
> [root@ovirt1 ~]# pvs | grep gluster_vg_nvme
> /dev/mapper/vdo_nvme gluster_vg_nvme lvm2 a-- <1024.00g 0
>
> [root@ovirt1 ~]# cat /etc/vdoconf.yml
> ####################################################################
> # THIS FILE IS MACHINE GENERATED. DO NOT EDIT THIS FILE BY HAND.
> ####################################################################
> config: !Configuration
>   vdos:
>     vdo_nvme: !VDOService
>       device: /dev/disk/by-id/nvme-ADATA_SX8200PNP_2J1120011596
>
>
> [root@ovirt1 ~]# ll /dev/disk/by-id/nvme-ADATA_SX8200PNP_2J1120011596
> lrwxrwxrwx. 1 root root 13 Dec 17 20:21 /dev/disk/by-id/nvme-ADATA_SX8200PNP_2J1120011596 -> ../../nvme0n1
> [root@ovirt1 ~]# cat /sys/block/nvme0n1/queue/scheduler
> [none] mq-deadline kyber
>
> Note: If the device is under multipath, you need to check all paths (you can
> get them from the 'multipath -ll' command).
> The only scheduler you should avoid is "cfq", which was the default for RHEL 6
> & SLES 11.
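The check above boils down to reading /sys/block/<dev>/queue/scheduler, where the kernel marks the active scheduler with square brackets. A minimal bash sketch, assuming you have already resolved the brick's underlying block device name (e.g. nvme0n1 as above); the function names are hypothetical helpers, not part of Gluster:

```shell
#!/usr/bin/env bash
# Extract the active scheduler from a line such as "noop [deadline] cfq";
# the kernel marks the active entry with square brackets.
active_scheduler() {
    local line="$1"
    line="${line#*\[}"    # drop everything up to and including the first '['
    echo "${line%%\]*}"   # drop everything from the first ']' onwards
}

# Report the scheduler of one block device (e.g. "nvme0n1" or "sda").
# Returns 1 if cfq is active, 2 if the sysfs file cannot be read.
check_device_scheduler() {
    local dev="$1"
    local f="/sys/block/${dev}/queue/scheduler"
    local sched
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 2
    fi
    sched=$(active_scheduler "$(cat "$f")")
    if [ "$sched" = "cfq" ]; then
        echo "WARNING: ${dev} uses cfq" >&2
        return 1
    fi
    echo "${dev}: ${sched}"
}
```

For example, `check_device_scheduler nvme0n1`; for a multipath device, run it once for each path device listed by 'multipath -ll'.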
>
> XFS has better performance than ext-based filesystems.
>
> Another tuning option is to use Red Hat's tuned profiles for Gluster. You can
> extract them from the following source RPM (or a newer one, if you can find it):
> ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.4.2.0-1.el7rhgs.src.rpm
>
>
> About MTU: a larger MTU reduces the number of packets that the kernel has
> to process, but it requires the infrastructure to support it too. You can
> test by setting the MTU on both sides to 9000 and then running 'tracepath
> remote-ip'. Also run a ping with a large payload and the don't-fragment
> flag set: 'ping -M do -s 8900 <destination-ip>'. If the ping comes back,
> you are good to go.
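The payload passed to 'ping -s' excludes the IPv4 header (20 bytes, without options) and the ICMP header (8 bytes), so the largest payload that fits a 9000-byte MTU unfragmented is 8972; the 8900 used above leaves some margin. A small bash sketch (the helper names are hypothetical; '-M do' is the Linux iputils ping option quoted above):

```shell
#!/usr/bin/env bash
# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
max_ping_payload() {
    local mtu="$1"
    echo $(( mtu - 20 - 8 ))
}

# Send 3 pings with the don't-fragment flag set at the largest payload for
# the given MTU (default 9000); succeeds only if replies come back.
test_path_mtu() {
    local dest="$1" mtu="${2:-9000}"
    ping -c 3 -M do -s "$(max_ping_payload "$mtu")" "$dest"
}
```

Running `test_path_mtu <destination-ip> 9000` from each side checks the path in that direction.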
>
>
> Best Regards,
> Strahil Nikolov
>
> On Tuesday, 7 January 2020, 3:00:23 GMT-5, David Cunningham <
> dcunningham at voisonics.com> wrote:
>
>
> Hi Strahil,
>
> I believe we are using the standard MTU of 1500 (would need to check with
> the network people to be sure). Does it make a difference?
>
> I'm afraid I don't know about the scheduler - where do I find that?
>
> Thank you for the suggestions about turning off performance.read-ahead and
> performance.readdir-ahead.
>
>
> On Tue, 7 Jan 2020 at 18:08, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> It's difficult to find anything structured (but it's the same for Linux
> and other tech). I use Red Hat's documentation, online guides (cross-check
> the options against the official documentation) and experience shared on the
> mailing list.
>
> I don't see anything (in /var/lib/glusterd/groups) that matches your
> profile, but I think that you should try with performance.read-ahead and
> performance.readdir-ahead set to 'off'. I have found a bug (I didn't read the
> whole thing) that might be interesting for you:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1601166
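Both translators are switched off per volume with 'gluster volume set'. A dry-run sketch that only prints the commands so they can be reviewed first, assuming the volume name gvol0 from this thread; the helper and its '--apply' argument are hypothetical:

```shell
#!/usr/bin/env bash
# Print (default) or run (with "--apply") the commands that turn off the
# read-ahead and readdir-ahead translators on a volume.
disable_read_ahead() {
    local vol="$1" mode="${2:-}"
    local opt
    for opt in performance.read-ahead performance.readdir-ahead; do
        if [ "$mode" = "--apply" ]; then
            gluster volume set "$vol" "$opt" off
        else
            echo "gluster volume set $vol $opt off"
        fi
    done
}
```

After applying, 'gluster volume get gvol0 performance.read-ahead' should report 'off'.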
>
> Also, the arbiter is very important in order to avoid split-brain situations
> (but, based on my experience, issues can still occur), and it is best for
> the arbiter's brick to be on an SSD, as it needs to process the metadata as
> fast as possible. With v7, there is an option for the client to have an
> arbiter even in the cloud (remote arbiter) that is used only when 1 data
> brick is down.
>
> Please report the issue with the cache size; it should not behave like that.
>
> Are you using Jumbo frames (MTU 9000)?
> What is your brick's I/O scheduler?
>
> Best Regards,
> Strahil Nikolov
> On Jan 7, 2020 01:34, David Cunningham <dcunningham at voisonics.com> wrote:
>
> Hi Strahil,
>
> We may have had a heal since the GFS arbiter node wasn't accessible from
> the GFS clients, only from the other GFS servers. Unfortunately we haven't
> been able to reproduce the problem seen in production while testing, so we are
> unsure whether making the GFS arbiter node directly available to clients
> has fixed the issue.
>
> The load on GFS is mainly:
> 1. There are a small number of files around 5MB in size which are read
> often and change infrequently.
> 2. There are a large number of directories which are opened for reading to
> read the list of contents frequently.
> 3. There are a large number of new files around 5MB in size written
> frequently and read infrequently.
>
> We haven't touched the tuning options, as we don't really feel qualified to
> tell what needs to be changed from the defaults. Do you know of any suitable
> guides to get started?
>
> For some reason performance.cache-size is reported as both 32MB and 128MB.
> Is it worth reporting even for version 5.6?
>
> Here is the "gluster volume info" taken on the first node. Note that the
> third node (the arbiter) is currently taken out of the cluster:
> Volume Name: gvol0
> Type: Replicate
> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: gfs1:/nodirectwritedata/gluster/gvol0
> Brick2: gfs2:/nodirectwritedata/gluster/gvol0
> Options Reconfigured:
> diagnostics.client-log-level: INFO
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
>
> Thanks for your help and advice.
>
>
> On Sat, 28 Dec 2019 at 17:46, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> It seems that I have misread your quorum options, so just ignore that from
> my previous e-mail.
>
> Best Regards,
> Strahil Nikolov
> On Dec 27, 2019 15:38, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> Gluster supports live rolling upgrades, so there is no need to redeploy at
> all, but the migration notes should be checked, as some features must be
> disabled first.
> Also, the Gluster clients should remount in order to bump the Gluster
> op-version.
>
> What kind of workload do you have?
> I'm asking as there are predefined (and recommended) settings located at
> /var/lib/glusterd/groups.
> You can check the options for each group and cross-check each option's
> meaning in the docs before activating a setting.
>
> I still have a vague feeling that, during that peak in network
> bandwidth, there was a heal going on. Have you checked that?
>
> Also, sharding is very useful when you work with large files, as the
> heal is reduced to the size of the shard.
>
> N.B.: Once sharding is enabled, DO NOT DISABLE it, as you will lose
> your data.
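To put numbers on the sharding point: a file is stored as ceil(size / shard-block-size) shards, and a write that stays within one shard leaves at most one shard to heal instead of the whole file. A small arithmetic sketch in MiB (the helper names are hypothetical; 64 is the features.shard-block-size value in the option list quoted further down):

```shell
#!/usr/bin/env bash
# Number of shards a file of <size_mib> occupies with <block_mib> shards
# (integer ceiling division).
shard_count() {
    local size="$1" block="$2"
    echo $(( (size + block - 1) / block ))
}

# Data (MiB) to heal after a small write confined to one shard:
# one shard's worth, but never more than the file itself.
heal_mib_one_shard() {
    local size="$1" block="$2"
    if [ "$size" -lt "$block" ]; then echo "$size"; else echo "$block"; fi
}
```

For example, with 64 MiB shards a 1000 MiB file is split into 16 shards, while a 5 MiB file fits in a single shard, so files of that size would not see smaller heals.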
>
> Using Gluster v7.1 (soon on CentOS & Debian) gives you the latest
> features and optimizations, and support from the Gluster dev community is
> quite active.
>
> P.S: I'm wondering how 'performance.cache-size' can be both 32 MB and 128
> MB. Please double-check this (maybe I'm reading it wrong on my smartphone)
> and, if needed, raise a bug on bugzilla.redhat.com.
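To see which option names occur more than once in 'gluster volume get <VOLNAME> all' output (performance.cache-size in the list quoted below is one of them), the output can be piped through a small filter. A sketch assuming the two-column 'Option Value' layout shown in this thread, with a header line and a dashes line first; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Read "gluster volume get <VOLNAME> all" output on stdin and print every
# option name that occurs more than once, followed by all values seen.
find_duplicate_options() {
    awk 'NR > 2 && NF >= 1 {
             count[$1]++
             vals[$1] = vals[$1] " " (NF >= 2 ? $2 : "(empty)")
         }
         END {
             for (k in count)
                 if (count[k] > 1) print k ":" vals[k]
         }' | sort
}
```

Usage: `gluster volume get gvol0 all | find_duplicate_options`.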
>
> P.S2: Please provide 'gluster volume info', as 'cluster.quorum-type' set
> to 'none' is not normal for replicated volumes (arbiters are used in replica
> volumes).
>
> According to the documentation
> (https://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/):
>
> *Note:* Enabling the arbiter feature *automatically* configures
> client-quorum to 'auto'. This setting is *not* to be changed.
>
> Here is my output (Hyperconverged Virtualization Cluster -> oVirt):
> # gluster volume info engine | grep quorum
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
>
> Changing quorum is riskier than changing other options, so you need to take
> the necessary precautions. I think we all know what will happen if the
> cluster is out of quorum and you change the quorum settings to more
> stringent ones :D
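The quorum setting can be checked without reading the whole option list. A sketch that reads either 'gluster volume info' lines ('key: value') or 'gluster volume get <VOLNAME> all' lines ('key value') on stdin and only reports; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Read volume options on stdin and report cluster.quorum-type.
# Returns 1 if it is "none", 2 if the option is not found.
check_client_quorum() {
    local qtype
    # Strip a trailing ':' from the key so both output formats match.
    qtype=$(awk '{ sub(/:$/, "", $1) } $1 == "cluster.quorum-type" { print $2; exit }')
    if [ -z "$qtype" ]; then
        echo "cluster.quorum-type not found" >&2
        return 2
    fi
    if [ "$qtype" = "none" ]; then
        echo "WARNING: client quorum is 'none'; replica/arbiter volumes normally use 'auto'" >&2
        return 1
    fi
    echo "cluster.quorum-type: $qtype"
}
```

Usage: `gluster volume get gvol0 all | check_client_quorum`.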
>
> P.S3: If you decide to reset your gluster volume to the defaults, you can
> create a new volume (same type as the current one), then get the options for
> that volume, put them in a file and bulk-deploy them via 'gluster volume
> set <Original Volume> group custom-group', where the file (named
> custom-group) is located on every gluster server in the
> '/var/lib/glusterd/groups' directory. Finally, get rid of the sample volume.
>
> Best Regards,
> Strahil Nikolov
> On Dec 27, 2019 03:22, David Cunningham <dcunningham at voisonics.com> wrote:
>
> Hi Strahil,
>
> Our volume options are as below. Thanks for the suggestion to upgrade to
> version 6 or 7. We could do that by simply removing the current
> installation and installing the new one (since it's not live right now). We
> might have to convince the customer that it's likely to succeed though, as
> at the moment I think they believe that GFS is not going to work for them.
>
> Option Value
>
> ------ -----
>
> cluster.lookup-unhashed on
>
> cluster.lookup-optimize on
>
> cluster.min-free-disk 10%
>
> cluster.min-free-inodes 5%
>
> cluster.rebalance-stats off
>
> cluster.subvols-per-directory (null)
>
> cluster.readdir-optimize off
>
> cluster.rsync-hash-regex (null)
>
> cluster.extra-hash-regex (null)
>
> cluster.dht-xattr-name trusted.glusterfs.dht
>
> cluster.randomize-hash-range-by-gfid off
>
> cluster.rebal-throttle normal
>
> cluster.lock-migration off
>
> cluster.force-migration off
>
> cluster.local-volume-name (null)
>
> cluster.weighted-rebalance on
>
> cluster.switch-pattern (null)
>
> cluster.entry-change-log on
>
> cluster.read-subvolume (null)
>
> cluster.read-subvolume-index -1
>
> cluster.read-hash-mode 1
>
> cluster.background-self-heal-count 8
>
> cluster.metadata-self-heal on
>
> cluster.data-self-heal on
>
> cluster.entry-self-heal on
>
> cluster.self-heal-daemon on
>
> cluster.heal-timeout 600
>
> cluster.self-heal-window-size 1
>
> cluster.data-change-log on
>
> cluster.metadata-change-log on
>
> cluster.data-self-heal-algorithm (null)
>
> cluster.eager-lock on
>
> disperse.eager-lock on
>
> disperse.other-eager-lock on
>
> disperse.eager-lock-timeout 1
>
> disperse.other-eager-lock-timeout 1
>
> cluster.quorum-type none
>
> cluster.quorum-count (null)
>
> cluster.choose-local true
>
> cluster.self-heal-readdir-size 1KB
>
> cluster.post-op-delay-secs 1
>
> cluster.ensure-durability on
>
> cluster.consistent-metadata no
>
> cluster.heal-wait-queue-length 128
>
> cluster.favorite-child-policy none
>
> cluster.full-lock yes
>
> cluster.stripe-block-size 128KB
>
> cluster.stripe-coalesce true
>
> diagnostics.latency-measurement off
>
> diagnostics.dump-fd-stats off
>
> diagnostics.count-fop-hits off
>
> diagnostics.brick-log-level INFO
>
> diagnostics.client-log-level INFO
>
> diagnostics.brick-sys-log-level CRITICAL
>
> diagnostics.client-sys-log-level CRITICAL
>
> diagnostics.brick-logger (null)
>
> diagnostics.client-logger (null)
>
> diagnostics.brick-log-format (null)
>
> diagnostics.client-log-format (null)
>
> diagnostics.brick-log-buf-size 5
>
> diagnostics.client-log-buf-size 5
>
> diagnostics.brick-log-flush-timeout 120
>
> diagnostics.client-log-flush-timeout 120
>
> diagnostics.stats-dump-interval 0
>
> diagnostics.fop-sample-interval 0
>
> diagnostics.stats-dump-format json
>
> diagnostics.fop-sample-buf-size 65535
>
> diagnostics.stats-dnscache-ttl-sec 86400
>
> performance.cache-max-file-size 0
>
> performance.cache-min-file-size 0
>
> performance.cache-refresh-timeout 1
>
> performance.cache-priority
>
> performance.cache-size 32MB
>
> performance.io-thread-count 16
>
> performance.high-prio-threads 16
>
> performance.normal-prio-threads 16
>
> performance.low-prio-threads 16
>
> performance.least-prio-threads 1
>
> performance.enable-least-priority on
>
> performance.iot-watchdog-secs (null)
>
> performance.iot-cleanup-disconnected-reqs off
>
> performance.iot-pass-through false
>
> performance.io-cache-pass-through false
>
> performance.cache-size 128MB
>
> performance.qr-cache-timeout 1
>
> performance.cache-invalidation false
>
> performance.ctime-invalidation false
>
> performance.flush-behind on
>
> performance.nfs.flush-behind on
>
> performance.write-behind-window-size 1MB
>
> performance.resync-failed-syncs-after-fsync off
>
> performance.nfs.write-behind-window-size 1MB
>
> performance.strict-o-direct off
>
> performance.nfs.strict-o-direct off
>
> performance.strict-write-ordering off
>
> performance.nfs.strict-write-ordering off
>
> performance.write-behind-trickling-writes on
>
> performance.aggregate-size 128KB
>
> performance.nfs.write-behind-trickling-writes on
>
> performance.lazy-open yes
>
> performance.read-after-open yes
>
> performance.open-behind-pass-through false
>
> performance.read-ahead-page-count 4
>
> performance.read-ahead-pass-through false
>
> performance.readdir-ahead-pass-through false
>
> performance.md-cache-pass-through false
>
> performance.md-cache-timeout 1
>
> performance.cache-swift-metadata true
>
> performance.cache-samba-metadata false
>
> performance.cache-capability-xattrs true
>
> performance.cache-ima-xattrs true
>
> performance.md-cache-statfs off
>
> performance.xattr-cache-list
>
> performance.nl-cache-pass-through false
>
> features.encryption off
>
> encryption.master-key (null)
>
> encryption.data-key-size 256
>
> encryption.block-size 4096
>
> network.frame-timeout 1800
>
> network.ping-timeout 42
>
> network.tcp-window-size (null)
>
> network.remote-dio disable
>
> client.event-threads 2
>
> client.tcp-user-timeout 0
>
> client.keepalive-time 20
>
> client.keepalive-interval 2
>
> client.keepalive-count 9
>
> network.tcp-window-size (null)
>
> network.inode-lru-limit 16384
>
> auth.allow *
>
> auth.reject (null)
>
> transport.keepalive 1
>
> server.allow-insecure on
>
> server.root-squash off
>
> server.anonuid 65534
>
> server.anongid 65534
>
> server.statedump-path /var/run/gluster
>
> server.outstanding-rpc-limit 64
>
> server.ssl (null)
>
> auth.ssl-allow *
>
> server.manage-gids off
>
> server.dynamic-auth on
>
> client.send-gids on
>
> server.gid-timeout 300
>
> server.own-thread (null)
>
> server.event-threads 1
>
> server.tcp-user-timeout 0
>
> server.keepalive-time 20
>
> server.keepalive-interval 2
>
> server.keepalive-count 9
>
> transport.listen-backlog 1024
>
> ssl.own-cert (null)
>
> ssl.private-key (null)
>
> ssl.ca-list (null)
>
> ssl.crl-path (null)
>
> ssl.certificate-depth (null)
>
> ssl.cipher-list (null)
>
> ssl.dh-param (null)
>
> ssl.ec-curve (null)
>
> transport.address-family inet
>
> performance.write-behind on
>
> performance.read-ahead on
>
> performance.readdir-ahead on
>
> performance.io-cache on
>
> performance.quick-read on
>
> performance.open-behind on
>
> performance.nl-cache off
>
> performance.stat-prefetch on
>
> performance.client-io-threads off
>
> performance.nfs.write-behind on
>
> performance.nfs.read-ahead off
>
> performance.nfs.io-cache off
>
> performance.nfs.quick-read off
>
> performance.nfs.stat-prefetch off
>
> performance.nfs.io-threads off
>
> performance.force-readdirp true
>
> performance.cache-invalidation false
>
> features.uss off
>
> features.snapshot-directory .snaps
>
> features.show-snapshot-directory off
>
> features.tag-namespaces off
>
> network.compression off
>
> network.compression.window-size -15
>
> network.compression.mem-level 8
>
> network.compression.min-size 0
>
> network.compression.compression-level -1
>
> network.compression.debug false
>
> features.default-soft-limit 80%
>
> features.soft-timeout 60
>
> features.hard-timeout 5
>
> features.alert-time 86400
>
> features.quota-deem-statfs off
>
> geo-replication.indexing off
>
> geo-replication.indexing off
>
> geo-replication.ignore-pid-check off
>
> geo-replication.ignore-pid-check off
>
> features.quota off
>
> features.inode-quota off
>
> features.bitrot disable
>
> debug.trace off
>
> debug.log-history no
>
> debug.log-file no
>
> debug.exclude-ops (null)
>
> debug.include-ops (null)
>
> debug.error-gen off
>
> debug.error-failure (null)
>
> debug.error-number (null)
>
> debug.random-failure off
>
> debug.error-fops (null)
>
> nfs.disable on
>
> features.read-only off
>
> features.worm off
>
> features.worm-file-level off
>
> features.worm-files-deletable on
>
> features.default-retention-period 120
>
> features.retention-mode relax
>
> features.auto-commit-period 180
>
> storage.linux-aio off
>
> storage.batch-fsync-mode reverse-fsync
>
> storage.batch-fsync-delay-usec 0
>
> storage.owner-uid -1
>
> storage.owner-gid -1
>
> storage.node-uuid-pathinfo off
>
> storage.health-check-interval 30
>
> storage.build-pgfid off
>
> storage.gfid2path on
>
> storage.gfid2path-separator :
>
> storage.reserve 1
>
> storage.health-check-timeout 10
>
> storage.fips-mode-rchecksum off
>
> storage.force-create-mode 0000
>
> storage.force-directory-mode 0000
>
> storage.create-mask 0777
>
> storage.create-directory-mask 0777
>
> storage.max-hardlinks 100
>
> storage.ctime off
>
> storage.bd-aio off
>
> config.gfproxyd off
>
> cluster.server-quorum-type off
>
> cluster.server-quorum-ratio 0
>
> changelog.changelog off
>
> changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
> changelog.encoding ascii
>
> changelog.rollover-time 15
>
> changelog.fsync-interval 5
>
> changelog.changelog-barrier-timeout 120
>
> changelog.capture-del-path off
>
> features.barrier disable
>
> features.barrier-timeout 120
>
> features.trash off
>
> features.trash-dir .trashcan
>
> features.trash-eliminate-path (null)
>
> features.trash-max-filesize 5MB
>
> features.trash-internal-op off
>
> cluster.enable-shared-storage disable
>
> cluster.write-freq-threshold 0
>
> cluster.read-freq-threshold 0
>
> cluster.tier-pause off
>
> cluster.tier-promote-frequency 120
>
> cluster.tier-demote-frequency 3600
>
> cluster.watermark-hi 90
>
> cluster.watermark-low 75
>
> cluster.tier-mode cache
>
> cluster.tier-max-promote-file-size 0
>
> cluster.tier-max-mb 4000
>
> cluster.tier-max-files 10000
>
> cluster.tier-query-limit 100
>
> cluster.tier-compact on
>
> cluster.tier-hot-compact-frequency 604800
>
> cluster.tier-cold-compact-frequency 604800
>
> features.ctr-enabled off
>
> features.record-counters off
>
> features.ctr-record-metadata-heat off
>
> features.ctr_link_consistency off
>
> features.ctr_lookupheal_link_timeout 300
>
> features.ctr_lookupheal_inode_timeout 300
>
> features.ctr-sql-db-cachesize 12500
>
> features.ctr-sql-db-wal-autocheckpoint 25000
>
> features.selinux on
>
> locks.trace off
>
> locks.mandatory-locking off
>
> cluster.disperse-self-heal-daemon enable
>
> cluster.quorum-reads no
>
> client.bind-insecure (null)
>
> features.shard off
>
> features.shard-block-size 64MB
>
> features.shard-lru-limit 16384
>
> features.shard-deletion-rate 100
>
> features.scrub-throttle lazy
>
> features.scrub-freq biweekly
>
> features.scrub false
>
> features.expiry-time 120
>
> features.cache-invalidation off
>
> features.cache-invalidation-timeout 60
>
> features.leases off
>
> features.lease-lock-recall-timeout 60
>
> disperse.background-heals 8
>
> disperse.heal-wait-qlength 128
>
> cluster.heal-timeout 600
>
> dht.force-readdirp on
>
> disperse.read-policy gfid-hash
>
> cluster.shd-max-threads 1
>
> cluster.shd-wait-qlength 1024
>
> cluster.locking-scheme full
>
> cluster.granular-entry-heal no
>
> features.locks-revocation-secs 0
>
> features.locks-revocation-clear-all false
>
> features.locks-revocation-max-blocked 0
>
> features.locks-monkey-unlocking false
>
> features.locks-notify-contention no
>
> features.locks-notify-contention-delay 5
>
> disperse.shd-max-threads 1
>
> disperse.shd-wait-qlength 1024
>
> disperse.cpu-extensions auto
>
> disperse.self-heal-window-size 1
>
> cluster.use-compound-fops off
>
> performance.parallel-readdir off
>
> performance.rda-request-size 131072
>
> performance.rda-low-wmark 4096
>
> performance.rda-high-wmark 128KB
>
> performance.rda-cache-limit 10MB
>
> performance.nl-cache-positive-entry false
>
> performance.nl-cache-limit 10MB
>
> performance.nl-cache-timeout 60
>
> cluster.brick-multiplex off
>
> cluster.max-bricks-per-process 0
>
> disperse.optimistic-change-log on
>
> disperse.stripe-cache 4
>
> cluster.halo-enabled False
>
> cluster.halo-shd-max-latency 99999
>
> cluster.halo-nfsd-max-latency 5
>
> cluster.halo-max-latency 5
>
> cluster.halo-max-replicas 99999
>
> cluster.halo-min-replicas 2
>
> cluster.daemon-log-level INFO
>
> debug.delay-gen off
>
> delay-gen.delay-percentage 10%
>
> delay-gen.delay-duration 100000
>
> delay-gen.enable
>
> disperse.parallel-writes on
>
> features.sdfs on
>
> features.cloudsync off
>
> features.utime off
>
> ctime.noatime on
>
> feature.cloudsync-storetype (null)
>
>
> Thanks again.
>
>
> On Wed, 25 Dec 2019 at 05:51, Strahil <hunter86_bg at yahoo.com> wrote:
>
> Hi David,
>
> On Dec 24, 2019 02:47, David Cunningham <dcunningham at voisonics.com> wrote:
> >
> > Hello,
> >
> > In testing we found that actually the GFS client having access to all 3
> nodes made no difference to performance. Perhaps that's because the 3rd
> node that wasn't accessible from the client before was the arbiter node?
> It makes sense, as no data is being generated towards the arbiter.
> > Presumably we shouldn't have an arbiter node listed under
> backupvolfile-server when mounting the filesystem? Since it doesn't store
> all the data surely it can't be used to serve the data.
>
> I have my arbiter defined as last backup and no issues so far. At least
> the admin can easily identify the bricks from the mount options.
>
> > We did have direct-io-mode=disable already as well, so that wasn't a
> factor in the performance problems.
>
> Have you checked that the client version is not too old?
> Also, you can check the cluster's operating version:
> # gluster volume get all cluster.max-op-version
> # gluster volume get all cluster.op-version
>
> The cluster's op-version should be at max-op-version.
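Comparing the two values can be scripted. A sketch assuming the two-column output of 'gluster volume get all <option>' (header line, dashes, then 'option value'); the helpers are hypothetical, and the second one only prints the 'volume set' command needed to raise the op-version rather than running it:

```shell
#!/usr/bin/env bash
# Print the value of option $1 from "gluster volume get all <opt>" output
# read on stdin.
option_value() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Compare current and maximum op-version and print the suggested action.
# Usage: opversion_advice <current> <max>
opversion_advice() {
    local cur="$1" max="$2"
    if [ "$cur" -lt "$max" ]; then
        echo "gluster volume set all cluster.op-version $max"
    else
        echo "op-version $cur is already at the maximum"
    fi
}
```

Usage:
`cur=$(gluster volume get all cluster.op-version | option_value cluster.op-version)`,
`max=$(gluster volume get all cluster.max-op-version | option_value cluster.max-op-version)`,
then `opversion_advice "$cur" "$max"`.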
>
> Two options come to mind:
> A) Upgrade to the latest Gluster v6 or even v7 (I know it won't be easy)
> and then set the op-version to the highest possible value:
> # gluster volume get all cluster.max-op-version
> # gluster volume set all cluster.op-version <max-op-version>
>
> B) Deploy an NFS-Ganesha server and connect the client over NFS v4.2 (and
> control the parallel connections from Ganesha).
>
> Can you provide your Gluster volume's options?
> 'gluster volume get <VOLNAME> all'
>
> > Thanks again for any advice.
> >
> >
> >
> > On Mon, 23 Dec 2019 at 13:09, David Cunningham <
> dcunningham at voisonics.com> wrote:
> >>
> >> Hi Strahil,
> >>
> >> Thanks for that. We do have one backup server specified, but will add
> the second backup as well.
> >>
> >>
> >> On Sat, 21 Dec 2019 at 11:26, Strahil <hunter86_bg at yahoo.com> wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Also consider using the mount option that specifies backup servers via
> 'backupvolfile-server=server2:server3' (you can define more, but I don't
> think replica volumes larger than 3 are useful, except maybe in some special
> cases).
> >>>
> >>> This way, when the primary is lost, your client can reach a backup
> one without disruption.
> >>>
> >>> P.S.: The client may 'hang' if the primary server is rebooted
> ungracefully, as the communication must time out before FUSE addresses the
> next server. There is a special script for killing Gluster processes in
> '/usr/share/glusterfs/scripts' that can be used to set up a systemd
> service to do that for you on shutdown.
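The mount options discussed in this thread can be combined into an /etc/fstab entry. A sketch using the host, volume and mount-point names from this thread; it uses 'backup-volfile-servers', which the Gluster client documentation lists for a colon-separated list of backup servers (the older 'backupvolfile-server' takes a single server), so check which form your mount.glusterfs accepts. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Build a glusterfs /etc/fstab entry.
# Usage: gluster_fstab_line <primary> <volume> <mountpoint> [backup ...]
gluster_fstab_line() {
    local primary="$1" vol="$2" mnt="$3"
    shift 3
    local IFS=':'          # makes "$*" join the backup servers with ':'
    local backups="$*"
    local opts="defaults,_netdev,direct-io-mode=disable"
    if [ -n "$backups" ]; then
        opts="${opts},backup-volfile-servers=${backups}"
    fi
    printf '%s:/%s %s glusterfs %s 0 0\n' "$primary" "$vol" "$mnt" "$opts"
}
```

For example, `gluster_fstab_line gfs1 gvol0 /mnt/glusterfs gfs2 gfs3` lists the arbiter (gfs3) as the last backup, as described later in the thread.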
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On Dec 20, 2019 23:49, David Cunningham <dcunningham at voisonics.com>
> wrote:
> >>>>
> >>>> Hi Strahil,
> >>>>
> >>>> Ah, that is an important point. One of the nodes is not accessible
> from the client, and we assumed that it only needed to reach the GFS node
> that was mounted so didn't think anything of it.
> >>>>
> >>>> We will try making all nodes accessible, as well as
> "direct-io-mode=disable".
> >>>>
> >>>> Thank you.
> >>>>
> >>>>
> >>>> On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
> >>>>>
> >>>>> Actually, I didn't make myself clear.
> >>>>> A FUSE mount on the client side connects directly to all bricks
> that make up the volume.
> >>>>> If, for some reason (bad routing, a blocking firewall), the client
> can reach only 2 out of 3 bricks, this can cause constant healing (as one
> of the bricks is never updated), which degrades performance and causes
> excessive network usage.
> >>>>> As your attachment is from one of the Gluster nodes, this could be
> the case.
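One way to verify this from each client is to take the brick hosts from 'gluster volume info' and test a TCP connection to each. A bash sketch relying on bash's /dev/tcp redirection and coreutils 'timeout'; the function names are hypothetical. By default it probes only the glusterd management port 24007; the bricks listen on their own ports (shown by 'gluster volume status'), which should be probed as well:

```shell
#!/usr/bin/env bash
# Print the unique brick hostnames from "gluster volume info" output on
# stdin, e.g. "Brick1: gfs1:/nodirectwritedata/gluster/gvol0" -> "gfs1".
brick_hosts() {
    awk '$1 ~ /^Brick[0-9]+:$/ { split($2, a, ":"); print a[1] }' | sort -u
}

# Succeed if a TCP connection to host $1, port $2 opens within 3 seconds.
can_connect() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" </dev/null 2>/dev/null
}

# Probe every brick host on one port (default 24007, the glusterd port).
# Usage: gluster volume info gvol0 | check_brick_hosts [port]
check_brick_hosts() {
    local port="${1:-24007}" rc=0 host hosts
    hosts=$(brick_hosts)
    for host in $hosts; do
        if can_connect "$host" "$port"; then
            echo "OK   ${host}:${port}"
        else
            echo "FAIL ${host}:${port}"
            rc=1
        fi
    done
    return "$rc"
}
```

Run it on the client, not on a Gluster server, since the server's view of the network may differ.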
> >>>>>
> >>>>> Best Regards,
> >>>>> Strahil Nikolov
> >>>>>
> >>>>> On Friday, 20 December 2019, 01:49:56 GMT+2, David
> Cunningham <dcunningham at voisonics.com> wrote:
> >>>>>
> >>>>>
> >>>>> Hi Strahil,
> >>>>>
> >>>>> The chart attached to my original email is taken from the GFS server.
> >>>>>
> >>>>> I'm not sure what you mean by accessing all bricks simultaneously.
> We've mounted it from the client like this:
> >>>>> gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0
> >>>>>
> >>>>> Should we do something different to access all bricks simultaneously?
> >>>>>
> >>>>> Thanks for your help!
> >>>>>
> >>>>>
> >>>>> On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
> >>>>>>
> >>>>>> I'm not sure whether you measured the traffic from the client side
> (tcpdump on a client machine) or from the server side.
> >>>>>>
> >>>>>> In both cases, please verify that the client can access all bricks
> simultaneously, as not being able to do so can cause unnecessary heals.
> >>>>>>
> >>>>>> Have you thought about upgrading to v6? There are some enhancements
> in v6 which could be beneficial.
> >>>>>>
> >>>>>> Yet, it is indeed strange that so much traffic is generated with
> FUSE.
> >>>>>>
> >>>>>> Another approach is to test with NFS-Ganesha, which supports pNFS and
> can natively speak to Gluster. That can bring you closer to the previous
> setup and also provide some extra performance.
> >>>>>>
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Strahil Nikolov
> >>>>>>
> >>>>>>
> >>>>>>
> >>
> >>
> >> --
> >> David Cunningham, Voisonics Limited
> >> http://voisonics.com/
> >> USA: +1 213 221 1092
> >> New Zealand: +64 (0)28 2558 3782
> >
>
> Best Regards,
> Strahil Nikolov
>
>