[Gluster-users] GFS performance under heavy traffic
David Cunningham
dcunningham at voisonics.com
Fri Dec 27 01:22:00 UTC 2019
Hi Strahil,
Our volume options are as below. Thanks for the suggestion to upgrade to
version 6 or 7. We could do that by simply removing the current
installation and installing the new one (since it's not live right now). We
might have to convince the customer that it's likely to succeed though, as
at the moment I think they believe that GFS is not going to work for them.
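For reference, this is roughly how we'd run the op-version check and bump that Strahil suggested (a sketch only; "70200" is a placeholder, not our actual value, and the final `set` would only be run after every node is upgraded):

```shell
# Highest op-version the cluster can support, and the one currently in use:
gluster volume get all cluster.max-op-version
gluster volume get all cluster.op-version

# After upgrading all nodes, raise the op-version to the maximum reported
# above (replace 70200 with whatever max-op-version your cluster reports):
gluster volume set all cluster.op-version 70200
```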
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqs off
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.cache-invalidation false
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writes on
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
network.remote-dio disable
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 1
server.tcp-user-timeout 0
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
ssl.own-cert (null)
ssl.private-key (null)
ssl.ca-list (null)
ssl.crl-path (null)
ssl.certificate-depth (null)
ssl.cipher-list (null)
ssl.dh-param (null)
ssl.ec-curve (null)
transport.address-family inet
performance.write-behind on
performance.read-ahead on
performance.readdir-ahead on
performance.io-cache on
performance.quick-read on
performance.open-behind on
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.disable on
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 10
storage.fips-mode-rchecksum off
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
storage.ctime off
storage.bd-aio off
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 604800
cluster.tier-cold-compact-frequency 604800
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 12500
features.ctr-sql-db-wal-autocheckpoint 25000
features.selinux on
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard off
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention no
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 0
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
features.sdfs on
features.cloudsync off
features.utime off
ctime.noatime on
feature.cloudsync-storetype (null)
Thanks again.
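On Strahil's point further down about a systemd service to stop Gluster processes on shutdown: a minimal unit along these lines might work (a sketch only; the script path and name are assumptions from our install and may differ between Gluster versions, so check what actually ships in the scripts directory):

```ini
# /etc/systemd/system/gluster-stop-all.service
# Started at boot as a no-op; on shutdown, ExecStop runs Gluster's
# stop script so clients fail over promptly instead of waiting out
# the network ping timeout.
[Unit]
Description=Stop all Gluster processes cleanly on shutdown
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable gluster-stop-all.service` so the stop action runs on every reboot.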
On Wed, 25 Dec 2019 at 05:51, Strahil <hunter86_bg at yahoo.com> wrote:
> Hi David,
>
> On Dec 24, 2019 02:47, David Cunningham <dcunningham at voisonics.com> wrote:
> >
> > Hello,
> >
> > In testing we found that actually the GFS client having access to all 3
> nodes made no difference to performance. Perhaps that's because the 3rd
> node that wasn't accessible from the client before was the arbiter node?
> It makes sense, as no data is being generated towards the arbiter.
> > Presumably we shouldn't have an arbiter node listed under
> backupvolfile-server when mounting the filesystem? Since it doesn't store
> all the data surely it can't be used to serve the data.
>
> I have my arbiter defined as last backup and no issues so far. At least
> the admin can easily identify the bricks from the mount options.
>
> > We did have direct-io-mode=disable already as well, so that wasn't a
> factor in the performance problems.
>
> Have you checked that the client version is not too old?
> Also, you can check the cluster's operation version:
> # gluster volume get all cluster.max-op-version
> # gluster volume get all cluster.op-version
>
> Cluster's op version should be at max-op-version.
>
> Two options come to mind:
> A) Upgrade to the latest Gluster v6 or even v7 (I know it won't be easy) and
> then set the op-version to the highest possible:
> # gluster volume get all cluster.max-op-version
> # gluster volume get all cluster.op-version
>
> B) Deploy a NFS Ganesha server and connect the client over NFS v4.2 (and
> control the parallel connections from Ganesha).
>
> Can you provide your Gluster volume's options?
> 'gluster volume get <VOLNAME> all'
>
> > Thanks again for any advice.
> >
> >
> >
> > On Mon, 23 Dec 2019 at 13:09, David Cunningham <
> dcunningham at voisonics.com> wrote:
> >>
> >> Hi Strahil,
> >>
> >> Thanks for that. We do have one backup server specified, but will add
> the second backup as well.
> >>
> >>
> >> On Sat, 21 Dec 2019 at 11:26, Strahil <hunter86_bg at yahoo.com> wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Also consider using the mount option to specify backup server via
> 'backupvolfile-server=server2:server3' (you can define more, but I don't
> think replica volumes greater than 3 are useful, except maybe in some
> special cases).
> >>>
> >>> In such way, when the primary is lost, your client can reach a backup
> one without disruption.
> >>>
> >>> P.S.: Client may 'hang' - if the primary server got rebooted
> ungracefully - as the communication must timeout before FUSE addresses the
> next server. There is a special script for killing gluster processes in
> '/usr/share/gluster/scripts' which can be used for setting up a systemd
> service to do that for you on shutdown.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On Dec 20, 2019 23:49, David Cunningham <dcunningham at voisonics.com>
> wrote:
> >>>>
> >>>> Hi Strahil,
> >>>>
> >>>> Ah, that is an important point. One of the nodes is not accessible
> from the client, and we assumed that it only needed to reach the GFS node
> that was mounted so didn't think anything of it.
> >>>>
> >>>> We will try making all nodes accessible, as well as
> "direct-io-mode=disable".
> >>>>
> >>>> Thank you.
> >>>>
> >>>>
> >>>> On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
> >>>>>
> >>>>> Actually, I hadn't made myself clear.
> >>>>> A FUSE mount on the client side connects directly to all bricks
> comprising the volume.
> >>>>> If for some reason (bad routing, firewall blocked) there could be
> cases where the client can reach 2 out of 3 bricks and this can constantly
> cause healing to happen (as one of the bricks is never updated) which will
> degrade the performance and cause excessive network usage.
> >>>>> As your attachment is from one of the gluster nodes, this could be
> the case.
> >>>>>
> >>>>> Best Regards,
> >>>>> Strahil Nikolov
> >>>>>
> >>>>> On Friday, 20 December 2019 at 01:49:56 GMT+2, David
> Cunningham <dcunningham at voisonics.com> wrote:
> >>>>>
> >>>>>
> >>>>> Hi Strahil,
> >>>>>
> >>>>> The chart attached to my original email is taken from the GFS server.
> >>>>>
> >>>>> I'm not sure what you mean by accessing all bricks simultaneously.
> We've mounted it from the client like this:
> >>>>> gfs1:/gvol0 /mnt/glusterfs/ glusterfs
> defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10
> 0 0
> >>>>>
> >>>>> Should we do something different to access all bricks simultaneously?
> >>>>>
> >>>>> Thanks for your help!
> >>>>>
> >>>>>
> >>>>> On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
> >>>>>>
> >>>>>> I'm not sure if you did measure the traffic from client side
> (tcpdump on a client machine) or from Server side.
> >>>>>>
> >>>>>> In both cases , please verify that the client accesses all bricks
> simultaneously, as this can cause unnecessary heals.
> >>>>>>
> >>>>>> Have you thought about upgrading to v6? There are some enhancements
> in v6 which could be beneficial.
> >>>>>>
> >>>>>> Yet, it is indeed strange that so much traffic is generated with
> FUSE.
> >>>>>>
> >>>>>> Another approach is to test with NFS-Ganesha, which supports pNFS and
> can natively speak with Gluster; that could bring you closer to the
> previous setup and also provide some extra performance.
> >>>>>>
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Strahil Nikolov
> >>>>>>
> >>>>>>
> >>>>>>
> >>
> >>
> >> --
> >> David Cunningham, Voisonics Limited
> >> http://voisonics.com/
> >> USA: +1 213 221 1092
> >> New Zealand: +64 (0)28 2558 3782
> >
> >
> >
> > --
> > David Cunningham, Voisonics Limited
> > http://voisonics.com/
> > USA: +1 213 221 1092
> > New Zealand: +64 (0)28 2558 3782
>
> Best Regards,
> Strahil Nikolov
>
--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782