<div dir="ltr"><div>Hi Strahil,</div><div><br></div><div>Our volume options are as below. Thanks for the suggestion to upgrade to version 6 or 7. We could do that be simply removing the current installation and installing the new one (since it's not live right now). We might have to convince the customer that it's likely to succeed though, as at the moment I think they believe that GFS is not going to work for them.</div><div><br></div><div>Option                  Value                  <br>------                  -----                  <br>cluster.lookup-unhashed         on                    <br>cluster.lookup-optimize         on                    <br>cluster.min-free-disk          10%                   <br>cluster.min-free-inodes         5%                    <br>cluster.rebalance-stats         off                   <br>cluster.subvols-per-directory      (null)                  <br>cluster.readdir-optimize         off                   <br>cluster.rsync-hash-regex         (null)                  <br>cluster.extra-hash-regex         (null)                  <br>cluster.dht-xattr-name          trusted.glusterfs.dht          <br>cluster.randomize-hash-range-by-gfid   off                   <br>cluster.rebal-throttle          normal                  <br>cluster.lock-migration          off                   <br>cluster.force-migration         off                   <br>cluster.local-volume-name        (null)                  <br>cluster.weighted-rebalance        on                    <br>cluster.switch-pattern          (null)                  <br>cluster.entry-change-log         on                    <br>cluster.read-subvolume          (null)                  <br>cluster.read-subvolume-index       -1                    <br>cluster.read-hash-mode          1                    <br>cluster.background-self-heal-count    8                    <br>cluster.metadata-self-heal        on                    <br>cluster.data-self-heal          on                    <br>cluster.entry-self-heal         on                    <br>cluster.self-heal-daemon         on                    <br>cluster.heal-timeout           600                   <br>cluster.self-heal-window-size      1                    <br>cluster.data-change-log         on                    <br>cluster.metadata-change-log       on                    <br>cluster.data-self-heal-algorithm     (null)                  <br>cluster.eager-lock            on                    <br>disperse.eager-lock           on                    <br>disperse.other-eager-lock        on                    <br>disperse.eager-lock-timeout       1                    <br>disperse.other-eager-lock-timeout    1                    <br>cluster.quorum-type           none                   <br>cluster.quorum-count           (null)                  <br>cluster.choose-local           true                   <br>cluster.self-heal-readdir-size      1KB                   <br>cluster.post-op-delay-secs        1                    <br>cluster.ensure-durability        on                    <br>cluster.consistent-metadata       no                    <br>cluster.heal-wait-queue-length      128                   <br>cluster.favorite-child-policy      none                   <br>cluster.full-lock            yes                   <br>cluster.stripe-block-size        128KB                  <br>cluster.stripe-coalesce         true                   <br>diagnostics.latency-measurement     off                   <br>diagnostics.dump-fd-stats        off                   
<br>diagnostics.count-fop-hits        off                   <br>diagnostics.brick-log-level       INFO                   <br>diagnostics.client-log-level       INFO                   <br>diagnostics.brick-sys-log-level     CRITICAL                 <br>diagnostics.client-sys-log-level     CRITICAL                 <br>diagnostics.brick-logger         (null)                  <br>diagnostics.client-logger        (null)                  <br>diagnostics.brick-log-format       (null)                  <br>diagnostics.client-log-format      (null)                  <br>diagnostics.brick-log-buf-size      5                    <br>diagnostics.client-log-buf-size     5                    <br>diagnostics.brick-log-flush-timeout   120                   <br>diagnostics.client-log-flush-timeout   120                   <br>diagnostics.stats-dump-interval     0                    <br>diagnostics.fop-sample-interval     0                    <br>diagnostics.stats-dump-format      json                   <br>diagnostics.fop-sample-buf-size     65535                  <br>diagnostics.stats-dnscache-ttl-sec    86400                  <br>performance.cache-max-file-size     0                    <br>performance.cache-min-file-size     0                    <br>performance.cache-refresh-timeout    1                    <br>performance.cache-priority                            <br>performance.cache-size          32MB                   <br>performance.io-thread-count       16                    <br>performance.high-prio-threads      16                    <br>performance.normal-prio-threads     16                    <br>performance.low-prio-threads       16                    <br>performance.least-prio-threads      1                    <br>performance.enable-least-priority    on                    <br>performance.iot-watchdog-secs      (null)                  <br>performance.iot-cleanup-disconnected-reqsoff                   <br>performance.iot-pass-through       false                  <br>performance.io-cache-pass-through    false                  <br>performance.cache-size          128MB                  <br>performance.qr-cache-timeout       1                    <br>performance.cache-invalidation      false                  <br>performance.ctime-invalidation      false                  <br>performance.flush-behind         on                    <br>performance.nfs.flush-behind       on                    <br>performance.write-behind-window-size   1MB                   <br>performance.resync-failed-syncs-after-fsyncoff                   <br>performance.nfs.write-behind-window-size1MB                   <br>performance.strict-o-direct       off                   <br>performance.nfs.strict-o-direct     off                   <br>performance.strict-write-ordering    off                   <br>performance.nfs.strict-write-ordering  off                   <br>performance.write-behind-trickling-writeson                    <br>performance.aggregate-size        128KB                  <br>performance.nfs.write-behind-trickling-writeson                    <br>performance.lazy-open          yes                   <br>performance.read-after-open       yes                   <br>performance.open-behind-pass-through   false                  <br>performance.read-ahead-page-count    4                    <br>performance.read-ahead-pass-through   false                  <br>performance.readdir-ahead-pass-through  false                  <br>performance.md-cache-pass-through    false                  <br>performance.md-cache-timeout       1          
          <br>performance.cache-swift-metadata     true                   <br>performance.cache-samba-metadata     false                  <br>performance.cache-capability-xattrs   true                   <br>performance.cache-ima-xattrs       true                   <br>performance.md-cache-statfs       off                   <br>performance.xattr-cache-list                           <br>performance.nl-cache-pass-through    false                  <br>features.encryption           off                   <br>encryption.master-key          (null)                  <br>encryption.data-key-size         256                   <br>encryption.block-size          4096                   <br>network.frame-timeout          1800                   <br>network.ping-timeout           42                    <br>network.tcp-window-size         (null)                  <br>network.remote-dio            disable                 <br>client.event-threads           2                    <br>client.tcp-user-timeout         0                    <br>client.keepalive-time          20                    <br>client.keepalive-interval        2                    <br>client.keepalive-count          9                    <br>network.tcp-window-size         (null)                  <br>network.inode-lru-limit         16384                  <br>auth.allow                *                    <br>auth.reject               (null)                  <br>transport.keepalive           1                    <br>server.allow-insecure          on                    <br>server.root-squash            off                   <br>server.anonuid              65534                  <br>server.anongid              65534                  <br>server.statedump-path          /var/run/gluster             <br>server.outstanding-rpc-limit       64                    <br>server.ssl                (null)                  <br>auth.ssl-allow              *                    <br>server.manage-gids            off                   <br>server.dynamic-auth           on                    <br>client.send-gids             on                    <br>server.gid-timeout            300                   <br>server.own-thread            (null)                  <br>server.event-threads           1                    <br>server.tcp-user-timeout         0                    <br>server.keepalive-time          20                    <br>server.keepalive-interval        2                    <br>server.keepalive-count          9                    <br>transport.listen-backlog         1024                   <br>ssl.own-cert               (null)                  <br>ssl.private-key             (null)                  <br>ssl.ca-list               (null)                  <br>ssl.crl-path               (null)                  <br>ssl.certificate-depth          (null)                  <br>ssl.cipher-list             (null)                  <br>ssl.dh-param               (null)                  <br>ssl.ec-curve               (null)                  <br>transport.address-family         inet                   <br>performance.write-behind         on                    <br>performance.read-ahead          on                    <br>performance.readdir-ahead        on                    <br>performance.io-cache           on                    <br>performance.quick-read          on                    <br>performance.open-behind         on                    <br>performance.nl-cache           off                   <br>performance.stat-prefetch        on                    
<br>performance.client-io-threads      off                   <br>performance.nfs.write-behind       on                    <br>performance.nfs.read-ahead        off                   <br>performance.nfs.io-cache         off                   <br>performance.nfs.quick-read        off                   <br>performance.nfs.stat-prefetch      off                   <br>performance.nfs.io-threads        off                   <br>performance.force-readdirp        true                   <br>performance.cache-invalidation      false                  <br>features.uss               off                   <br>features.snapshot-directory       .snaps                  <br>features.show-snapshot-directory     off                   <br>features.tag-namespaces         off                   <br>network.compression           off                   <br>network.compression.window-size     -15                   <br>network.compression.mem-level      8                    <br>network.compression.min-size       0                    <br>network.compression.compression-level  -1                    <br>network.compression.debug        false                  <br>features.default-soft-limit       80%                   <br>features.soft-timeout          60                    <br>features.hard-timeout          5                    <br>features.alert-time           86400                  <br>features.quota-deem-statfs        off                   <br>geo-replication.indexing         off                   <br>geo-replication.indexing         off                   <br>geo-replication.ignore-pid-check     off                   <br>geo-replication.ignore-pid-check     off                   <br>features.quota              off                   <br>features.inode-quota           off                   <br>features.bitrot             disable                 <br>debug.trace               off                   <br>debug.log-history            no                    <br>debug.log-file              no                    <br>debug.exclude-ops            (null)                  <br>debug.include-ops            (null)                  <br>debug.error-gen             off                   <br>debug.error-failure           (null)                  <br>debug.error-number            (null)                  <br>debug.random-failure           off                   <br>debug.error-fops             (null)                  <br>nfs.disable               on                    <br>features.read-only            off                   <br>features.worm              off                   <br>features.worm-file-level         off                   <br>features.worm-files-deletable      on                    <br>features.default-retention-period    120                   <br>features.retention-mode         relax                  <br>features.auto-commit-period       180                   <br>storage.linux-aio            off                   <br>storage.batch-fsync-mode         reverse-fsync              <br>storage.batch-fsync-delay-usec      0                    <br>storage.owner-uid            -1                    <br>storage.owner-gid            -1                    <br>storage.node-uuid-pathinfo        off                   <br>storage.health-check-interval      30                    <br>storage.build-pgfid           off                   <br>storage.gfid2path            on                    <br>storage.gfid2path-separator       :                    <br>storage.reserve             1                    <br>storage.health-check-timeout       10            
        <br>storage.fips-mode-rchecksum       off                   <br>storage.force-create-mode        0000                   <br>storage.force-directory-mode       0000                   <br>storage.create-mask           0777                   <br>storage.create-directory-mask      0777                   <br>storage.max-hardlinks          100                   <br>storage.ctime              off                   <br>storage.bd-aio              off                   <br>config.gfproxyd             off                   <br>cluster.server-quorum-type        off                   <br>cluster.server-quorum-ratio       0                    <br>changelog.changelog           off                   <br>changelog.changelog-dir         {{ brick.path }}/.glusterfs/changelogs  <br>changelog.encoding            ascii                  <br>changelog.rollover-time         15                    <br>changelog.fsync-interval         5                    <br>changelog.changelog-barrier-timeout   120                   <br>changelog.capture-del-path        off                   <br>features.barrier             disable                 <br>features.barrier-timeout         120                   <br>features.trash              off                   <br>features.trash-dir            .trashcan                <br>features.trash-eliminate-path      (null)                  <br>features.trash-max-filesize       5MB                   <br>features.trash-internal-op        off                   <br>cluster.enable-shared-storage      disable                 <br>cluster.write-freq-threshold       0                    <br>cluster.read-freq-threshold       0                    <br>cluster.tier-pause            off                   <br>cluster.tier-promote-frequency      120                   <br>cluster.tier-demote-frequency      3600                   <br>cluster.watermark-hi           90                    <br>cluster.watermark-low          75                    <br>cluster.tier-mode            cache                  <br>cluster.tier-max-promote-file-size    0                    <br>cluster.tier-max-mb           4000                   <br>cluster.tier-max-files          10000                  <br>cluster.tier-query-limit         100                   <br>cluster.tier-compact           on                    <br>cluster.tier-hot-compact-frequency    604800                  <br>cluster.tier-cold-compact-frequency   604800                  <br>features.ctr-enabled           off                   <br>features.record-counters         off                   <br>features.ctr-record-metadata-heat    off                   <br>features.ctr_link_consistency      off                   <br>features.ctr_lookupheal_link_timeout   300                   <br>features.ctr_lookupheal_inode_timeout  300                   <br>features.ctr-sql-db-cachesize      12500                  <br>features.ctr-sql-db-wal-autocheckpoint  25000                  <br>features.selinux             on                    <br>locks.trace               off                   <br>locks.mandatory-locking         off                   <br>cluster.disperse-self-heal-daemon    enable                  <br>cluster.quorum-reads           no                    <br>client.bind-insecure           (null)                  <br>features.shard              off                   <br>features.shard-block-size        64MB                   <br>features.shard-lru-limit         16384                  <br>features.shard-deletion-rate       100                   <br>features.scrub-throttle  
       lazy                   <br>features.scrub-freq           biweekly                 <br>features.scrub              false                  <br>features.expiry-time           120                   <br>features.cache-invalidation       off                   <br>features.cache-invalidation-timeout   60                    <br>features.leases             off                   <br>features.lease-lock-recall-timeout    60                    <br>disperse.background-heals        8                    <br>disperse.heal-wait-qlength        128                   <br>cluster.heal-timeout           600                   <br>dht.force-readdirp            on                    <br>disperse.read-policy           gfid-hash                <br>cluster.shd-max-threads         1                    <br>cluster.shd-wait-qlength         1024                   <br>cluster.locking-scheme          full                   <br>cluster.granular-entry-heal       no                    <br>features.locks-revocation-secs      0                    <br>features.locks-revocation-clear-all   false                  <br>features.locks-revocation-max-blocked  0                    <br>features.locks-monkey-unlocking     false                  <br>features.locks-notify-contention     no                    <br>features.locks-notify-contention-delay  5                    <br>disperse.shd-max-threads         1                    <br>disperse.shd-wait-qlength        1024                   <br>disperse.cpu-extensions         auto                   <br>disperse.self-heal-window-size      1                    <br>cluster.use-compound-fops        off                   <br>performance.parallel-readdir       off                   <br>performance.rda-request-size       131072                  <br>performance.rda-low-wmark        4096                   <br>performance.rda-high-wmark        128KB                  <br>performance.rda-cache-limit       10MB                   <br>performance.nl-cache-positive-entry   false                  <br>performance.nl-cache-limit        10MB                   <br>performance.nl-cache-timeout       60                    <br>cluster.brick-multiplex         off                   <br>cluster.max-bricks-per-process      0                    <br>disperse.optimistic-change-log      on                    <br>disperse.stripe-cache          4                    <br>cluster.halo-enabled           False                  <br>cluster.halo-shd-max-latency       99999                  <br>cluster.halo-nfsd-max-latency      5                    <br>cluster.halo-max-latency         5                    <br>cluster.halo-max-replicas        99999                  <br>cluster.halo-min-replicas        2                    <br>cluster.daemon-log-level         INFO                   <br>debug.delay-gen             off                   <br>delay-gen.delay-percentage        10%                   <br>delay-gen.delay-duration         100000                  <br>delay-gen.enable                                 <br>disperse.parallel-writes         on                    <br>features.sdfs              on                    <br>features.cloudsync            off                   <br>features.utime              off                   <br>ctime.noatime              on                    <br>feature.cloudsync-storetype       (null)                  <br></div><div><br></div><div>Thanks again.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 25 Dec 2019 at 05:51, Strahil <<a 
href="mailto:hunter86_bg@yahoo.com">hunter86_bg@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><p dir="ltr">Hi David,</p>
<p dir="ltr">On Dec 24, 2019 02:47, David Cunningham <<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
><br>
> Hello,<br>
><br>
> In testing we found that giving the GFS client access to all 3 nodes actually made no difference to performance. Perhaps that's because the 3rd node, which wasn't accessible from the client before, is the arbiter node?<br>
It makes sense, as no data is being generated towards the arbiter.<br>
> Presumably we shouldn't have the arbiter node listed under backupvolfile-server when mounting the filesystem? Since it doesn't store all the data, surely it can't be used to serve the data.</p>
<p dir="ltr">I have my arbiter defined as last backup and no issues so far. At least the admin can easily identify the bricks from the mount options.</p>
<p dir="ltr">> We did have direct-io-mode=disable already as well, so that wasn't a factor in the performance problems.</p>
<p dir="ltr">Have you checked if the client vedsion ia not too old.<br>
Also you can check the cluster's operation cersion:<br>
# gluster volume get all cluster.max-op-version<br>
# gluster volume get all cluster.op-version</p>
<p dir="ltr">Cluster's op version should be at max-op-version.</p>
<p dir="ltr">In my mind come 2Â options:<br>
A) Upgrade to the latest Gluster v6 or even v7 (I know it won't be easy) and then set the op-version to the highest possible.<br>
# gluster volume get all cluster.max-op-version<br>
# gluster volume get all cluster.op-version</p>
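<p dir="ltr">After the upgrade, a rough sketch of raising the op-version (70000 here is only a placeholder - use whatever cluster.max-op-version actually reports on your cluster):<br>
# gluster volume set all cluster.op-version 70000</p>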
<p dir="ltr">B)Â Deploy a NFS Ganesha server and connect the client over NFS v4.2 (and control the parallel connections from Ganesha).</p>
<p dir="ltr">Can you provide your Gluster volume's options?<br>
'gluster volume get <VOLNAME> all'</p>
<p dir="ltr">> Thanks again for any advice.<br>
><br>
><br>
><br>
> On Mon, 23 Dec 2019 at 13:09, David Cunningham <<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>><br>
>> Hi Strahil,<br>
>><br>
>> Thanks for that. We do have one backup server specified, but will add the second backup as well.<br>
>><br>
>><br>
>> On Sat, 21 Dec 2019 at 11:26, Strahil <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>> wrote:<br>
>>><br>
>>> Hi David,<br>
>>><br>
>>> Also consider using the mount option to specify backup servers via 'backupvolfile-server=server2:server3'. You can define more, but I don't think replica volumes greater than 3 are useful (maybe in some special cases).<br>
>>><br>
>>> That way, when the primary is lost, your client can reach a backup one without disruption.<br>
>>><br>
>>> P.S.: The client may 'hang' if the primary server got rebooted ungracefully, as the communication must time out before FUSE addresses the next server. There is a special script for killing gluster processes in '/usr/share/glusterfs/scripts' which can be used to set up a systemd service that does that for you on shutdown.<br>
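>>><br>
>>> A minimal sketch of such a systemd unit (assuming the script is stop-all-gluster-processes.sh as shipped with recent glusterfs packages - verify the exact path on your distribution):<br>
>>> [Unit]<br>
>>> Description=Kill gluster processes cleanly before shutdown<br>
>>> [Service]<br>
>>> Type=oneshot<br>
>>> RemainAfterExit=yes<br>
>>> ExecStart=/bin/true<br>
>>> # ExecStop runs at shutdown, before the unit is torn down<br>
>>> ExecStop=/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh<br>
>>> [Install]<br>
>>> WantedBy=multi-user.target<br>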
>>><br>
>>> Best Regards,<br>
>>> Strahil Nikolov<br>
>>><br>
>>> On Dec 20, 2019 23:49, David Cunningham <<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>>>><br>
>>>> Hi Strahil,<br>
>>>><br>
>>>> Ah, that is an important point. One of the nodes is not accessible from the client, and we assumed that the client only needed to reach the GFS node that was mounted, so we didn't think anything of it.<br>
>>>><br>
>>>> We will try making all nodes accessible, as well as "direct-io-mode=disable".<br>
>>>><br>
>>>> Thank you.<br>
>>>><br>
>>>><br>
>>>> On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>> wrote:<br>
>>>>><br>
>>>>> Actually I haven't clarified myself.<br>
>>>>> A FUSE mount on the client side connects directly to all the bricks that make up the volume.<br>
>>>>> If for some reason (bad routing, a firewall block) the client can reach only 2 out of 3 bricks, this can constantly trigger healing (as one of the bricks is never updated), which will degrade performance and cause excessive network usage.<br>
>>>>> As your attachment is from one of the gluster nodes, this could be the case.<br>
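>>>>><br>
>>>>> A quick way to check for this (gvol0 being the volume from your mount line) is to watch whether entries keep showing up in the heal queue:<br>
>>>>> # gluster volume heal gvol0 info<br>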
>>>>><br>
>>>>> Best Regards,<br>
>>>>> Strahil Nikolov<br>
>>>>><br>
>>>>> On Friday, 20 December 2019 at 01:49:56 GMT+2, David Cunningham <<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>>>>><br>
>>>>><br>
>>>>> Hi Strahil,<br>
>>>>><br>
>>>>> The chart attached to my original email is taken from the GFS server.<br>
>>>>><br>
>>>>> I'm not sure what you mean by accessing all bricks simultaneously. We've mounted it from the client like this:<br>
>>>>> gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0<br>
>>>>><br>
>>>>> Should we do something different to access all bricks simultaneously?<br>
>>>>><br>
>>>>> Thanks for your help!<br>
>>>>><br>
>>>>><br>
>>>>> On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>> wrote:<br>
>>>>>><br>
>>>>>> I'm not sure whether you measured the traffic from the client side (tcpdump on a client machine) or from the server side.<br>
>>>>>><br>
>>>>>> In both cases, please verify that the client accesses all bricks simultaneously, as failing to do so can cause unnecessary heals.<br>
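>>>>>><br>
>>>>>> From the client, for example, you could capture traffic to the Gluster ports (24007 is the standard management port, and bricks usually listen from 49152 upwards - confirm the real ports with 'gluster volume status'):<br>
>>>>>> # tcpdump -i any -nn port 24007 or portrange 49152-49251<br>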
>>>>>><br>
>>>>>> Have you thought about upgrading to v6? There are some enhancements in v6 which could be beneficial.<br>
>>>>>><br>
>>>>>> Yet, it is indeed strange that so much traffic is generated with FUSE.<br>
>>>>>><br>
>>>>>> Another approach is to test with NFS-Ganesha, which supports pNFS and can natively speak with Gluster. That can bring you closer to the previous setup and also provide some extra performance.<br>
>>>>>><br>
>>>>>><br>
>>>>>> Best Regards,<br>
>>>>>> Strahil Nikolov<br>
>>>>>><br>
>>>>>><br>
>>>>>><br>
>><br>
>><br>
>> -- <br>
>> David Cunningham, Voisonics Limited<br>
>><a href="http://voisonics.com" target="_blank"> http://voisonics.com</a>/<br>
>> USA: +1 213 221 1092<br>
>> New Zealand: +64 (0)28 2558 3782<br>
><br>
><br>
><br>
> -- <br>
> David Cunningham, Voisonics Limited<br>
><a href="http://voisonics.com" target="_blank"> http://voisonics.com</a>/<br>
> USA: +1 213 221 1092<br>
> New Zealand: +64 (0)28 2558 3782</p>
<p dir="ltr">Best Regards,<br>
Strahil Nikolov</p>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>David Cunningham, Voisonics Limited<br><a href="http://voisonics.com/" target="_blank">http://voisonics.com/</a><br>USA: +1 213 221 1092<br>New Zealand: +64 (0)28 2558 3782</div></div></div></div></div></div></div></div></div></div></div>