<html dir="ltr"><head></head><body style="text-align:left; direction:ltr;"><div>On Sun, 2019-03-31 at 23:01 +0530, Soumya Koduri wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><pre><br></pre><pre>On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote:</pre><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><pre><br></pre><pre><br></pre><pre>On Fri, Mar 29, 2019, 10:03 PM Jim Kinney <</pre><a href="mailto:jim.kinney@gmail.com"><pre>jim.kinney@gmail.com</pre></a><pre> </pre><pre><mailto:</pre><a href="mailto:jim.kinney@gmail.com"><pre>jim.kinney@gmail.com</pre></a><pre>>> wrote:</pre><pre><br></pre><pre> Currently running 3.12 on Centos 7.6. Doing cleanups on split-brain</pre><pre> and out of sync, need heal files.</pre><pre><br></pre><pre> We need to migrate the three replica servers to gluster v. 5 or 6.</pre><pre> Also will need to upgrade about 80 clients as well. Given that a</pre><pre> complete removal of gluster will not touch the 200+TB of data on 12</pre><pre> volumes, we are looking at doing that process, Stop all clients,</pre><pre> stop all glusterd services, remove all of it, install new version,</pre><pre> setup new volumes from old bricks, install new clients, mount</pre><pre> everything.</pre><pre><br></pre><pre> We would like to get some better performance from nfs-ganesha mounts</pre><pre> but that doesn't look like an option (not done any parameter tweaks</pre><pre> in testing yet). At a bare minimum, we would like to minimize the</pre><pre> total downtime of all systems.</pre></blockquote><pre><br></pre><pre>Could you please be more specific here? As in are you looking for better </pre><pre>performance during upgrade process or in general? Compared to 3.12, </pre><pre>there are lot of perf improvements done in both glusterfs and esp., </pre><pre>nfs-ganesha (latest stable - V2.7.x) stack. If you could provide more </pre><pre>information about your workloads (for eg., large-file,small-files, </pre><pre>metadata-intensive) , we can make some recommendations wrt to configuration.</pre></blockquote><div><br></div><div>Sure. More details:</div><div><br></div><div>We are (soon to be) running a three-node replica only gluster service (2 nodes now, third is racked and ready for sync and being added to gluster cluster). Each node has 2 external drive arrays plus one internal. Each node has 40G IB plus 40G IP connections (plans to upgrade to 100G). We currently have 9 volumes and each is 7TB up to 50TB of space. Each volume is a mix of thousands of large (>1GB) and tens of thousands of small (~100KB) plus thousands inbetween.</div><div><br></div><div>Currently we have a 13-node computational cluster with varying GPU abilities that mounts all of these volumes using gluster-fuse. Writes are slow and reads are also as if from a single server. I have data from a test setup (not anywhere near the capacity of the production system - just for testing commands and recoveries) that indicates raw NFS is much faster but no gluster, gluster-fuse is much slower. We have mmap issues with python and fuse-mounted locations. Converting to NFS solves this. We have tinkered with kernel settings to handle oom-killer so it will no longer drop glusterfs when an errant job eat all the ram (set oom_score_adj - -1000 for all glusterfs pids).</div><div><br></div><div>We would like to transition (smoothly!!) to gluster 5 or 6 with nfs-ganesha 2.7 and see some performance improvements. 
We would like to transition (smoothly!!) to gluster 5 or 6 with nfs-ganesha
2.7 and see some performance improvements. We will be using corosync and
pacemaker for NFS failover. It would be fantastic to be able to saturate a
10G IPoIB (or 40G IB!) connection to each compute node in the current
computational cluster. Right now we absolutely can't get much write speed
(copying a 6.2GB file from a host to gluster storage took 1m 21s; cp from
disk to /dev/null takes 7s). cp from gluster to /dev/null takes 1.0m for
the same 6.2GB file. That's a 10Gbps IPoIB connection running at only about
800Mbps.

We would like to do things like enable SSL encryption of all data flows (we
deal with PHI data in a HIPAA-regulated setting) but are concerned about
the performance cost; a rough sketch of the settings we have in mind is at
the end of this mail, below the config dump. We are running dual Intel Xeon
E5-2630L CPUs (12 physical cores each @ 2.4GHz) and 128GB RAM in each
server node. We have 170 users; about 20 are active at any time.

The current settings on /home (the other volumes are similar if not
identical; nfs.disable may be on for the others):

gluster volume get home all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize off
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon enable
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.cache-size 128MB
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.lazy-open yes
performance.read-after-open no
performance.read-ahead-page-count 4
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
features.lock-heal off
features.grace-timeout 10
network.remote-dio disable
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure (null)
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
features.lock-heal off
features.grace-timeout 10
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 1
server.tcp-user-timeout 0
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 10
ssl.own-cert (null)
ssl.private-key (null)
ssl.ca-list (null)
ssl.crl-path (null)
ssl.certificate-depth (null)
ssl.cipher-list (null)
ssl.dh-param (null)
ssl.ec-curve (null)
performance.write-behind on
performance.read-ahead on
performance.readdir-ahead off
performance.io-cache on
performance.quick-read on
performance.open-behind on
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.limit-usage (null)
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable off
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.rdirplus on
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
features.worm-file-level off
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid on
storage.gfid2path on
storage.gfid2path-separator :
storage.bd-aio off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir (null)
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 604800
cluster.tier-cold-compact-frequency 604800
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 12500
features.ctr-sql-db-wal-autocheckpoint 25000
features.selinux on
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard off
features.shard-block-size 64MB
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 0
disperse.optimistic-change-log on
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas
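For the SSL piece mentioned above, what we have in mind is roughly the
stock gluster TLS setup (a sketch only; the CN names below are made up, and
we would test the performance hit on the small test cluster first):

    # On every server and client, gluster looks for:
    #   /etc/ssl/glusterfs.pem  (certificate)
    #   /etc/ssl/glusterfs.key  (private key)
    #   /etc/ssl/glusterfs.ca   (CA bundle / concatenated peer certs)
    # Encrypt the I/O path, per volume:
    gluster volume set home client.ssl on
    gluster volume set home server.ssl on
    # Optionally restrict mounts to known certificate CNs (hypothetical names):
    gluster volume set home auth.ssl-allow 'gluster01,gluster02,gluster03'
    # Management-path (glusterd) encryption is a separate switch: create this
    # file on all nodes and clients, then restart glusterd:
    touch /var/lib/glusterd/secure-access

The open question for us is how much the TLS overhead costs in throughput
on the E5-2630L CPUs.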
> 
> Thanks,
> Soumya
> 
> > 
> >     Does this process make more sense than a version upgrade process to
> >     4.1, then 5, then 6? What "gotcha's" do I need to be ready for? I
> >     have until late May to prep and test on old, slow hardware with a
> >     small amount of files and volumes.
> > 
> > You can directly upgrade from 3.12 to 6.x. I would suggest that rather
> > than deleting and creating Gluster volume. +Hari and +Sanju for further
> > guidelines on upgrade, as they recently did upgrade tests. +Soumya to
> > add to the nfs-ganesha aspect.
> > 
> > Regards,
> > Poornima
> > 
> >     --
> > 
> >     James P. Kinney III
> > 
> >     Every time you stop a school, you will have to build a jail. What you
> >     gain at one end you lose at the other. It's like feeding a dog on his
> >     own tail. It won't fatten the dog.
> >     - Speech 11/23/1900 Mark Twain
> > 
> >     http://heretothereideas.blogspot.com/
> > 
> >     _______________________________________________
> >     Gluster-users mailing list
> >     Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
> >     https://lists.gluster.org/mailman/listinfo/gluster-users

-- 
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/