[Gluster-users] upgrade best practices

Jim Kinney jim.kinney at gmail.com
Mon Apr 1 16:15:10 UTC 2019


On Sun, 2019-03-31 at 23:01 +0530, Soumya Koduri wrote:
> On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote:
> > On Fri, Mar 29, 2019, 10:03 PM Jim Kinney <jim.kinney at gmail.com> wrote:
> >     Currently running 3.12 on CentOS 7.6. Doing cleanups on split-brain
> >     and out-of-sync, need-heal files.
> >     We need to migrate the three replica servers to Gluster v5 or v6.
> >     We will also need to upgrade about 80 clients. Given that a
> >     complete removal of Gluster will not touch the 200+TB of data on
> >     12 volumes, we are looking at doing that process: stop all clients,
> >     stop all glusterd services, remove all of it, install the new
> >     version, set up new volumes from the old bricks, install the new
> >     clients, and mount everything.
> >     We would like to get better performance from nfs-ganesha mounts,
> >     but that doesn't look like an option (we have not done any
> >     parameter tweaks in testing yet). At a bare minimum, we would like
> >     to minimize the total downtime of all systems.
> 
> Could you please be more specific here? Are you looking for better
> performance during the upgrade process, or in general? Compared to 3.12,
> a lot of performance improvements have been made in both glusterfs and,
> especially, the nfs-ganesha stack (latest stable: v2.7.x). If you could
> provide more information about your workloads (e.g., large-file,
> small-file, metadata-intensive), we can make some recommendations
> regarding configuration.

Sure. More details:
We are (soon to be) running a three-node, replica-only Gluster service
(two nodes now; the third is racked and ready to be synced and added to
the Gluster cluster). Each node has two external drive arrays plus one
internal one, and 40G IB plus 40G IP connections (with plans to upgrade
to 100G). We currently have 9 volumes, ranging from 7TB to 50TB each.
Each volume holds a mix of thousands of large (>1GB) files, tens of
thousands of small (~100KB) files, and thousands in between.
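
To answer the large-file vs. small-file question with numbers rather than
estimates, a size histogram of one volume could be collected from a client
mount. What follows is a minimal Python sketch, not a tested tool: the
bucket thresholds (1 MiB and 1 GiB) and the default mount path /mnt/home
are hypothetical and should be adjusted. It only reads metadata via
os.walk and os.lstat and does not follow symlinks; on a fuse mount with
tens of thousands of files the walk itself may take a while.

```python
import os
import stat
import sys
from collections import Counter

# Hypothetical thresholds: below 1 MiB is "small", 1 GiB and above is "large".
SMALL_MAX = 1 << 20
LARGE_MIN = 1 << 30


def size_histogram(root, small_max=SMALL_MAX, large_min=LARGE_MIN):
    """Count regular files under `root` by size bucket and total their bytes.

    Returns (counts, total_bytes): two Counters keyed by "small", "medium",
    and "large". Symlinks are not followed; unreadable entries are skipped.
    """
    counts, total_bytes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or permission denied; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device files, etc.
            size = st.st_size
            if size < small_max:
                bucket = "small"
            elif size >= large_min:
                bucket = "large"
            else:
                bucket = "medium"
            counts[bucket] += 1
            total_bytes[bucket] += size
    return counts, total_bytes


if __name__ == "__main__":
    # Hypothetical default path; pass the real mount point as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/home"
    counts, total_bytes = size_histogram(root)
    for bucket in ("small", "medium", "large"):
        gib = total_bytes[bucket] / 2**30
        print(f"{bucket:>6}: {counts[bucket]:>10,} files {gib:>12,.1f} GiB")
```

The byte totals matter as much as the counts: a volume can be dominated
by small files by count and by large files by capacity at the same time.
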
Currently we have a 13-node computational cluster with varying GPU
capabilities that mounts all of these volumes using gluster-fuse. Writes
are slow, and reads perform as if served from a single server. I have
data from a test setup (nowhere near the capacity of the production
system; it is just for testing commands and recoveries) indicating that
raw NFS without Gluster is much faster, while gluster-fuse is much
slower. We have mmap issues with Python on fuse-mounted locations;
converting to NFS solves this. We have tinkered with kernel settings so
that the OOM killer will no longer kill glusterfs when an errant job
eats all the RAM (we set oom_score_adj to -1000 for all glusterfs PIDs).
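
The oom_score_adj change described above can be scripted so it is easy to
reapply. A minimal Python sketch, assuming it runs as root on each node
and that the processes to protect are those whose /proc/<pid>/comm is
glusterfs, glusterfsd, or glusterd (that name set is an assumption; adjust
it). The setting is per process, so it would need to be reapplied whenever
those processes are restarted.

```python
from pathlib import Path

# Assumption: process names to protect; adjust to match your nodes.
GLUSTER_NAMES = frozenset({"glusterfs", "glusterfsd", "glusterd"})


def protect_from_oom(score=-1000, names=GLUSTER_NAMES, proc_root="/proc"):
    """Set oom_score_adj to `score` for every process whose comm is in `names`.

    Returns the sorted PIDs that were updated. Lowering the value below its
    current setting requires root (CAP_SYS_RESOURCE); a PermissionError is
    raised rather than hidden.
    """
    if not -1000 <= score <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    updated = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
            if comm in names:
                (entry / "oom_score_adj").write_text(f"{score}\n")
                updated.append(int(entry.name))
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
    return sorted(updated)
```

The proc_root parameter exists only so the function can be exercised
against a fake directory tree; on a real node it stays /proc.
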
We would like to transition (smoothly!!) to Gluster 5 or 6 with
nfs-ganesha 2.7 and see some performance improvements. We will be using
corosync and pacemaker for NFS failover. It would be fantastic to be able
to saturate a 10G IPoIB (or 40G IB!) connection to each compute node in
the current computational cluster. Right now we absolutely can't get much
write speed: copying a 6.2GB file from a host to Gluster storage took
1m 21s, while cp from disk to /dev/null takes 7s. cp of the same 6.2GB
file from Gluster to /dev/null takes 1.0m. That's a 10Gbps IPoIB
connection running at only about 800Mbps.
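
For reference, the effective rates behind those timings can be worked out
directly. A small Python sketch, assuming decimal units (1 GB = 10^9
bytes; if the file size is really 6.2 GiB, every figure comes out about
7% higher):

```python
def throughput_mbps(size_gb: float, seconds: float) -> float:
    """Return the average rate in Mbit/s for size_gb decimal gigabytes moved in `seconds`."""
    bits = size_gb * 1e9 * 8
    return bits / seconds / 1e6


LINK_MBPS = 10_000  # nominal 10Gbps IPoIB link
FILE_GB = 6.2       # size of the test file from the message

# Timings taken from the message.
timings = {
    "host -> gluster (1m 21s)": 81,
    "gluster -> /dev/null (1.0m)": 60,
    "disk -> /dev/null (7s)": 7,
}

for label, seconds in timings.items():
    rate = throughput_mbps(FILE_GB, seconds)
    print(f"{label:<28} {rate:>7,.0f} Mbit/s  ({rate / LINK_MBPS:.1%} of link)")
```

This prints roughly 612, 827, and 7,086 Mbit/s respectively: the read
path runs at about 8% of the nominal link rate and the write path at about
6%, while the local disk alone sustains about 7Gbps.
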
We would like to do things like enable SSL encryption of all data flows
(we deal with PHI in a HIPAA-regulated setting), but we are concerned
about the performance impact. Each server node has dual Intel Xeon
E5-2630L CPUs (12 physical cores each @ 2.4GHz) and 128GB of RAM. We
have 170 users, about 20 of whom are active at any time.
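
If TLS is trialled on the test hardware first, the per-volume settings can
be generated rather than typed by hand. The sketch below only prints
`gluster volume set` commands for the I/O-path options client.ssl,
server.ssl, and auth.ssl-allow (the latter two appear in the option
listing below); it executes nothing. The volume list and certificate
common names are hypothetical placeholders, and it assumes the
certificate, key, and CA files (by default /etc/ssl/glusterfs.pem,
glusterfs.key, and glusterfs.ca) are already in place on every server and
client.

```python
import shlex

# Hypothetical inputs: replace with the real volume names and the certificate
# common names (CNs) of the servers and clients that should be allowed in.
VOLUMES = ["home"]
ALLOWED_CNS = ["gluster1", "gluster2", "gluster3", "compute01"]


def tls_commands(volumes, allowed_cns):
    """Return `gluster volume set` commands that enable TLS on the I/O path.

    Nothing is executed; review the output and run it by hand.
    """
    allow = ",".join(allowed_cns)
    cmds = []
    for vol in volumes:
        v = shlex.quote(vol)
        cmds.append(f"gluster volume set {v} auth.ssl-allow {shlex.quote(allow)}")
        cmds.append(f"gluster volume set {v} client.ssl on")
        cmds.append(f"gluster volume set {v} server.ssl on")
    return cmds


if __name__ == "__main__":
    print("\n".join(tls_commands(VOLUMES, ALLOWED_CNS)))
```

Measuring the same 6.2GB copy before and after enabling these options on
the test setup would show the encryption overhead directly.
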
The current settings on /home (the other volumes are similar if not
identical; nfs.disable may be true on some of them):
gluster volume get home all
Option                                       Value
------                                       -----
cluster.lookup-unhashed                      on
cluster.lookup-optimize                      off
cluster.min-free-disk                        10%
cluster.min-free-inodes                      5%
cluster.rebalance-stats                      off
cluster.subvols-per-directory                (null)
cluster.readdir-optimize                     off
cluster.rsync-hash-regex                     (null)
cluster.extra-hash-regex                     (null)
cluster.dht-xattr-name                       trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid         off
cluster.rebal-throttle                       normal
cluster.lock-migration                       off
cluster.local-volume-name                    (null)
cluster.weighted-rebalance                   on
cluster.switch-pattern                       (null)
cluster.entry-change-log                     on
cluster.read-subvolume                       (null)
cluster.read-subvolume-index                 -1
cluster.read-hash-mode                       1
cluster.background-self-heal-count           8
cluster.metadata-self-heal                   on
cluster.data-self-heal                       on
cluster.entry-self-heal                      on
cluster.self-heal-daemon                     enable
cluster.heal-timeout                         600
cluster.self-heal-window-size                1
cluster.data-change-log                      on
cluster.metadata-change-log                  on
cluster.data-self-heal-algorithm             (null)
cluster.eager-lock                           on
disperse.eager-lock                          on
cluster.quorum-type                          none
cluster.quorum-count                         (null)
cluster.choose-local                         true
cluster.self-heal-readdir-size               1KB
cluster.post-op-delay-secs                   1
cluster.ensure-durability                    on
cluster.consistent-metadata                  no
cluster.heal-wait-queue-length               128
cluster.favorite-child-policy                none
cluster.stripe-block-size                    128KB
cluster.stripe-coalesce                      true
diagnostics.latency-measurement              off
diagnostics.dump-fd-stats                    off
diagnostics.count-fop-hits                   off
diagnostics.brick-log-level                  INFO
diagnostics.client-log-level                 INFO
diagnostics.brick-sys-log-level              CRITICAL
diagnostics.client-sys-log-level             CRITICAL
diagnostics.brick-logger                     (null)
diagnostics.client-logger                    (null)
diagnostics.brick-log-format                 (null)
diagnostics.client-log-format                (null)
diagnostics.brick-log-buf-size               5
diagnostics.client-log-buf-size              5
diagnostics.brick-log-flush-timeout          120
diagnostics.client-log-flush-timeout         120
diagnostics.stats-dump-interval              0
diagnostics.fop-sample-interval              0
diagnostics.stats-dump-format                json
diagnostics.fop-sample-buf-size              65535
diagnostics.stats-dnscache-ttl-sec           86400
performance.cache-max-file-size              0
performance.cache-min-file-size              0
performance.cache-refresh-timeout            1
performance.cache-priority
performance.cache-size                       32MB
performance.io-thread-count                  16
performance.high-prio-threads                16
performance.normal-prio-threads              16
performance.low-prio-threads                 16
performance.least-prio-threads               1
performance.enable-least-priority            on
performance.cache-size                       128MB
performance.flush-behind                     on
performance.nfs.flush-behind                 on
performance.write-behind-window-size         1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size     1MB
performance.strict-o-direct                  off
performance.nfs.strict-o-direct              off
performance.strict-write-ordering            off
performance.nfs.strict-write-ordering        off
performance.lazy-open                        yes
performance.read-after-open                  no
performance.read-ahead-page-count            4
performance.md-cache-timeout                 1
performance.cache-swift-metadata             true
performance.cache-samba-metadata             false
performance.cache-capability-xattrs          true
performance.cache-ima-xattrs                 true
features.encryption                          off
encryption.master-key                        (null)
encryption.data-key-size                     256
encryption.block-size                        4096
network.frame-timeout                        1800
network.ping-timeout                         42
network.tcp-window-size                      (null)
features.lock-heal                           off
features.grace-timeout                       10
network.remote-dio                           disable
client.event-threads                         2
client.tcp-user-timeout                      0
client.keepalive-time                        20
client.keepalive-interval                    2
client.keepalive-count                       9
network.tcp-window-size                      (null)
network.inode-lru-limit                      16384
auth.allow                                   *
auth.reject                                  (null)
transport.keepalive                          1
server.allow-insecure                        (null)
server.root-squash                           off
server.anonuid                               65534
server.anongid                               65534
server.statedump-path                        /var/run/gluster
server.outstanding-rpc-limit                 64
features.lock-heal                           off
features.grace-timeout                       10
server.ssl                                   (null)
auth.ssl-allow                               *
server.manage-gids                           off
server.dynamic-auth                          on
client.send-gids                             on
server.gid-timeout                           300
server.own-thread                            (null)
server.event-threads                         1
server.tcp-user-timeout                      0
server.keepalive-time                        20
server.keepalive-interval                    2
server.keepalive-count                       9
transport.listen-backlog                     10
ssl.own-cert                                 (null)
ssl.private-key                              (null)
ssl.ca-list                                  (null)
ssl.crl-path                                 (null)
ssl.certificate-depth                        (null)
ssl.cipher-list                              (null)
ssl.dh-param                                 (null)
ssl.ec-curve                                 (null)
performance.write-behind                     on
performance.read-ahead                       on
performance.readdir-ahead                    off
performance.io-cache                         on
performance.quick-read                       on
performance.open-behind                      on
performance.nl-cache                         off
performance.stat-prefetch                    on
performance.client-io-threads                off
performance.nfs.write-behind                 on
performance.nfs.read-ahead                   off
performance.nfs.io-cache                     off
performance.nfs.quick-read                   off
performance.nfs.stat-prefetch                off
performance.nfs.io-threads                   off
performance.force-readdirp                   true
performance.cache-invalidation               false
features.uss                                 off
features.snapshot-directory                  .snaps
features.show-snapshot-directory             off
network.compression                          off
network.compression.window-size              -15
network.compression.mem-level                8
network.compression.min-size                 0
network.compression.compression-level        -1
network.compression.debug                    false
features.limit-usage                         (null)
features.default-soft-limit                  80%
features.soft-timeout                        60
features.hard-timeout                        5
features.alert-time                          86400
features.quota-deem-statfs                   off
geo-replication.indexing                     off
geo-replication.indexing                     off
geo-replication.ignore-pid-check             off
geo-replication.ignore-pid-check             off
features.quota                               off
features.inode-quota                         off
features.bitrot                              disable
debug.trace                                  off
debug.log-history                            no
debug.log-file                               no
debug.exclude-ops                            (null)
debug.include-ops                            (null)
debug.error-gen                              off
debug.error-failure                          (null)
debug.error-number                           (null)
debug.random-failure                         off
debug.error-fops                             (null)
nfs.enable-ino32                             no
nfs.mem-factor                               15
nfs.export-dirs                              on
nfs.export-volumes                           on
nfs.addr-namelookup                          off
nfs.dynamic-volumes                          off
nfs.register-with-portmap                    on
nfs.outstanding-rpc-limit                    16
nfs.port                                     2049
nfs.rpc-auth-unix                            on
nfs.rpc-auth-null                            on
nfs.rpc-auth-allow                           all
nfs.rpc-auth-reject                          none
nfs.ports-insecure                           off
nfs.trusted-sync                             off
nfs.trusted-write                            off
nfs.volume-access                            read-write
nfs.export-dir
nfs.disable                                  off
nfs.nlm                                      on
nfs.acl                                      on
nfs.mount-udp                                off
nfs.mount-rmtab                              /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd                                /sbin/rpc.statd
nfs.server-aux-gids                          off
nfs.drc                                      off
nfs.drc-size                                 0x20000
nfs.read-size                                (1 * 1048576ULL)
nfs.write-size                               (1 * 1048576ULL)
nfs.readdir-size                             (1 * 1048576ULL)
nfs.rdirplus                                 on
nfs.exports-auth-enable                      (null)
nfs.auth-refresh-interval-sec                (null)
nfs.auth-cache-ttl-sec                       (null)
features.read-only                           off
features.worm                                off
features.worm-file-level                     off
features.default-retention-period            120
features.retention-mode                      relax
features.auto-commit-period                  180
storage.linux-aio                            off
storage.batch-fsync-mode                     reverse-fsync
storage.batch-fsync-delay-usec               0
storage.owner-uid                            -1
storage.owner-gid                            -1
storage.node-uuid-pathinfo                   off
storage.health-check-interval                30
storage.build-pgfid                          on
storage.gfid2path                            on
storage.gfid2path-separator                  :
storage.bd-aio                               off
cluster.server-quorum-type                   off
cluster.server-quorum-ratio                  0
changelog.changelog                          off
changelog.changelog-dir                      (null)
changelog.encoding                           ascii
changelog.rollover-time                      15
changelog.fsync-interval                     5
changelog.changelog-barrier-timeout          120
changelog.capture-del-path                   off
features.barrier                             disable
features.barrier-timeout                     120
features.trash                               off
features.trash-dir                           .trashcan
features.trash-eliminate-path                (null)
features.trash-max-filesize                  5MB
features.trash-internal-op                   off
cluster.enable-shared-storage                disable
cluster.write-freq-threshold                 0
cluster.read-freq-threshold                  0
cluster.tier-pause                           off
cluster.tier-promote-frequency               120
cluster.tier-demote-frequency                3600
cluster.watermark-hi                         90
cluster.watermark-low                        75
cluster.tier-mode                            cache
cluster.tier-max-promote-file-size           0
cluster.tier-max-mb                          4000
cluster.tier-max-files                       10000
cluster.tier-query-limit                     100
cluster.tier-compact                         on
cluster.tier-hot-compact-frequency           604800
cluster.tier-cold-compact-frequency          604800
features.ctr-enabled                         off
features.record-counters                     off
features.ctr-record-metadata-heat            off
features.ctr_link_consistency                off
features.ctr_lookupheal_link_timeout         300
features.ctr_lookupheal_inode_timeout        300
features.ctr-sql-db-cachesize                12500
features.ctr-sql-db-wal-autocheckpoint       25000
features.selinux                             on
locks.trace                                  off
locks.mandatory-locking                      off
cluster.disperse-self-heal-daemon            enable
cluster.quorum-reads                         no
client.bind-insecure                         (null)
features.shard                               off
features.shard-block-size                    64MB
features.scrub-throttle                      lazy
features.scrub-freq                          biweekly
features.scrub                               false
features.expiry-time                         120
features.cache-invalidation                  off
features.cache-invalidation-timeout          60
features.leases                              off
features.lease-lock-recall-timeout           60
disperse.background-heals                    8
disperse.heal-wait-qlength                   128
cluster.heal-timeout                         600
dht.force-readdirp                           on
disperse.read-policy                         gfid-hash
cluster.shd-max-threads                      1
cluster.shd-wait-qlength                     1024
cluster.locking-scheme                       full
cluster.granular-entry-heal                  no
features.locks-revocation-secs               0
features.locks-revocation-clear-all          false
features.locks-revocation-max-blocked        0
features.locks-monkey-unlocking              false
disperse.shd-max-threads                     1
disperse.shd-wait-qlength                    1024
disperse.cpu-extensions                      auto
disperse.self-heal-window-size               1
cluster.use-compound-fops                    off
performance.parallel-readdir                 off
performance.rda-request-size                 131072
performance.rda-low-wmark                    4096
performance.rda-high-wmark                   128KB
performance.rda-cache-limit                  10MB
performance.nl-cache-positive-entry          false
performance.nl-cache-limit                   10MB
performance.nl-cache-timeout                 60
cluster.brick-multiplex                      off
cluster.max-bricks-per-process               0
disperse.optimistic-change-log               on
cluster.halo-enabled                         False
cluster.halo-shd-max-latency                 99999
cluster.halo-nfsd-max-latency                5
cluster.halo-max-latency                     5
cluster.halo-max-replicas
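
Since the other volumes are described as similar but not necessarily
identical, the two-column output of `gluster volume get <volume> all` for
each volume could be saved and compared mechanically. A minimal Python
sketch, assuming each volume's output has been captured as text; some
options (for example performance.cache-size above) legitimately appear
more than once, so values are kept as lists in listing order. The second
volume "data" and its nfs.disable value in the usage example are
hypothetical.

```python
def parse_volume_get(text):
    """Parse `gluster volume get <vol> all` output into {option: [values]}.

    Options can appear more than once in this listing, so every value is
    kept in order. Options printed without a value map to [""].
    """
    options = {}
    for line in text.splitlines():
        fields = line.split(None, 1)
        if not fields or fields[0] == "Option" or fields[0].strip("-") == "":
            continue  # skip blank lines and the two header lines
        value = fields[1].strip() if len(fields) > 1 else ""
        options.setdefault(fields[0], []).append(value)
    return options


def diff_options(a, b):
    """Return {option: (values_in_a, values_in_b)} for options that differ."""
    return {
        name: (a.get(name, []), b.get(name, []))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    home = parse_volume_get("""Option      Value
------      -----
nfs.disable off
performance.cache-size 32MB
performance.cache-size 128MB
nfs.read-size (1 * 1048576ULL)
""")
    data = parse_volume_get("""Option      Value
------      -----
nfs.disable on
performance.cache-size 32MB
performance.cache-size 128MB
nfs.read-size (1 * 1048576ULL)
""")
    print(diff_options(home, data))  # {'nfs.disable': (['off'], ['on'])}
```
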
> Thanks,
> Soumya
> >     Does this process make more sense than a version-by-version
> >     upgrade (to 4.1, then 5, then 6)? What gotchas do I need to be
> >     ready for? I have until late May to prepare and test on old, slow
> >     hardware with a small number of files and volumes.
> > 
> > You can upgrade directly from 3.12 to 6.x. I would suggest that rather
> > than deleting and re-creating the Gluster volumes. +Hari and +Sanju for
> > further guidelines on the upgrade, as they recently did upgrade tests.
> > +Soumya to add the nfs-ganesha aspect.
> > Regards,
> > Poornima
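
Whichever route is taken (in-place upgrade or rebuilding volumes from the
old bricks), it seems prudent to confirm that the split-brain and
pending-heal cleanup is finished before any server is touched. A minimal
Python sketch that sums the per-brick counts from the text output of
`gluster volume heal <volume> info` (or `... info split-brain`). It
assumes that output contains a `Brick <host>:<path>` line followed later
by a `Number of entries: N` (or `Number of entries in split-brain: N`)
line for each brick; that format is an assumption to check against the
installed version. A brick that cannot be reached may report a
non-numeric count, which the sketch also flags.

```python
import re

# Assumed line formats in `gluster volume heal <vol> info` output.
BRICK_RE = re.compile(r"^Brick\s+(\S+)\s*$")
ENTRIES_RE = re.compile(r"^Number of entries(?: in split-brain)?:\s*(\S+)\s*$")


def pending_heals(heal_info_text):
    """Return (total_pending, problems) parsed from heal-info output.

    `problems` lists (brick, count) for bricks whose count was non-zero or
    not a number at all.
    """
    total, problems, brick = 0, [], None
    for line in heal_info_text.splitlines():
        line = line.strip()
        m = BRICK_RE.match(line)
        if m:
            brick = m.group(1)
            continue
        m = ENTRIES_RE.match(line)
        if not m:
            continue  # gfid/path entries, Status: lines, blank lines
        count = m.group(1)
        if count.isdigit():
            total += int(count)
            if int(count):
                problems.append((brick, int(count)))
        else:
            problems.append((brick, count))  # e.g. brick not connected
    return total, problems


def ready_to_upgrade(heal_info_text):
    """True only if every brick reported a numeric count of zero."""
    total, problems = pending_heals(heal_info_text)
    return total == 0 and not problems
```

It would be fed the captured stdout of the heal-info command for each of
the volumes in turn; only when every volume passes would the servers be
stopped.
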
-- 
James P. Kinney III

Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain

http://heretothereideas.blogspot.com/
