[Gluster-users] Expected behaviour of hypervisor on Gluster node loss
Niklaus Hofer
niklaus.hofer at stepping-stone.ch
Wed Feb 1 08:21:32 UTC 2017
Hi
On 2017-02-01 05:35, Vijay Bellur wrote:
> On Mon, Jan 30, 2017 at 6:25 AM, Niklaus Hofer
> <niklaus.hofer at stepping-stone.ch
> <mailto:niklaus.hofer at stepping-stone.ch>> wrote:
>
> Hi
>
> I have a question concerning the 'correct' behaviour of GlusterFS:
>
> We have a nice Gluster setup up and running. Most things are working
> nicely. Our setup is as follows:
> - Storage is a 2+1 Gluster setup (2 replicating hosts + 1 arbiter)
> with a volume for virtual machines.
> - Two virtualisation hosts running libvirt / qemu / kvm.
>
>
> Are you using something like oVirt or proxmox for managing your
> virtualization cluster?
We are using our own fork of FOSS-Cloud [0]. FOSS-Cloud uses libvirt
underneath. We are currently on QEMU 2.1.3 and libvirt 1.2.8.
> Now the question is, what is supposed to happen when we unplug one
> of the storage nodes (aka power outage in one of our data centers)?
> Initially we were hoping that the virtualisation hosts would
> automatically switch over to the second storage node and keep all
> VMs running.
>
> However, during our tests, we have found that this is not the case.
> Instead, when we unplug one of the storage nodes, the virtual
> machines run into all sorts of problems; being unable to read/write,
> crashing applications and even corrupting the filesystem. That is of
> course not acceptable.
>
> Reading the documentation again, we now think that we have
> misunderstood what we're supposed to be doing. To our understanding,
> what should happen is this:
> - If the virtualisation host is connected to the storage node which
> is still running:
> - everything is fine and the VM keeps running
> - If the virtualisation host was connected to the storage node
> which is now absent:
> - qemu is supposed to 'pause' / 'freeze' the VM
> - Virtualisation host waits for ping timeout
> - Virtualisation host switches over to the other storage node
> - qemu 'unpauses' the VMs
> - The VM is fully operational again
>
> Does my description match the 'optimal' GlusterFS behaviour?
>
> Can you provide more details about your gluster volume configuration and
> the options enabled on the volume?
We upgraded Gluster from 3.5.X to 3.8.8. After the upgrade we switched
from a two-node replicated setup to two nodes plus an arbiter.
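
As a rough illustration of why a two nodes + arbiter setup would be expected to keep accepting writes after losing one storage node, here is a minimal Python sketch of the client-quorum rule as it is commonly documented for 'cluster.quorum-type auto': writes are allowed when more than half of the bricks in the replica set are reachable, and with exactly half only if the first brick is among them. The function name and the list-of-booleans representation are assumptions made for this example, not Gluster code, and arbiter-specific checks are not modelled.

```python
def has_client_quorum(bricks_up):
    """Return True if writes would be allowed under the 'auto' quorum rule.

    bricks_up: list of booleans, one per brick of the replica set,
    in brick order (hypothetical representation for this sketch).
    """
    total = len(bricks_up)
    alive = sum(1 for up in bricks_up if up)
    if 2 * alive > total:
        # More than half of the bricks are reachable.
        return True
    if 2 * alive == total:
        # Exactly half: the first brick acts as the tie-breaker.
        return bricks_up[0]
    return False


# Two data bricks + arbiter, one data brick unplugged: 2 of 3 remain.
print(has_client_quorum([True, False, True]))
# Only the arbiter is left: 1 of 3 remain.
print(has_client_quorum([False, False, True]))
```

Under this rule, unplugging a single storage node leaves two of three bricks reachable, so quorum by itself should not be what blocks the VMs.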
The volume hosting the virtual disks uses the settings from the 'virt'
group, except for 'features.shard' which we disabled. I attached the
exact list of options.
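
The attached listing is plain "Option Value" text, so the few settings that matter for failover (for example network.ping-timeout and cluster.quorum-type) can be pulled out with a short script. The following Python sketch assumes that two-column format. The function name parse_volume_options and the SAMPLE excerpt are made up for illustration; the values in SAMPLE are copied from the listing. Because some option names appear more than once in the listing, each name maps to a list of values.

```python
def parse_volume_options(text):
    """Parse 'Option Value' lines into a dict of name -> list of values.

    Header and separator lines are skipped, as are options that have
    no value column at all.
    """
    options = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank line or option without a value
        name, value = parts
        if name == "Option" or name.startswith("-"):
            continue  # table header or separator
        options.setdefault(name, []).append(value.strip())
    return options


# Excerpt copied from the attached listing (hypothetical variable name).
SAMPLE = """\
Option Value
------ -----
cluster.quorum-type auto
performance.cache-priority
performance.cache-size 32MB
performance.cache-size 128MB
network.ping-timeout 15
features.shard off
"""

opts = parse_volume_options(SAMPLE)
for key in ("network.ping-timeout", "cluster.quorum-type", "features.shard"):
    print(key, opts.get(key))
```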
[0] http://www.foss-cloud.org/en/wiki/FOSS-Cloud
Greetings
Niklaus Hofer
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern
Telefon: +41 31 332 53 63
www.stepping-stone.ch
niklaus.hofer at stepping-stone.ch
-------------- next part --------------
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize off
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 2
cluster.background-self-heal-count 8
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm full
cluster.eager-lock enable
disperse.eager-lock on
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 32
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 32
performance.least-prio-threads 1
performance.enable-least-priority on
performance.least-rate-limit 0
performance.cache-size 128MB
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.lazy-open yes
performance.read-after-open no
performance.read-ahead-page-count 4
performance.md-cache-timeout 1
performance.cache-swift-metadata true
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 15
network.tcp-window-size (null)
features.lock-heal off
features.grace-timeout 10
network.remote-dio enable
client.event-threads 2
network.ping-timeout 15
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow 10.1.120.11,10.1.120.12,10.1.120.16,10.1.120.13,10.1.120.14,10.1.120.15
auth.reject (null)
transport.keepalive (null)
server.allow-insecure On
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
features.lock-heal off
features.grace-timeout 10
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
ssl.own-cert (null)
ssl.private-key (null)
ssl.ca-list (null)
ssl.crl-path (null)
ssl.certificate-depth (null)
ssl.cipher-list (null)
ssl.dh-param (null)
ssl.ec-curve (null)
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead off
performance.io-cache off
performance.quick-read off
performance.open-behind on
performance.stat-prefetch off
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.limit-usage (null)
features.quota-timeout 0
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable On
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.rdirplus on
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
features.worm-file-level off
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.bd-aio off
cluster.server-quorum-type server
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir (null)
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 1000
features.ctr-sql-db-wal-autocheckpoint 1000
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
ganesha.enable off
features.shard off
features.shard-block-size 4MB
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy round-robin
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no