[Gluster-users] Reliability issues with Gluster 3.10 and shard

Benjamin Kingston ben at nexusnebula.net
Fri May 12 08:35:59 UTC 2017


Hello all,

I'm trying to take advantage of the shard xlator; however, I've found that it
causes a number of issues that I hope are easily resolvable:

1) Large file operations work well (e.g. copying a file from folder A to folder B).
2) Seek and list operations frequently fail (ls on a directory, read bytes xyz
at offset 235567); a rough illustration is sketched below.
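
As an illustration, the kinds of operations that fail look roughly like the
following; the mount point /mnt/gv0, the directory, and the file name are
placeholders:

    # listing a directory on the FUSE mount
    ls -l /mnt/gv0/media

    # reading a small range at a byte offset inside a file on the volume
    # (with bs=1, skip is the starting byte offset and count is the byte count)
    dd if=/mnt/gv0/media/large-file.bin of=/dev/null bs=1 skip=235567 count=4096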

Turning off the shard feature resolves this issue, but only for new files
created in the volume. The volume is mounted using the GlusterFS FUSE mount.
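
For clarity, "turning off the shard feature" here means something along these
lines (the volume name gv0 is a placeholder):

    gluster volume set gv0 features.shard off

Consistent with the above, this only changes behaviour for files created after
the switch; files that were already sharded keep their shards.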

Here are my volume settings; please let me know if there are any changes I can
make.
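
The listing below is the full option dump; it is the sort of output produced by
a command along the lines of:

    gluster volume get <volname> all

(the volume name is omitted here).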

Option                                        Value
------                                        -----
cluster.lookup-unhashed                       on
cluster.lookup-optimize                       on
cluster.min-free-disk                         10%
cluster.min-free-inodes                       5%
cluster.rebalance-stats                       off
cluster.subvols-per-directory                 (null)
cluster.readdir-optimize                      off
cluster.rsync-hash-regex                      (null)
cluster.extra-hash-regex                      (null)
cluster.dht-xattr-name                        trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid          off
cluster.rebal-throttle                        normal
cluster.lock-migration                        off
cluster.local-volume-name                     (null)
cluster.weighted-rebalance                    on
cluster.switch-pattern                        (null)
cluster.entry-change-log                      on
cluster.read-subvolume                        (null)
cluster.read-subvolume-index                  -1
cluster.read-hash-mode                        1
cluster.background-self-heal-count            8
cluster.metadata-self-heal                    on
cluster.data-self-heal                        on
cluster.entry-self-heal                       on
cluster.self-heal-daemon                      on
cluster.heal-timeout                          600
cluster.self-heal-window-size                 1
cluster.data-change-log                       on
cluster.metadata-change-log                   on
cluster.data-self-heal-algorithm              diff
cluster.eager-lock                            enable
disperse.eager-lock                           on
cluster.quorum-type                           auto
cluster.quorum-count                          (null)
cluster.choose-local                          on
cluster.self-heal-readdir-size                1KB
cluster.post-op-delay-secs                    1
cluster.ensure-durability                     on
cluster.consistent-metadata                   no
cluster.heal-wait-queue-length                128
cluster.favorite-child-policy                 none
cluster.stripe-block-size                     128KB
cluster.stripe-coalesce                       true
diagnostics.latency-measurement               off
diagnostics.dump-fd-stats                     off
diagnostics.count-fop-hits                    off
diagnostics.brick-log-level                   INFO
diagnostics.client-log-level                  INFO
diagnostics.brick-sys-log-level               CRITICAL
diagnostics.client-sys-log-level              CRITICAL
diagnostics.brick-logger                      (null)
diagnostics.client-logger                     (null)
diagnostics.brick-log-format                  (null)
diagnostics.client-log-format                 (null)
diagnostics.brick-log-buf-size                5
diagnostics.client-log-buf-size               5
diagnostics.brick-log-flush-timeout           120
diagnostics.client-log-flush-timeout          120
diagnostics.stats-dump-interval               0
diagnostics.fop-sample-interval               0
diagnostics.fop-sample-buf-size               65535
diagnostics.stats-dnscache-ttl-sec            86400
performance.cache-max-file-size               0
performance.cache-min-file-size               0
performance.cache-refresh-timeout             1
performance.cache-priority
performance.cache-size                        1GB
performance.io-thread-count                   64
performance.high-prio-threads                 16
performance.normal-prio-threads               16
performance.low-prio-threads                  32
performance.least-prio-threads                1
performance.enable-least-priority             on
performance.cache-size                        1GB
performance.flush-behind                      on
performance.nfs.flush-behind                  on
performance.write-behind-window-size          2GB
performance.resync-failed-syncs-after-fsync   off
performance.nfs.write-behind-window-size      1MB
performance.strict-o-direct                   off
performance.nfs.strict-o-direct               off
performance.strict-write-ordering             off
performance.nfs.strict-write-ordering         off
performance.lazy-open                         yes
performance.read-after-open                   no
performance.read-ahead-page-count             4
performance.md-cache-timeout                  1
performance.cache-swift-metadata              true
performance.cache-samba-metadata              false
performance.cache-capability-xattrs           true
performance.cache-ima-xattrs                  on
features.encryption                           off
encryption.master-key                         (null)
encryption.data-key-size                      256
encryption.block-size                         4096
network.frame-timeout                         1800
network.ping-timeout                          42
network.tcp-window-size                       (null)
features.lock-heal                            off
features.grace-timeout                        10
network.remote-dio                            disable
client.event-threads                          3
network.ping-timeout                          42
network.tcp-window-size                       (null)
network.inode-lru-limit                       90000
auth.allow                                    *
auth.reject                                   (null)
transport.keepalive                           on
server.allow-insecure                         on
server.root-squash                            off
server.anonuid                                65534
server.anongid                                65534
server.statedump-path                         /var/run/gluster
server.outstanding-rpc-limit                  64
features.lock-heal                            off
features.grace-timeout                        10
server.ssl                                    (null)
auth.ssl-allow                                *
server.manage-gids                            off
server.dynamic-auth                           on
client.send-gids                              on
server.gid-timeout                            300
server.own-thread                             (null)
server.event-threads                          3
ssl.own-cert                                  (null)
ssl.private-key                               (null)
ssl.ca-list                                   (null)
ssl.crl-path                                  (null)
ssl.certificate-depth                         (null)
ssl.cipher-list                               (null)
ssl.dh-param                                  (null)
ssl.ec-curve                                  (null)
transport.address-family                      inet6
performance.write-behind                      on
performance.read-ahead                        off
performance.readdir-ahead                     on
performance.io-cache                          on
performance.quick-read                        off
performance.open-behind                       on
performance.stat-prefetch                     on
performance.client-io-threads                 on
performance.nfs.write-behind                  on
performance.nfs.read-ahead                    off
performance.nfs.io-cache                      off
performance.nfs.quick-read                    off
performance.nfs.stat-prefetch                 off
performance.nfs.io-threads                    off
performance.force-readdirp                    true
performance.cache-invalidation                false
features.uss                                  off
features.snapshot-directory                   .snaps
features.show-snapshot-directory              off
network.compression                           off
network.compression.window-size               -15
network.compression.mem-level                 8
network.compression.min-size                  0
network.compression.compression-level         -1
network.compression.debug                     false
features.limit-usage                          (null)
features.quota-timeout                        0
features.default-soft-limit                   80%
features.soft-timeout                         60
features.hard-timeout                         5
features.alert-time                           86400
features.quota-deem-statfs                    off
geo-replication.indexing                      off
geo-replication.indexing                      off

-ben