[Gluster-users] fuse client 7.6 crashing regularly

nico at furyweb.fr
Fri Jul 24 06:49:53 UTC 2020


We're using Gluster in a production environment: 3 nodes (2 data + 1 arbiter). 
The GlusterFS FUSE client on one of our VMs crashes regularly on one particular volume. We recently upgraded all nodes and the client to 7.6, but the client still crashes. 

All cluster nodes and the client run Debian stretch (9.12). Gluster was installed from our local mirror of the Gluster apt repository, and the op-version is set to 70200. 

The volume contains a lot of files and directories, but performance doesn't really matter. The crash seems to happen while this command runs: 
find logscli -mtime +1 -type f | tar c -T - -f - --remove-files | tar xpf - -C /drbd 
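The crash can perhaps be reproduced away from the production job with a scaled-down replay of the same pipeline. This is a minimal sketch assuming GNU find, tar, and touch (the Debian defaults); the directory layout and the BASE variable are hypothetical. Note that --remove-files makes tar unlink every file right after archiving it, which is consistent with the default_unlink frames in the backtrace below.

```shell
#!/bin/sh
# Sketch: replay the archiving pipeline from the post on a scratch tree.
# BASE is a hypothetical knob: leave it unset to use /tmp, or point it at a
# directory on the FUSE mount to drive the same operations through the client.
set -eu

BASE=${BASE:-/tmp}
WORK=$(mktemp -d "$BASE/repro.XXXXXX")
SRC="$WORK/logscli"   # stands in for the logscli directory on the volume
DST="$WORK/drbd"      # stands in for /drbd
mkdir -p "$SRC/sub" "$DST"

# One file older than the -mtime +1 threshold, one fresh file.
echo old > "$SRC/sub/old.log"
echo new > "$SRC/sub/new.log"
touch -d '3 days ago' "$SRC/sub/old.log"

# Same pipeline as in the post: list regular files whose age, truncated to
# whole days, is greater than 1; archive them to stdout while deleting each
# original (--remove-files); unpack the stream into DST, preserving permissions.
cd "$WORK"
find logscli -mtime +1 -type f | tar c -T - -f - --remove-files | tar xpf - -C "$DST"
```

After one run, old.log exists only under $DST/logscli/sub and new.log stays in place. The scratch tree is deliberately left behind so it can be inspected.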

The volume was remounted this morning with the DEBUG log level; we're waiting for the next crash. 
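For reference, one way to raise the client log level on remount is the log-level mount option of mount.glusterfs. The entry below is only a sketch: SERVER and VOLNAME are placeholders, and the mount point /partage/logscli is an assumption inferred from the client log file name partage-logscli.log (the FUSE client names its log after the mount point).

```
# Hypothetical /etc/fstab entry: SERVER and VOLNAME are placeholders, and the
# mount point is inferred from the log file name, not stated in the post.
SERVER:/VOLNAME  /partage/logscli  glusterfs  defaults,_netdev,log-level=DEBUG  0 0
```

Alternatively, the volume option diagnostics.client-log-level (currently ERROR in the list below) can raise the log level for all clients of the volume.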

The volume options are: 
Option Value 
------ ----- 
cluster.lookup-unhashed on 
cluster.lookup-optimize on 
cluster.min-free-disk 10% 
cluster.min-free-inodes 5% 
cluster.rebalance-stats off 
cluster.subvols-per-directory (null) 
cluster.readdir-optimize off 
cluster.rsync-hash-regex (null) 
cluster.extra-hash-regex (null) 
cluster.dht-xattr-name trusted.glusterfs.dht 
cluster.randomize-hash-range-by-gfid off 
cluster.rebal-throttle normal 
cluster.lock-migration off 
cluster.force-migration off 
cluster.local-volume-name (null) 
cluster.weighted-rebalance on 
cluster.switch-pattern (null) 
cluster.entry-change-log on 
cluster.read-subvolume (null) 
cluster.read-subvolume-index -1 
cluster.read-hash-mode 1 
cluster.background-self-heal-count 8 
cluster.metadata-self-heal off 
cluster.data-self-heal off 
cluster.entry-self-heal off 
cluster.self-heal-daemon enable 
cluster.heal-timeout 60 
cluster.self-heal-window-size 1 
cluster.data-change-log on 
cluster.metadata-change-log on 
cluster.data-self-heal-algorithm full 
cluster.eager-lock on 
disperse.eager-lock on 
disperse.other-eager-lock on 
disperse.eager-lock-timeout 1 
disperse.other-eager-lock-timeout 1 
cluster.quorum-type fixed 
cluster.quorum-count 1 
cluster.choose-local true 
cluster.self-heal-readdir-size 1KB 
cluster.post-op-delay-secs 1 
cluster.ensure-durability on 
cluster.consistent-metadata no 
cluster.heal-wait-queue-length 128 
cluster.favorite-child-policy none 
cluster.full-lock yes 
cluster.optimistic-change-log on 
diagnostics.latency-measurement off 
diagnostics.dump-fd-stats off 
diagnostics.count-fop-hits off 
diagnostics.brick-log-level INFO 
diagnostics.client-log-level ERROR 
diagnostics.brick-sys-log-level CRITICAL 
diagnostics.client-sys-log-level CRITICAL 
diagnostics.brick-logger (null) 
diagnostics.client-logger (null) 
diagnostics.brick-log-format (null) 
diagnostics.client-log-format (null) 
diagnostics.brick-log-buf-size 5 
diagnostics.client-log-buf-size 5 
diagnostics.brick-log-flush-timeout 120 
diagnostics.client-log-flush-timeout 120 
diagnostics.stats-dump-interval 0 
diagnostics.fop-sample-interval 0 
diagnostics.stats-dump-format json 
diagnostics.fop-sample-buf-size 65535 
diagnostics.stats-dnscache-ttl-sec 86400 
performance.cache-max-file-size 0 
performance.cache-min-file-size 0 
performance.cache-refresh-timeout 1 
performance.cache-priority 
performance.cache-size 32MB 
performance.io-thread-count 16 
performance.high-prio-threads 16 
performance.normal-prio-threads 16 
performance.low-prio-threads 16 
performance.least-prio-threads 1 
performance.enable-least-priority on 
performance.iot-watchdog-secs (null) 
performance.iot-cleanup-disconnected-reqs off 
performance.iot-pass-through false 
performance.io-cache-pass-through false 
performance.cache-size 128MB 
performance.qr-cache-timeout 1 
performance.cache-invalidation false 
performance.ctime-invalidation false 
performance.flush-behind on 
performance.nfs.flush-behind on 
performance.write-behind-window-size 1MB 
performance.resync-failed-syncs-after-fsync off 
performance.nfs.write-behind-window-size 1MB 
performance.strict-o-direct off 
performance.nfs.strict-o-direct off 
performance.strict-write-ordering off 
performance.nfs.strict-write-ordering off 
performance.write-behind-trickling-writes on 
performance.aggregate-size 128KB 
performance.nfs.write-behind-trickling-writes on 
performance.lazy-open yes 
performance.read-after-open yes 
performance.open-behind-pass-through false 
performance.read-ahead-page-count 4 
performance.read-ahead-pass-through false 
performance.readdir-ahead-pass-through false 
performance.md-cache-pass-through false 
performance.md-cache-timeout 1 
performance.cache-swift-metadata true 
performance.cache-samba-metadata false 
performance.cache-capability-xattrs true 
performance.cache-ima-xattrs true 
performance.md-cache-statfs off 
performance.xattr-cache-list 
performance.nl-cache-pass-through false 
network.frame-timeout 1800 
network.ping-timeout 5 
network.tcp-window-size (null) 
client.ssl on 
network.remote-dio disable 
client.event-threads 2 
client.tcp-user-timeout 0 
client.keepalive-time 20 
client.keepalive-interval 2 
client.keepalive-count 9 
network.tcp-window-size (null) 
network.inode-lru-limit 16384 
auth.allow * 
auth.reject (null) 
transport.keepalive 1 
server.allow-insecure on 
server.root-squash off 
server.all-squash off 
server.anonuid 65534 
server.anongid 65534 
server.statedump-path /var/run/gluster 
server.outstanding-rpc-limit 64 
server.ssl on 
auth.ssl-allow * 
server.manage-gids off 
server.dynamic-auth on 
client.send-gids on 
server.gid-timeout 300 
server.own-thread (null) 
server.event-threads 2 
server.tcp-user-timeout 42 
server.keepalive-time 20 
server.keepalive-interval 2 
server.keepalive-count 9 
transport.listen-backlog 1024 
ssl.cipher-list HIGH:!SSLv2 
transport.address-family inet 
performance.write-behind on 
performance.read-ahead on 
performance.readdir-ahead on 
performance.io-cache on 
performance.open-behind on 
performance.quick-read on 
performance.nl-cache off 
performance.stat-prefetch on 
performance.client-io-threads off 
performance.nfs.write-behind on 
performance.nfs.read-ahead off 
performance.nfs.io-cache off 
performance.nfs.quick-read off 
performance.nfs.stat-prefetch off 
performance.nfs.io-threads off 
performance.force-readdirp true 
performance.cache-invalidation false 
performance.global-cache-invalidation true 
features.uss off 
features.snapshot-directory .snaps 
features.show-snapshot-directory off 
features.tag-namespaces off 
network.compression off 
network.compression.window-size -15 
network.compression.mem-level 8 
network.compression.min-size 0 
network.compression.compression-level -1 
network.compression.debug false 
features.default-soft-limit 80% 
features.soft-timeout 60 
features.hard-timeout 5 
features.alert-time 86400 
features.quota-deem-statfs off 
geo-replication.indexing off 
geo-replication.indexing off 
geo-replication.ignore-pid-check off 
geo-replication.ignore-pid-check off 
features.quota off 
features.inode-quota off 
features.bitrot disable 
debug.trace off 
debug.log-history no 
debug.log-file no 
debug.exclude-ops (null) 
debug.include-ops (null) 
debug.error-gen off 
debug.error-failure (null) 
debug.error-number (null) 
debug.random-failure off 
debug.error-fops (null) 
nfs.disable on 
features.read-only off 
features.worm off 
features.worm-file-level off 
features.worm-files-deletable on 
features.default-retention-period 120 
features.retention-mode relax 
features.auto-commit-period 180 
storage.linux-aio off 
storage.batch-fsync-mode reverse-fsync 
storage.batch-fsync-delay-usec 0 
storage.owner-uid -1 
storage.owner-gid -1 
storage.node-uuid-pathinfo off 
storage.health-check-interval 30 
storage.build-pgfid off 
storage.gfid2path on 
storage.gfid2path-separator : 
storage.reserve 1 
storage.reserve-size 0 
storage.health-check-timeout 10 
storage.fips-mode-rchecksum off 
storage.force-create-mode 0000 
storage.force-directory-mode 0000 
storage.create-mask 0777 
storage.create-directory-mask 0777 
storage.max-hardlinks 100 
features.ctime off 
config.gfproxyd off 
cluster.server-quorum-type off 
cluster.server-quorum-ratio 51 
changelog.changelog off 
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs 
changelog.encoding ascii 
changelog.rollover-time 15 
changelog.fsync-interval 5 
changelog.changelog-barrier-timeout 120 
changelog.capture-del-path off 
features.barrier disable 
features.barrier-timeout 120 
features.trash off 
features.trash-dir .trashcan 
features.trash-eliminate-path (null) 
features.trash-max-filesize 5MB 
features.trash-internal-op off 
cluster.enable-shared-storage disable 
locks.trace off 
locks.mandatory-locking off 
cluster.disperse-self-heal-daemon enable 
cluster.quorum-reads false 
client.bind-insecure (null) 
features.timeout 45 
features.failover-hosts (null) 
features.shard off 
features.shard-block-size 64MB 
features.shard-lru-limit 16384 
features.shard-deletion-rate 100 
features.scrub-throttle lazy 
features.scrub-freq biweekly 
features.scrub false 
features.expiry-time 120 
features.cache-invalidation off 
features.cache-invalidation-timeout 60 
features.leases off 
features.lease-lock-recall-timeout 60 
disperse.background-heals 8 
disperse.heal-wait-qlength 128 
cluster.heal-timeout 60 
dht.force-readdirp on 
disperse.read-policy gfid-hash 
cluster.shd-max-threads 1 
cluster.shd-wait-qlength 1024 
cluster.locking-scheme full 
cluster.granular-entry-heal no 
features.locks-revocation-secs 0 
features.locks-revocation-clear-all false 
features.locks-revocation-max-blocked 0 
features.locks-monkey-unlocking false 
features.locks-notify-contention no 
features.locks-notify-contention-delay 5 
disperse.shd-max-threads 1 
disperse.shd-wait-qlength 1024 
disperse.cpu-extensions auto 
disperse.self-heal-window-size 1 
cluster.use-compound-fops off 
performance.parallel-readdir off 
performance.rda-request-size 131072 
performance.rda-low-wmark 4096 
performance.rda-high-wmark 128KB 
performance.rda-cache-limit 10MB 
performance.nl-cache-positive-entry false 
performance.nl-cache-limit 10MB 
performance.nl-cache-timeout 60 
cluster.brick-multiplex disable 
glusterd.vol_count_per_thread 100 
cluster.max-bricks-per-process 250 
disperse.optimistic-change-log on 
disperse.stripe-cache 4 
cluster.halo-enabled False 
cluster.halo-shd-max-latency 99999 
cluster.halo-nfsd-max-latency 5 
cluster.halo-max-latency 5 
cluster.halo-max-replicas 99999 
cluster.halo-min-replicas 2 
features.selinux on 
cluster.daemon-log-level INFO 
debug.delay-gen off 
delay-gen.delay-percentage 10% 
delay-gen.delay-duration 100000 
delay-gen.enable 
disperse.parallel-writes on 
features.sdfs off 
features.cloudsync off 
features.ctime off 
ctime.noatime on 
features.cloudsync-storetype (null) 
features.enforce-mandatory-lock off 
config.global-threading off 
config.client-threads 16 
config.brick-threads 16 
features.cloudsync-remote-read off 
features.cloudsync-store-id (null) 
features.cloudsync-product-id (null) 


Crash log from /var/log/glusterfs/partage-logscli.log: 
2020-07-23 02:34:36 
configuration details: 
argp 1 
backtrace 1 
dlfcn 1 
libpthread 1 
llistxattr 1 
setfsid 1 
spinlock 1 
epoll.h 1 
xattr.h 1 
st_atim.tv_nsec 1 
package-string: glusterfs 7.6 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x25e50)[0x7fbe02138e50] 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2f7)[0x7fbe021434b7] 
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7fbe00b87060] 
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fbe01396b40] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x32f5)[0x7fbdfa89b2f5] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3c52)[0x7fbdfa89bc52] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3dac)[0x7fbdfa89bdac] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3f73)[0x7fbdfa89bf73] 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_unlink+0xbc)[0x7fbe021c401c] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/md-cache.so(+0x4495)[0x7fbdfa46c495] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/debug/io-stats.so(+0x5f44)[0x7fbdfa23af44] 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_unlink+0xbc)[0x7fbe021c401c] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x1154f)[0x7fbdff7dc54f] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x7775)[0x7fbdff7d2775] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x74c8)[0x7fbdff7d24c8] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x77be)[0x7fbdff7d27be] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x6ac3)[0x7fbdff7d1ac3] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x7188)[0x7fbdff7d2188] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x74e8)[0x7fbdff7d24e8] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x779e)[0x7fbdff7d279e] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x77e0)[0x7fbdff7d27e0] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x83f9)[0x7fbdff7d33f9] 
/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x21d3c)[0x7fbdff7ecd3c] 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7fbe013944a4] 
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fbe00c3cd0f] 
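Frames that carry only an offset, such as open-behind.so(+0x32f5), can in principle be mapped to source lines with binutils' addr2line, provided debug symbols matching the 7.6 packages are installed (an assumption; the post does not say whether they are). A minimal sketch that pulls library/offset pairs out of a saved backtrace; the helper name extract_offsets and the file name backtrace.txt are hypothetical.

```shell
#!/bin/sh
set -eu

# extract_offsets (hypothetical helper): read backtrace lines on stdin and print
# "LIBRARY OFFSET" for every frame of the form /path/lib.so(+0xOFFSET)[0xADDR].
# Frames that already name a symbol, e.g. (default_unlink+0xbc), are skipped.
extract_offsets() {
  sed -n 's/^\(\/[^(]*\)(+\(0x[0-9a-f]*\))\[.*$/\1 \2/p'
}

# Possible usage, assuming the trace was saved to backtrace.txt and debug
# symbols are available so addr2line can resolve function and file:line:
#   extract_offsets < backtrace.txt | while read -r lib off; do
#     addr2line -f -C -e "$lib" "$off"
#   done
```

Only the open-behind.so frames directly above pthread_mutex_lock are likely to matter for this crash, so resolving those four offsets first is probably enough.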