<div dir="ltr">Hi,<div><br></div><div>It seems to be crashing inside open-behind. There's a known bug in that xlator that causes a crash; it has been fixed in 7.7, which was recently released. Can you try to upgrade?</div><div><br></div><div>Xavi</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jul 24, 2020 at 8:50 AM <<a href="mailto:nico@furyweb.fr" target="_blank">nico@furyweb.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><div>We're using gluster in a production environment, 3 nodes (2 data + 1 arbiter).</div><div>The gluster fuse client on one of our VMs is regularly crashing on a particular volume. We recently upgraded all nodes and the client to 7.6, but the client is still crashing.</div><div><br></div><div>All cluster nodes & the client are Debian stretch (9.12), gluster was installed from our local gluster apt repository mirror, and op-version is set to 70200.</div><div><br></div><div>The volume contains a lot of files & directories but performance doesn't really matter; it seems to crash during this command:</div><div>find logscli -mtime +1 -type f | tar c -T - -f - --remove-files | tar xpf - -C /drbd<br></div><div><br></div><div>The volume was remounted this morning with DEBUG log level; waiting for the next crash.</div><div><br></div><div>Volume attributes are:</div><div><div><div>Option Value </div><div>------ ----- </div><div>cluster.lookup-unhashed on </div><div>cluster.lookup-optimize on </div><div>cluster.min-free-disk 10% </div><div>cluster.min-free-inodes 5% </div><div>cluster.rebalance-stats off </div><div>cluster.subvols-per-directory (null) </div><div>cluster.readdir-optimize off </div><div>cluster.rsync-hash-regex (null) </div><div>cluster.extra-hash-regex (null) </div><div>cluster.dht-xattr-name trusted.glusterfs.dht </div><div>cluster.randomize-hash-range-by-gfid 
off </div><div>cluster.rebal-throttle normal </div><div>cluster.lock-migration off </div><div>cluster.force-migration off </div><div>cluster.local-volume-name (null) </div><div>cluster.weighted-rebalance on </div><div>cluster.switch-pattern (null) </div><div>cluster.entry-change-log on </div><div>cluster.read-subvolume (null) </div><div>cluster.read-subvolume-index -1 </div><div>cluster.read-hash-mode 1 </div><div>cluster.background-self-heal-count 8 </div><div>cluster.metadata-self-heal off </div><div>cluster.data-self-heal off </div><div>cluster.entry-self-heal off </div><div>cluster.self-heal-daemon enable </div><div>cluster.heal-timeout 60 </div><div>cluster.self-heal-window-size 1 </div><div>cluster.data-change-log on </div><div>cluster.metadata-change-log on </div><div>cluster.data-self-heal-algorithm full </div><div>cluster.eager-lock on </div><div>disperse.eager-lock on </div><div>disperse.other-eager-lock on </div><div>disperse.eager-lock-timeout 1 </div><div>disperse.other-eager-lock-timeout 1 </div><div>cluster.quorum-type fixed </div><div>cluster.quorum-count 1 </div><div>cluster.choose-local true </div><div>cluster.self-heal-readdir-size 1KB </div><div>cluster.post-op-delay-secs 1 </div><div>cluster.ensure-durability on </div><div>cluster.consistent-metadata no </div><div>cluster.heal-wait-queue-length 128 </div><div>cluster.favorite-child-policy none </div><div>cluster.full-lock yes </div><div>cluster.optimistic-change-log on </div><div>diagnostics.latency-measurement off </div><div>diagnostics.dump-fd-stats off </div><div>diagnostics.count-fop-hits off </div><div>diagnostics.brick-log-level INFO </div><div>diagnostics.client-log-level ERROR </div><div>diagnostics.brick-sys-log-level CRITICAL </div><div>diagnostics.client-sys-log-level CRITICAL </div><div>diagnostics.brick-logger (null) </div><div>diagnostics.client-logger (null) </div><div>diagnostics.brick-log-format (null) </div><div>diagnostics.client-log-format (null) 
</div><div>diagnostics.brick-log-buf-size 5 </div><div>diagnostics.client-log-buf-size 5 </div><div>diagnostics.brick-log-flush-timeout 120 </div><div>diagnostics.client-log-flush-timeout 120 </div><div>diagnostics.stats-dump-interval 0 </div><div>diagnostics.fop-sample-interval 0 </div><div>diagnostics.stats-dump-format json </div><div>diagnostics.fop-sample-buf-size 65535 </div><div>diagnostics.stats-dnscache-ttl-sec 86400 </div><div>performance.cache-max-file-size 0 </div><div>performance.cache-min-file-size 0 </div><div>performance.cache-refresh-timeout 1 </div><div>performance.cache-priority </div><div>performance.cache-size 32MB </div><div>performance.io-thread-count 16 </div><div>performance.high-prio-threads 16 </div><div>performance.normal-prio-threads 16 </div><div>performance.low-prio-threads 16 </div><div>performance.least-prio-threads 1 </div><div>performance.enable-least-priority on </div><div>performance.iot-watchdog-secs (null) </div><div>performance.iot-cleanup-disconnected-reqs off </div><div>performance.iot-pass-through false </div><div>performance.io-cache-pass-through false </div><div>performance.cache-size 128MB </div><div>performance.qr-cache-timeout 1 </div><div>performance.cache-invalidation false </div><div>performance.ctime-invalidation false </div><div>performance.flush-behind on </div><div>performance.nfs.flush-behind on </div><div>performance.write-behind-window-size 1MB </div><div>performance.resync-failed-syncs-after-fsync off </div><div>performance.nfs.write-behind-window-size 1MB </div><div>performance.strict-o-direct off </div><div>performance.nfs.strict-o-direct off </div><div>performance.strict-write-ordering off </div><div>performance.nfs.strict-write-ordering off </div><div>performance.write-behind-trickling-writes on </div><div>performance.aggregate-size 128KB </div><div>performance.nfs.write-behind-trickling-writes on </div><div>performance.lazy-open yes </div><div>performance.read-after-open yes 
</div><div>performance.open-behind-pass-through false </div><div>performance.read-ahead-page-count 4 </div><div>performance.read-ahead-pass-through false </div><div>performance.readdir-ahead-pass-through false </div><div>performance.md-cache-pass-through false </div><div>performance.md-cache-timeout 1 </div><div>performance.cache-swift-metadata true </div><div>performance.cache-samba-metadata false </div><div>performance.cache-capability-xattrs true </div><div>performance.cache-ima-xattrs true </div><div>performance.md-cache-statfs off </div><div>performance.xattr-cache-list </div><div>performance.nl-cache-pass-through false </div><div>network.frame-timeout 1800 </div><div>network.ping-timeout 5 </div><div>network.tcp-window-size (null) </div><div>client.ssl on </div><div>network.remote-dio disable </div><div>client.event-threads 2 </div><div>client.tcp-user-timeout 0 </div><div>client.keepalive-time 20 </div><div>client.keepalive-interval 2 </div><div>client.keepalive-count 9 </div><div>network.tcp-window-size (null) </div><div>network.inode-lru-limit 16384 </div><div>auth.allow * </div><div>auth.reject (null) </div><div>transport.keepalive 1 </div><div>server.allow-insecure on </div><div>server.root-squash off </div><div>server.all-squash off </div><div>server.anonuid 65534 </div><div>server.anongid 65534 </div><div>server.statedump-path /var/run/gluster </div><div>server.outstanding-rpc-limit 64 </div><div>server.ssl on </div><div>auth.ssl-allow * </div><div>server.manage-gids off </div><div>server.dynamic-auth on </div><div>client.send-gids on </div><div>server.gid-timeout 300 </div><div>server.own-thread (null) </div><div>server.event-threads 2 </div><div>server.tcp-user-timeout 42 </div><div>server.keepalive-time 20 </div><div>server.keepalive-interval 2 </div><div>server.keepalive-count 9 </div><div>transport.listen-backlog 1024 </div><div>ssl.cipher-list HIGH:!SSLv2 </div><div>transport.address-family inet </div><div>performance.write-behind on 
</div><div>performance.read-ahead on </div><div>performance.readdir-ahead on </div><div>performance.io-cache on </div><div>performance.open-behind on </div><div>performance.quick-read on </div><div>performance.nl-cache off </div><div>performance.stat-prefetch on </div><div>performance.client-io-threads off </div><div>performance.nfs.write-behind on </div><div>performance.nfs.read-ahead off </div><div>performance.nfs.io-cache off </div><div>performance.nfs.quick-read off </div><div>performance.nfs.stat-prefetch off </div><div>performance.nfs.io-threads off </div><div>performance.force-readdirp true </div><div>performance.cache-invalidation false </div><div>performance.global-cache-invalidation true </div><div>features.uss off </div><div>features.snapshot-directory .snaps </div><div>features.show-snapshot-directory off </div><div>features.tag-namespaces off </div><div>network.compression off </div><div>network.compression.window-size -15 </div><div>network.compression.mem-level 8 </div><div>network.compression.min-size 0 </div><div>network.compression.compression-level -1 </div><div>network.compression.debug false </div><div>features.default-soft-limit 80% </div><div>features.soft-timeout 60 </div><div>features.hard-timeout 5 </div><div>features.alert-time 86400 </div><div>features.quota-deem-statfs off </div><div>geo-replication.indexing off </div><div>geo-replication.indexing off </div><div>geo-replication.ignore-pid-check off </div><div>geo-replication.ignore-pid-check off </div><div>features.quota off </div><div>features.inode-quota off </div><div>features.bitrot disable </div><div>debug.trace off </div><div>debug.log-history no </div><div>debug.log-file no </div><div>debug.exclude-ops (null) </div><div>debug.include-ops (null) </div><div>debug.error-gen off </div><div>debug.error-failure (null) </div><div>debug.error-number (null) </div><div>debug.random-failure off </div><div>debug.error-fops (null) </div><div>nfs.disable on </div><div>features.read-only off 
</div><div>features.worm off </div><div>features.worm-file-level off </div><div>features.worm-files-deletable on </div><div>features.default-retention-period 120 </div><div>features.retention-mode relax </div><div>features.auto-commit-period 180 </div><div>storage.linux-aio off </div><div>storage.batch-fsync-mode reverse-fsync </div><div>storage.batch-fsync-delay-usec 0 </div><div>storage.owner-uid -1 </div><div>storage.owner-gid -1 </div><div>storage.node-uuid-pathinfo off </div><div>storage.health-check-interval 30 </div><div>storage.build-pgfid off </div><div>storage.gfid2path on </div><div>storage.gfid2path-separator : </div><div>storage.reserve 1 </div><div>storage.reserve-size 0 </div><div>storage.health-check-timeout 10 </div><div>storage.fips-mode-rchecksum off </div><div>storage.force-create-mode 0000 </div><div>storage.force-directory-mode 0000 </div><div>storage.create-mask 0777 </div><div>storage.create-directory-mask 0777 </div><div>storage.max-hardlinks 100 </div><div>features.ctime off </div><div>config.gfproxyd off </div><div>cluster.server-quorum-type off </div><div>cluster.server-quorum-ratio 51 </div><div>changelog.changelog off </div><div>changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs </div><div>changelog.encoding ascii </div><div>changelog.rollover-time 15 </div><div>changelog.fsync-interval 5 </div><div>changelog.changelog-barrier-timeout 120 </div><div>changelog.capture-del-path off </div><div>features.barrier disable </div><div>features.barrier-timeout 120 </div><div>features.trash off </div><div>features.trash-dir .trashcan </div><div>features.trash-eliminate-path (null) </div><div>features.trash-max-filesize 5MB </div><div>features.trash-internal-op off </div><div>cluster.enable-shared-storage disable </div><div>locks.trace off </div><div>locks.mandatory-locking off </div><div>cluster.disperse-self-heal-daemon enable </div><div>cluster.quorum-reads false </div><div>client.bind-insecure (null) </div><div>features.timeout 45 
</div><div>features.failover-hosts (null) </div><div>features.shard off </div><div>features.shard-block-size 64MB </div><div>features.shard-lru-limit 16384 </div><div>features.shard-deletion-rate 100 </div><div>features.scrub-throttle lazy </div><div>features.scrub-freq biweekly </div><div>features.scrub false </div><div>features.expiry-time 120 </div><div>features.cache-invalidation off </div><div>features.cache-invalidation-timeout 60 </div><div>features.leases off </div><div>features.lease-lock-recall-timeout 60 </div><div>disperse.background-heals 8 </div><div>disperse.heal-wait-qlength 128 </div><div>cluster.heal-timeout 60 </div><div>dht.force-readdirp on </div><div>disperse.read-policy gfid-hash </div><div>cluster.shd-max-threads 1 </div><div>cluster.shd-wait-qlength 1024 </div><div>cluster.locking-scheme full </div><div>cluster.granular-entry-heal no </div><div>features.locks-revocation-secs 0 </div><div>features.locks-revocation-clear-all false </div><div>features.locks-revocation-max-blocked 0 </div><div>features.locks-monkey-unlocking false </div><div>features.locks-notify-contention no </div><div>features.locks-notify-contention-delay 5 </div><div>disperse.shd-max-threads 1 </div><div>disperse.shd-wait-qlength 1024 </div><div>disperse.cpu-extensions auto </div><div>disperse.self-heal-window-size 1 </div><div>cluster.use-compound-fops off </div><div>performance.parallel-readdir off </div><div>performance.rda-request-size 131072 </div><div>performance.rda-low-wmark 4096 </div><div>performance.rda-high-wmark 128KB </div><div>performance.rda-cache-limit 10MB </div><div>performance.nl-cache-positive-entry false </div><div>performance.nl-cache-limit 10MB </div><div>performance.nl-cache-timeout 60 </div><div>cluster.brick-multiplex disable </div><div>glusterd.vol_count_per_thread 100 </div><div>cluster.max-bricks-per-process 250 </div><div>disperse.optimistic-change-log on </div><div>disperse.stripe-cache 4 </div><div>cluster.halo-enabled False 
</div><div>cluster.halo-shd-max-latency 99999 </div><div>cluster.halo-nfsd-max-latency 5 </div><div>cluster.halo-max-latency 5 </div><div>cluster.halo-max-replicas 99999 </div><div>cluster.halo-min-replicas 2 </div><div>features.selinux on </div><div>cluster.daemon-log-level INFO </div><div>debug.delay-gen off </div><div>delay-gen.delay-percentage 10% </div><div>delay-gen.delay-duration 100000 </div><div>delay-gen.enable </div><div>disperse.parallel-writes on </div><div>features.sdfs off </div><div>features.cloudsync off </div><div>features.ctime off </div><div>ctime.noatime on </div><div>features.cloudsync-storetype (null) </div><div>features.enforce-mandatory-lock off </div><div>config.global-threading off </div><div>config.client-threads 16 </div><div>config.brick-threads 16 </div><div>features.cloudsync-remote-read off </div><div>features.cloudsync-store-id (null) </div><div>features.cloudsync-product-id (null)</div></div><div><br></div><div><br></div><div>Crash log found in /var/log/glusterfs/partage-logscli.log</div><div>2020-07-23 02:34:36</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: glusterfs 
7.6</div><div>/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x25e50)[0x7fbe02138e50]</div><div>/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2f7)[0x7fbe021434b7]</div><div>/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7fbe00b87060]</div><div>/lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fbe01396b40]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x32f5)[0x7fbdfa89b2f5]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3c52)[0x7fbdfa89bc52]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3dac)[0x7fbdfa89bdac]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3f73)[0x7fbdfa89bf73]</div><div>/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_unlink+0xbc)[0x7fbe021c401c]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/md-cache.so(+0x4495)[0x7fbdfa46c495]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/debug/io-stats.so(+0x5f44)[0x7fbdfa23af44]</div><div>/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_unlink+0xbc)[0x7fbe021c401c]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x1154f)[0x7fbdff7dc54f]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x7775)[0x7fbdff7d2775]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x74c8)[0x7fbdff7d24c8]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x77be)[0x7fbdff7d27be]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x6ac3)[0x7fbdff7d1ac3]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x7188)[0x7fbdff7d2188]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x74e8)[0x7fbdff7d24e8]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x779e)[0x7fbdff7d279e]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x77e0)[0x7fbdff7d27e0]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x83f9)[0x7fbdff7d33f9]</div><div>/usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x21d3c)[0x7fbdff7ecd3c]</div><div>/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7fbe013944a4]</div><div>/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fbe00c3cd0f]</div></div></div></div>________<br>
<br>
<br>
<br>
Community Meeting Calendar:<br>
<br>
Schedule -<br>
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC<br>
Bridge: <a href="https://bluejeans.com/441850968" rel="noreferrer" target="_blank">https://bluejeans.com/441850968</a><br>
<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote></div>
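<div><br></div><div>If the upgrade to 7.7 can't happen immediately, one possible interim sketch is to disable the open-behind xlator on the affected volume with the standard <code>gluster volume set</code> CLI, trading that optimization for stability until the fixed version is installed. The volume name <code>logscli</code> below is an assumption inferred from the client log name (<code>partage-logscli.log</code>); substitute your actual volume name.</div>

```shell
# Interim workaround sketch: disable the open-behind xlator implicated
# in the backtrace. "logscli" is an assumed volume name (guessed from the
# client log file name) and may differ on your setup.
VOLNAME=logscli

# Turn off the open-behind performance xlator on this volume.
gluster volume set "$VOLNAME" performance.open-behind off

# Confirm the option took effect.
gluster volume get "$VOLNAME" performance.open-behind
```

<div>Once all nodes and clients are on 7.7, the option can be re-enabled with <code>gluster volume set &lt;volname&gt; performance.open-behind on</code>.</div>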