[Gluster-users] fuse client 7.6 crashing regularly

Xavi Hernandez jahernan at redhat.com
Fri Jul 24 07:51:22 UTC 2020


Hi,

it seems to be crashing inside open-behind. There's a known bug in that
xlator that causes a crash. It has been fixed in 7.7, recently released.
Can you try to upgrade?
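
If upgrading right away isn't possible, a possible stopgap is to disable
the open-behind xlator on the affected volume (a sketch; <volname> stands
in for the real volume name):

```shell
# Possible stopgap until the 7.7 upgrade: turn off the open-behind
# translator on the affected volume. <volname> is a placeholder.
gluster volume set <volname> performance.open-behind off
# Remount the fuse clients so they pick up the changed volume graph.
```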

Xavi

On Fri, Jul 24, 2020 at 8:50 AM <nico at furyweb.fr> wrote:

> We're using gluster in a production environment, 3 nodes (2 data + 1
> arbiter).
> The gluster fuse client on one of our VMs is regularly crashing on a
> particular volume; we recently upgraded all nodes and the client to 7.6,
> but the client is still crashing.
>
> All cluster nodes & client are Debian stretch (9.12), gluster was
> installed from our local gluster apt repository mirror and op-version is
> set to 70200.
>
> Volume contains a lot of files & directories but performance doesn't
> really matter; it seems to crash during this command:
> find logscli -mtime +1 -type f | tar c -T - -f - --remove-files | tar xpf
> - -C /drbd
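
As a side note, the pipeline above breaks on file names containing spaces
or newlines. A self-contained sketch of the same "archive files older than
a day, delete the originals, unpack elsewhere" pipeline with NUL-delimited
names (the temporary directories stand in for logscli and /drbd):

```shell
#!/bin/sh
# Sketch of the move-and-delete pipeline using -print0/--null so paths
# with spaces survive. WORK/DST are throwaway stand-ins for the poster's
# logscli parent directory and /drbd.
set -eu
WORK=$(mktemp -d); DST=$(mktemp -d)
mkdir -p "$WORK/logscli"
touch -d '3 days ago' "$WORK/logscli/old file.log"  # older than one day
touch "$WORK/logscli/new.log"                       # recent, should stay
cd "$WORK"
# --null makes tar read NUL-terminated names from -T -;
# --remove-files deletes each file after it has been archived.
find logscli -mtime +1 -type f -print0 \
  | tar c --null -T - -f - --remove-files \
  | tar xpf - -C "$DST"
```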
>
> Volume was remounted this morning with DEBUG log level; waiting for the
> next crash.
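
For reference, a sketch of two ways the fuse client log level can be
raised; <volname>, the server name, and the mount path are placeholders:

```shell
# Volume-wide: raise the client-side log level (this is the
# diagnostics.client-log-level option shown in the dump below).
gluster volume set <volname> diagnostics.client-log-level DEBUG

# Per-mount alternative: pass the level as a fuse mount option.
mount -t glusterfs -o log-level=DEBUG server:/<volname> /mnt/<volname>
```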
>
> Volume attributes are
> Option                                  Value
>
> ------                                  -----
>
> cluster.lookup-unhashed                 on
>
> cluster.lookup-optimize                 on
>
> cluster.min-free-disk                   10%
>
> cluster.min-free-inodes                 5%
>
> cluster.rebalance-stats                 off
>
> cluster.subvols-per-directory           (null)
>
> cluster.readdir-optimize                off
>
> cluster.rsync-hash-regex                (null)
>
> cluster.extra-hash-regex                (null)
>
> cluster.dht-xattr-name                  trusted.glusterfs.dht
>
> cluster.randomize-hash-range-by-gfid    off
>
> cluster.rebal-throttle                  normal
>
> cluster.lock-migration                  off
>
> cluster.force-migration                 off
>
> cluster.local-volume-name               (null)
>
> cluster.weighted-rebalance              on
>
> cluster.switch-pattern                  (null)
>
> cluster.entry-change-log                on
>
> cluster.read-subvolume                  (null)
>
> cluster.read-subvolume-index            -1
>
> cluster.read-hash-mode                  1
>
> cluster.background-self-heal-count      8
>
> cluster.metadata-self-heal              off
>
> cluster.data-self-heal                  off
>
> cluster.entry-self-heal                 off
>
> cluster.self-heal-daemon                enable
>
> cluster.heal-timeout                    60
>
> cluster.self-heal-window-size           1
>
> cluster.data-change-log                 on
>
> cluster.metadata-change-log             on
>
> cluster.data-self-heal-algorithm        full
>
> cluster.eager-lock                      on
>
> disperse.eager-lock                     on
>
> disperse.other-eager-lock               on
>
> disperse.eager-lock-timeout             1
>
> disperse.other-eager-lock-timeout       1
>
> cluster.quorum-type                     fixed
>
> cluster.quorum-count                    1
>
> cluster.choose-local                    true
>
> cluster.self-heal-readdir-size          1KB
>
> cluster.post-op-delay-secs              1
>
> cluster.ensure-durability               on
>
> cluster.consistent-metadata             no
>
> cluster.heal-wait-queue-length          128
>
> cluster.favorite-child-policy           none
>
> cluster.full-lock                       yes
>
> cluster.optimistic-change-log           on
>
> diagnostics.latency-measurement         off
>
> diagnostics.dump-fd-stats               off
>
> diagnostics.count-fop-hits              off
>
> diagnostics.brick-log-level             INFO
>
> diagnostics.client-log-level            ERROR
>
> diagnostics.brick-sys-log-level         CRITICAL
>
> diagnostics.client-sys-log-level        CRITICAL
>
> diagnostics.brick-logger                (null)
>
> diagnostics.client-logger               (null)
>
> diagnostics.brick-log-format            (null)
>
> diagnostics.client-log-format           (null)
>
> diagnostics.brick-log-buf-size          5
>
> diagnostics.client-log-buf-size         5
>
> diagnostics.brick-log-flush-timeout     120
>
> diagnostics.client-log-flush-timeout    120
>
> diagnostics.stats-dump-interval         0
>
> diagnostics.fop-sample-interval         0
>
> diagnostics.stats-dump-format           json
>
> diagnostics.fop-sample-buf-size         65535
>
> diagnostics.stats-dnscache-ttl-sec      86400
>
> performance.cache-max-file-size         0
>
> performance.cache-min-file-size         0
>
> performance.cache-refresh-timeout       1
>
> performance.cache-priority
>
> performance.cache-size                  32MB
>
> performance.io-thread-count             16
>
> performance.high-prio-threads           16
>
> performance.normal-prio-threads         16
>
> performance.low-prio-threads            16
>
> performance.least-prio-threads          1
>
> performance.enable-least-priority       on
>
> performance.iot-watchdog-secs           (null)
>
> performance.iot-cleanup-disconnected-reqs off
>
> performance.iot-pass-through            false
>
> performance.io-cache-pass-through       false
>
> performance.cache-size                  128MB
>
> performance.qr-cache-timeout            1
>
> performance.cache-invalidation          false
>
> performance.ctime-invalidation          false
>
> performance.flush-behind                on
>
> performance.nfs.flush-behind            on
>
> performance.write-behind-window-size    1MB
>
> performance.resync-failed-syncs-after-fsync off
>
> performance.nfs.write-behind-window-size 1MB
>
> performance.strict-o-direct             off
>
> performance.nfs.strict-o-direct         off
>
> performance.strict-write-ordering       off
>
> performance.nfs.strict-write-ordering   off
>
> performance.write-behind-trickling-writes on
>
> performance.aggregate-size              128KB
>
> performance.nfs.write-behind-trickling-writes on
>
> performance.lazy-open                   yes
>
> performance.read-after-open             yes
>
> performance.open-behind-pass-through    false
>
> performance.read-ahead-page-count       4
>
> performance.read-ahead-pass-through     false
>
> performance.readdir-ahead-pass-through  false
>
> performance.md-cache-pass-through       false
>
> performance.md-cache-timeout            1
>
> performance.cache-swift-metadata        true
>
> performance.cache-samba-metadata        false
>
> performance.cache-capability-xattrs     true
>
> performance.cache-ima-xattrs            true
>
> performance.md-cache-statfs             off
>
> performance.xattr-cache-list
>
> performance.nl-cache-pass-through       false
>
> network.frame-timeout                   1800
>
> network.ping-timeout                    5
>
> network.tcp-window-size                 (null)
>
> client.ssl                              on
>
> network.remote-dio                      disable
>
> client.event-threads                    2
>
> client.tcp-user-timeout                 0
>
> client.keepalive-time                   20
>
> client.keepalive-interval               2
>
> client.keepalive-count                  9
>
> network.tcp-window-size                 (null)
>
> network.inode-lru-limit                 16384
>
> auth.allow                              *
>
> auth.reject                             (null)
>
> transport.keepalive                     1
>
> server.allow-insecure                   on
>
> server.root-squash                      off
>
> server.all-squash                       off
>
> server.anonuid                          65534
>
> server.anongid                          65534
>
> server.statedump-path                   /var/run/gluster
>
> server.outstanding-rpc-limit            64
>
> server.ssl                              on
>
> auth.ssl-allow                          *
>
> server.manage-gids                      off
>
> server.dynamic-auth                     on
>
> client.send-gids                        on
>
> server.gid-timeout                      300
>
> server.own-thread                       (null)
>
> server.event-threads                    2
>
> server.tcp-user-timeout                 42
>
> server.keepalive-time                   20
>
> server.keepalive-interval               2
>
> server.keepalive-count                  9
>
> transport.listen-backlog                1024
>
> ssl.cipher-list                         HIGH:!SSLv2
>
> transport.address-family                inet
>
> performance.write-behind                on
>
> performance.read-ahead                  on
>
> performance.readdir-ahead               on
>
> performance.io-cache                    on
>
> performance.open-behind                 on
>
> performance.quick-read                  on
>
> performance.nl-cache                    off
>
> performance.stat-prefetch               on
>
> performance.client-io-threads           off
>
> performance.nfs.write-behind            on
>
> performance.nfs.read-ahead              off
>
> performance.nfs.io-cache                off
>
> performance.nfs.quick-read              off
>
> performance.nfs.stat-prefetch           off
>
> performance.nfs.io-threads              off
>
> performance.force-readdirp              true
>
> performance.cache-invalidation          false
>
> performance.global-cache-invalidation   true
>
> features.uss                            off
>
> features.snapshot-directory             .snaps
>
> features.show-snapshot-directory        off
>
> features.tag-namespaces                 off
>
> network.compression                     off
>
> network.compression.window-size         -15
>
> network.compression.mem-level           8
>
> network.compression.min-size            0
>
> network.compression.compression-level   -1
>
> network.compression.debug               false
>
> features.default-soft-limit             80%
>
> features.soft-timeout                   60
>
> features.hard-timeout                   5
>
> features.alert-time                     86400
>
> features.quota-deem-statfs              off
>
> geo-replication.indexing                off
>
> geo-replication.indexing                off
>
> geo-replication.ignore-pid-check        off
>
> geo-replication.ignore-pid-check        off
>
> features.quota                          off
>
> features.inode-quota                    off
>
> features.bitrot                         disable
>
> debug.trace                             off
>
> debug.log-history                       no
>
> debug.log-file                          no
>
> debug.exclude-ops                       (null)
>
> debug.include-ops                       (null)
>
> debug.error-gen                         off
>
> debug.error-failure                     (null)
>
> debug.error-number                      (null)
>
> debug.random-failure                    off
>
> debug.error-fops                        (null)
>
> nfs.disable                             on
>
> features.read-only                      off
>
> features.worm                           off
>
> features.worm-file-level                off
>
> features.worm-files-deletable           on
>
> features.default-retention-period       120
>
> features.retention-mode                 relax
>
> features.auto-commit-period             180
>
> storage.linux-aio                       off
>
> storage.batch-fsync-mode                reverse-fsync
>
> storage.batch-fsync-delay-usec          0
>
> storage.owner-uid                       -1
>
> storage.owner-gid                       -1
>
> storage.node-uuid-pathinfo              off
>
> storage.health-check-interval           30
>
> storage.build-pgfid                     off
>
> storage.gfid2path                       on
>
> storage.gfid2path-separator             :
>
> storage.reserve                         1
>
> storage.reserve-size                    0
>
> storage.health-check-timeout            10
>
> storage.fips-mode-rchecksum             off
>
> storage.force-create-mode               0000
>
> storage.force-directory-mode            0000
>
> storage.create-mask                     0777
>
> storage.create-directory-mask           0777
>
> storage.max-hardlinks                   100
>
> features.ctime                          off
>
> config.gfproxyd                         off
>
> cluster.server-quorum-type              off
>
> cluster.server-quorum-ratio             51
>
> changelog.changelog                     off
>
> changelog.changelog-dir                 {{ brick.path }}/.glusterfs/changelogs
> changelog.encoding                      ascii
>
> changelog.rollover-time                 15
>
> changelog.fsync-interval                5
>
> changelog.changelog-barrier-timeout     120
>
> changelog.capture-del-path              off
>
> features.barrier                        disable
>
> features.barrier-timeout                120
>
> features.trash                          off
>
> features.trash-dir                      .trashcan
>
> features.trash-eliminate-path           (null)
>
> features.trash-max-filesize             5MB
>
> features.trash-internal-op              off
>
> cluster.enable-shared-storage           disable
>
> locks.trace                             off
>
> locks.mandatory-locking                 off
>
> cluster.disperse-self-heal-daemon       enable
>
> cluster.quorum-reads                    false
>
> client.bind-insecure                    (null)
>
> features.timeout                        45
>
> features.failover-hosts                 (null)
>
> features.shard                          off
>
> features.shard-block-size               64MB
>
> features.shard-lru-limit                16384
>
> features.shard-deletion-rate            100
>
> features.scrub-throttle                 lazy
>
> features.scrub-freq                     biweekly
>
> features.scrub                          false
>
> features.expiry-time                    120
>
> features.cache-invalidation             off
>
> features.cache-invalidation-timeout     60
>
> features.leases                         off
>
> features.lease-lock-recall-timeout      60
>
> disperse.background-heals               8
>
> disperse.heal-wait-qlength              128
>
> cluster.heal-timeout                    60
>
> dht.force-readdirp                      on
>
> disperse.read-policy                    gfid-hash
>
> cluster.shd-max-threads                 1
>
> cluster.shd-wait-qlength                1024
>
> cluster.locking-scheme                  full
>
> cluster.granular-entry-heal             no
>
> features.locks-revocation-secs          0
>
> features.locks-revocation-clear-all     false
>
> features.locks-revocation-max-blocked   0
>
> features.locks-monkey-unlocking         false
>
> features.locks-notify-contention        no
>
> features.locks-notify-contention-delay  5
>
> disperse.shd-max-threads                1
>
> disperse.shd-wait-qlength               1024
>
> disperse.cpu-extensions                 auto
>
> disperse.self-heal-window-size          1
>
> cluster.use-compound-fops               off
>
> performance.parallel-readdir            off
>
> performance.rda-request-size            131072
>
> performance.rda-low-wmark               4096
>
> performance.rda-high-wmark              128KB
>
> performance.rda-cache-limit             10MB
>
> performance.nl-cache-positive-entry     false
>
> performance.nl-cache-limit              10MB
>
> performance.nl-cache-timeout            60
>
> cluster.brick-multiplex                 disable
>
> glusterd.vol_count_per_thread           100
>
> cluster.max-bricks-per-process          250
>
> disperse.optimistic-change-log          on
>
> disperse.stripe-cache                   4
>
> cluster.halo-enabled                    False
>
> cluster.halo-shd-max-latency            99999
>
> cluster.halo-nfsd-max-latency           5
>
> cluster.halo-max-latency                5
>
> cluster.halo-max-replicas               99999
>
> cluster.halo-min-replicas               2
>
> features.selinux                        on
>
> cluster.daemon-log-level                INFO
>
> debug.delay-gen                         off
>
> delay-gen.delay-percentage              10%
>
> delay-gen.delay-duration                100000
>
> delay-gen.enable
>
> disperse.parallel-writes                on
>
> features.sdfs                           off
>
> features.cloudsync                      off
>
> features.ctime                          off
>
> ctime.noatime                           on
>
> features.cloudsync-storetype            (null)
>
> features.enforce-mandatory-lock         off
>
> config.global-threading                 off
>
> config.client-threads                   16
>
> config.brick-threads                    16
>
> features.cloudsync-remote-read          off
>
> features.cloudsync-store-id             (null)
>
> features.cloudsync-product-id           (null)
>
>
> Crash log found in /var/log/glusterfs/partage-logscli.log
> 2020-07-23 02:34:36
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 7.6
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x25e50)[0x7fbe02138e50]
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2f7)[0x7fbe021434b7]
> /lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7fbe00b87060]
>
> /lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fbe01396b40]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x32f5)[0x7fbdfa89b2f5]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3c52)[0x7fbdfa89bc52]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3dac)[0x7fbdfa89bdac]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so(+0x3f73)[0x7fbdfa89bf73]
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_unlink+0xbc)[0x7fbe021c401c]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/md-cache.so(+0x4495)[0x7fbdfa46c495]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/debug/io-stats.so(+0x5f44)[0x7fbdfa23af44]
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_unlink+0xbc)[0x7fbe021c401c]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x1154f)[0x7fbdff7dc54f]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x7775)[0x7fbdff7d2775]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x74c8)[0x7fbdff7d24c8]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x77be)[0x7fbdff7d27be]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x6ac3)[0x7fbdff7d1ac3]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x7188)[0x7fbdff7d2188]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x74e8)[0x7fbdff7d24e8]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x779e)[0x7fbdff7d279e]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x77e0)[0x7fbdff7d27e0]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x83f9)[0x7fbdff7d33f9]
>
> /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/mount/fuse.so(+0x21d3c)[0x7fbdff7ecd3c]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7fbe013944a4]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fbe00c3cd0f]
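>
> The open-behind frames above are relative offsets into the shared
> object, so they can be resolved to function names on the affected
> client with addr2line. A sketch; it assumes the exact glusterfs 7.6
> Debian build from the trace is still installed, and the dbgsym
> packages are needed for file/line information:

```shell
# Resolve the open-behind frames (+0x32f5 etc.) to symbols/source lines.
addr2line -f -C \
  -e /usr/lib/x86_64-linux-gnu/glusterfs/7.6/xlator/performance/open-behind.so \
  0x32f5 0x3c52 0x3dac 0x3f73
```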
>

