[Bugs] [Bug 1747844] New: Rebalance doesn't work correctly if performance.parallel-readdir is on and some other specific options are set
bugzilla at redhat.com
Mon Sep 2 03:48:15 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1747844
Bug ID: 1747844
Summary: Rebalance doesn't work correctly if
performance.parallel-readdir is on and some other
specific options are set
Product: GlusterFS
Version: 4.1
Hardware: x86_64
OS: Linux
Status: NEW
Component: distribute
Severity: urgent
Assignee: bugs at gluster.org
Reporter: Howard.Chen at infortrend.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Created attachment 1610643
--> https://bugzilla.redhat.com/attachment.cgi?id=1610643&action=edit
Detailed steps, volume info, options, and logs
Description of problem:
Rebalance is incomplete when the volume option performance.parallel-readdir is on;
directories are not synced to the new bricks even after the rebalance command reports its status as complete.
Version-Release number of selected component (if applicable):
Release: 4.1.8
How reproducible:
If a volume is configured with the option list shown in the attachment, this bug can be
reproduced 100% of the time.
Steps to Reproduce:
1. Create a distribute volume (with 5 bricks)
2. Set the specific options (the option list is in the attachment)
3. Create directories and files, for example:
mkdir /mnt/volume_01/dir_1
mkdir /mnt/volume_01/dir_1/dir_2
mkdir /mnt/volume_01/dir_1/dir_2/dir_3
mkdir /mnt/volume_01/dir_1/dir_2/dir_3/dir_4
touch /mnt/volume_01/dir_1/dir_2/file{1..100}
touch /mnt/volume_01/dir_1/dir_2/dir_3/file{101..200}
touch /mnt/volume_01/dir_1/dir_2/dir_3/dir_4/a{201..300}
4. Add 5 more bricks to the volume (add-brick)
5. Rebalance the volume
6. Check that the rebalance status shows completed (using gluster v status)
7. Check the directories and files on every brick (a command sketch for these steps follows this list)
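For reference, the steps above can be scripted roughly as follows. This is a minimal sketch based on the brick layout shown in the volume info below (testk1:/mnt/brick01..10/bk) and a FUSE mount at /mnt/volume_01; only performance.parallel-readdir is set explicitly here, while the full option list used in the reproduction is in the attachment.
# step 1-2: create the volume and set options (only one shown here)
gluster volume create volume_01 testk1:/mnt/brick0{1..5}/bk
gluster volume set volume_01 performance.parallel-readdir on
gluster volume start volume_01
mount -t glusterfs testk1:/volume_01 /mnt/volume_01
# step 3: create the directories and files as listed above, then:
# step 4-6: add 5 bricks, rebalance, and watch the status
gluster volume add-brick volume_01 testk1:/mnt/brick0{6..9}/bk testk1:/mnt/brick10/bk
gluster volume rebalance volume_01 start
gluster volume rebalance volume_01 status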
Actual results:
Only dir_1 and dir_2 are synced onto the 5 newly added bricks
(dir_3 and dir_4 are not synced)
Expected results:
All four directories should be synced onto the 5 newly added bricks
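As a quick way to verify step 7 (not part of the original report), the on-disk layout of the newly added bricks can be listed directly on the server, for example:
find /mnt/brick06/bk /mnt/brick07/bk /mnt/brick08/bk /mnt/brick09/bk /mnt/brick10/bk -type d
After a complete rebalance every brick should contain the full directory tree dir_1/dir_2/dir_3/dir_4; in this reproduction only dir_1 and dir_2 show up on the new bricks.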
Additional info:
Detailed steps, volume info, options, and logs are in the attachment.
[root at K1 glusterfs]# gluster v get volume_01 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement on
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits on
diagnostics.brick-log-level ERROR
diagnostics.client-log-level ERROR
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 64
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqs off
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.cache-invalidation true
performance.flush-behind on
performance.nfs.flush-behind off
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writes on
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open yes
performance.read-after-open no
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
network.remote-dio disable
client.event-threads 8
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 8
server.tcp-user-timeout 0
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
ssl.own-cert (null)
ssl.private-key (null)
ssl.ca-list (null)
ssl.crl-path (null)
ssl.certificate-depth (null)
ssl.cipher-list (null)
ssl.dh-param (null)
ssl.ec-curve (null)
transport.address-family inet
performance.write-behind off
performance.read-ahead off
performance.readdir-ahead on
performance.io-cache off
performance.quick-read off
performance.open-behind off
performance.nl-cache on
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind off
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable off
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.rdirplus on
nfs.event-threads 1
nfs.exports-auth-enable off
nfs.auth-refresh-interval-sec 30
nfs.auth-cache-ttl-sec 30
features.read-only off
features.worm off
features.worm-file-level disable
features.worm-files-deletable on
features.default-retention-period 2147483647
features.retention-mode enterprise
features.auto-commit-period 7200
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 10
storage.fips-mode-rchecksum off
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
storage.ctime off
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.timeout 45
features.failover-hosts (null)
features.shard off
features.shard-block-size 64MB
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation on
features.cache-invalidation-timeout 600
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention no
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir on
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 40MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 0
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
features.sdfs off
features.cloudsync off
features.utime off
[root at K1 glusterfs]# gluster v info
Volume Name: volume_01
Type: Distribute
Volume ID: 140b35e4-c095-457f-8f15-0095a10ad83d
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Bricks:
Brick1: testk1:/mnt/brick01/bk
Brick2: testk1:/mnt/brick02/bk
Brick3: testk1:/mnt/brick03/bk
Brick4: testk1:/mnt/brick04/bk
Brick5: testk1:/mnt/brick05/bk
Brick6: testk1:/mnt/brick06/bk
Brick7: testk1:/mnt/brick07/bk
Brick8: testk1:/mnt/brick08/bk
Brick9: testk1:/mnt/brick09/bk
Brick10: testk1:/mnt/brick10/bk
Options Reconfigured:
performance.rda-cache-limit: 40MB
performance.parallel-readdir: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.auto-commit-period: 7200
features.retention-mode: enterprise
features.default-retention-period: 2147483647
features.worm-file-level: disable
nfs.auth-cache-ttl-sec: 30
nfs.auth-refresh-interval-sec: 30
nfs.exports-auth-enable: off
performance.nfs.write-behind: off
performance.nl-cache: on
performance.open-behind: off
performance.quick-read: off
performance.io-cache: off
performance.read-ahead: off
performance.write-behind: off
server.event-threads: 8
client.event-threads: 8
performance.nfs.flush-behind: off
performance.cache-invalidation: true
performance.io-thread-count: 64
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: off
[root at K1 glusterfs]# gluster v status
Status of volume: volume_01
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick testk1:/mnt/brick01/bk 49152 0 Y 3223
Brick testk1:/mnt/brick02/bk 49153 0 Y 3253
Brick testk1:/mnt/brick03/bk 49154 0 Y 3283
Brick testk1:/mnt/brick04/bk 49155 0 Y 3313
Brick testk1:/mnt/brick05/bk 49156 0 Y 3343
Brick testk1:/mnt/brick06/bk 49157 0 Y 3570
Brick testk1:/mnt/brick07/bk 49158 0 Y 3600
Brick testk1:/mnt/brick08/bk 49159 0 Y 3630
Brick testk1:/mnt/brick09/bk 49160 0 Y 3660
Brick testk1:/mnt/brick10/bk 49161 0 Y 3690
NFS Server on localhost 2049 0 Y 3842
Task Status of Volume volume_01
------------------------------------------------------------------------------
Task : Rebalance
ID : 5afe22d8-9906-4a76-93f3-40b8c699cb34
Status : completed
[root at K1 glusterfs]# gluster --version
glusterfs 4.1.8
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.