[Bugs] [Bug 1462127] New: [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to trigger scrub
bugzilla at redhat.com
Fri Jun 16 09:22:51 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1462127
Bug ID: 1462127
Summary: [Bitrot]: Inconsistency seen with 'scrub ondemand' -
fails to trigger scrub
Product: GlusterFS
Version: 3.11
Component: bitrot
Severity: high
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
khiremat at redhat.com, rhs-bugs at redhat.com,
sanandpa at redhat.com, storage-qa-internal at redhat.com
Depends On: 1454596, 1461845
Blocks: 1462080
Docs Contact: bugs at gluster.org
+++ This bug was initially created as a clone of Bug #1461845 +++
+++ This bug was initially created as a clone of Bug #1454596 +++
Description of problem:
=======================
In a 4/6-node cluster, for any kind of bitrot-enabled volume, there have been
times when the command 'gluster volume bitrot <volname> scrub ondemand' was
executed but failed to trigger the scrubber process to start scrubbing.
The command 'gluster volume bitrot <volname> scrub status', which should
show the progress of the scrub run per node, continues to display 'Scrubber
pending to complete' for every node, with the overall state 'Active (Idle)' -
indicating that the 'scrub ondemand' command turned out to be a no-op. This has
been hit multiple times in automation and once while testing manually. The scrub
logs do show that the on-demand scrub was called, followed by 'No change in
volfile, continuing' messages.
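For reference, these are the CLI invocations referred to above, shown as an
illustrative sketch against the affected volume 'ozone' (substitute the volume
name as needed):

# gluster volume bitrot ozone scrub ondemand
# gluster volume bitrot ozone scrub status

Even after the first command is accepted, 'scrub status' keeps reporting
'Scrubber pending to complete' for every node and an overall state of
'Active (Idle)'.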
Version-Release number of selected component (if applicable):
============================================================
mainline
How reproducible:
================
Multiple times
Steps to Reproduce:
==================
These may not be a guaranteed way to reproduce the issue, but they are the
general steps that have been executed whenever it has been hit (a command-line
sketch follows the list).
1. Have a bitrot-enabled volume with data
2. Disable bitrot, then enable it again
3. Trigger scrub ondemand
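As a rough sketch, assuming a bitrot-enabled volume named 'ozone' that already
holds data, the steps above map to:

# gluster volume bitrot ozone disable
# gluster volume bitrot ozone enable
# gluster volume bitrot ozone scrub ondemand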
Additional info:
===================
[2017-05-23 06:10:45.513449] I [MSGID: 118038]
[bit-rot-scrub.c:1085:br_fsscan_ondemand] 0-ozone-bit-rot-0: Ondemand Scrubbing
scheduled to run at 2017-05-23 06:10:46
[2017-05-23 06:10:45.605562] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2017-05-23 06:10:46.161784] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2017-05-23 06:10:46.840056] I [MSGID: 118044]
[bit-rot-scrub.c:615:br_scrubber_log_time] 0-ozone-bit-rot-0: Scrubbing started
at 2017-05-23 06:10:46
[2017-05-23 06:10:48.083396] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2017-05-23 06:10:48.644978] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[root at dhcp47-164 ~]#
[root at dhcp47-164 ~]# gluster peer status
Number of Peers: 3
Hostname: dhcp47-165.lab.eng.blr.redhat.com
Uuid: 834d66eb-fb65-4ea3-949a-e7cb4c198f2b
State: Peer in Cluster (Connected)
Hostname: dhcp47-162.lab.eng.blr.redhat.com
Uuid: 95491d39-d83a-4053-b1d5-682ca7290bd2
State: Peer in Cluster (Connected)
Hostname: dhcp47-157.lab.eng.blr.redhat.com
Uuid: d0955c85-94d0-41ba-aea8-1ffde3575ea5
State: Peer in Cluster (Connected)
[root at dhcp47-164 ~]#
[root at dhcp47-164 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.8.4-25.el7rhgs.x86_64
glusterfs-libs-3.8.4-25.el7rhgs.x86_64
glusterfs-fuse-3.8.4-25.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
glusterfs-events-3.8.4-25.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-25.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-rdma-3.8.4-25.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-25.el7rhgs.x86_64
glusterfs-3.8.4-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-25.el7rhgs.x86_64
glusterfs-server-3.8.4-25.el7rhgs.x86_64
python-gluster-3.8.4-25.el7rhgs.noarch
glusterfs-api-3.8.4-25.el7rhgs.x86_64
[root at dhcp47-164 ~]#
[root at dhcp47-164 ~]# gluster v list
distrep
ozone
[root at dhcp47-164 ~]# gluster v status
Status of volume: distrep
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick1/distrep_0 49152 0 Y 7697
Brick 10.70.47.164:/bricks/brick1/distrep_1 49153 0 Y 2021
Brick 10.70.47.162:/bricks/brick1/distrep_2 49153 0 Y 628
Brick 10.70.47.157:/bricks/brick1/distrep_3 49153 0 Y 31735
Self-heal Daemon on localhost N/A N/A Y 2041
Bitrot Daemon on localhost N/A N/A Y 2528
Scrubber Daemon on localhost N/A N/A Y 2538
Self-heal Daemon on dhcp47-165.lab.eng.blr.
redhat.com N/A N/A Y 7785
Bitrot Daemon on dhcp47-165.lab.eng.blr.red
hat.com N/A N/A Y 16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.r
edhat.com N/A N/A Y 16901
Self-heal Daemon on dhcp47-162.lab.eng.blr.
redhat.com N/A N/A Y 648
Bitrot Daemon on dhcp47-162.lab.eng.blr.red
hat.com N/A N/A Y 1350
Scrubber Daemon on dhcp47-162.lab.eng.blr.r
edhat.com N/A N/A Y 1360
Self-heal Daemon on dhcp47-157.lab.eng.blr.
redhat.com N/A N/A Y 31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.red
hat.com N/A N/A Y 32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.r
edhat.com N/A N/A Y 32505
Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: ozone
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick0/ozone_0 49153 0 Y 12918
Brick 10.70.47.164:/bricks/brick0/ozone_1 49152 0 Y 32008
Brick 10.70.47.162:/bricks/brick0/ozone_2 49152 0 Y 31242
Brick 10.70.47.157:/bricks/brick0/ozone_3 49152 0 Y 30037
Self-heal Daemon on localhost N/A N/A Y 2041
Bitrot Daemon on localhost N/A N/A Y 2528
Scrubber Daemon on localhost N/A N/A Y 2538
Self-heal Daemon on dhcp47-162.lab.eng.blr.
redhat.com N/A N/A Y 648
Bitrot Daemon on dhcp47-162.lab.eng.blr.red
hat.com N/A N/A Y 1350
Scrubber Daemon on dhcp47-162.lab.eng.blr.r
edhat.com N/A N/A Y 1360
Self-heal Daemon on dhcp47-165.lab.eng.blr.
redhat.com N/A N/A Y 7785
Bitrot Daemon on dhcp47-165.lab.eng.blr.red
hat.com N/A N/A Y 16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.r
edhat.com N/A N/A Y 16901
Self-heal Daemon on dhcp47-157.lab.eng.blr.
redhat.com N/A N/A Y 31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.red
hat.com N/A N/A Y 32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.r
edhat.com N/A N/A Y 32505
Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks
[root at dhcp47-164 ~]#
[root at dhcp47-164 ~]# gluster v info
Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 71537fad-fa85-4dac-b534-dd6edceba4e9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick1/distrep_0
Brick2: 10.70.47.164:/bricks/brick1/distrep_1
Brick3: 10.70.47.162:/bricks/brick1/distrep_2
Brick4: 10.70.47.157:/bricks/brick1/distrep_3
Options Reconfigured:
features.scrub: Active
features.bitrot: on
transport.address-family: inet
nfs.disable: on
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: aba2693d-b771-4ef5-a0df-d0a2c8f77f9e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/ozone_0
Brick2: 10.70.47.164:/bricks/brick0/ozone_1
Brick3: 10.70.47.162:/bricks/brick0/ozone_2
Brick4: 10.70.47.157:/bricks/brick0/ozone_3
Options Reconfigured:
features.scrub-throttle: aggressive
features.scrub-freq: hourly
storage.batch-fsync-delay-usec: 0
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-samba-metadata: on
performance.nl-cache: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.parallel-readdir: on
features.bitrot: on
features.scrub: Active
[root at dhcp47-164 ~]#
--- Additional comment from Worker Ant on 2017-06-15 08:45:36 EDT ---
REVIEW: https://review.gluster.org/17552 (feature/bitrot: Fix ondemand scrub)
posted (#1) for review on master by Kotresh HR (khiremat at redhat.com)
--- Additional comment from Worker Ant on 2017-06-16 02:01:53 EDT ---
COMMIT: https://review.gluster.org/17552 committed in master by Atin Mukherjee
(amukherj at redhat.com)
------
commit f0fb166078d59cab2a33583591b6448326247c40
Author: Kotresh HR <khiremat at redhat.com>
Date: Thu Jun 15 08:31:06 2017 -0400
feature/bitrot: Fix ondemand scrub
The flag which keeps track of whether the scrub
frequency has changed from its previous value should
not be considered for on-demand scrubbing. It
should be considered only for 'scrub-frequency',
where the scrub should not be re-scheduled if the
frequency is set to the same value again. But for an
on-demand scrub, scrubbing should start immediately
regardless of the scrub-frequency setting.
Reproducer:
1. Enable bitrot
2. Set scrub-throttle
3. Trigger ondemand scrub
Make sure glusterd is not restarted while
performing the above steps
Change-Id: Ice5feaece7fff1579fb009d1a59d2b8292e23e0b
BUG: 1461845
Signed-off-by: Kotresh HR <khiremat at redhat.com>
Reviewed-on: https://review.gluster.org/17552
Smoke: Gluster Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Raghavendra Bhat <raghavendra at redhat.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
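For reference, a hedged CLI sketch of the reproducer from the commit message
above (the volume name is a placeholder; 'aggressive' is just one of the
accepted throttle values):

# gluster volume bitrot <volname> enable
# gluster volume bitrot <volname> scrub-throttle aggressive
# gluster volume bitrot <volname> scrub ondemand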
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1454596
[Bug 1454596] [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to
trigger scrub
https://bugzilla.redhat.com/show_bug.cgi?id=1461845
[Bug 1461845] [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to
trigger scrub
https://bugzilla.redhat.com/show_bug.cgi?id=1462080
[Bug 1462080] [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to
trigger scrub
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
You are the Docs Contact for the bug.