[Bugs] [Bug 1462080] New: [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to trigger scrub

bugzilla at redhat.com bugzilla at redhat.com
Fri Jun 16 06:53:10 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1462080

            Bug ID: 1462080
           Summary: [Bitrot]: Inconsistency seen with 'scrub ondemand' -
                    fails to trigger scrub
           Product: GlusterFS
           Version: 3.10
         Component: bitrot
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    khiremat at redhat.com, rhs-bugs at redhat.com,
                    sanandpa at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1454596, 1461845
      Docs Contact: bugs at gluster.org



+++ This bug was initially created as a clone of Bug #1461845 +++

+++ This bug was initially created as a clone of Bug #1454596 +++

Description of problem:
=======================
On a 4- or 6-node cluster, for any kind of bitrot-enabled volume, there have
been times when the command 'gluster volume bitrot <volname> scrub ondemand'
was executed but failed to trigger the scrubber process to start scrubbing.
The command 'gluster volume bitrot <volname> scrub status', which should
ideally show the progress of the scrub run per node, continues to display
'Scrubber pending to complete' for every node, with the overall state
'Active (Idle)' - indicating that the 'scrub ondemand' command turned out to
be a no-op. This has been hit multiple times in automation and once while
testing manually. The scrub logs do show that the on-demand scrub was
invoked, followed by 'No change in volfile, continuing' messages.

Version-Release number of selected component (if applicable):
============================================================
mainline


How reproducible:
================
Multiple times


Steps to Reproduce:
==================
These steps do not reproduce the issue reliably, but they are the general
steps that were executed whenever it was hit.
1. Have a bitrot enabled volume with data
2. Disable bitrot. Enable bitrot
3. Trigger scrub ondemand


Additional info:
===================

[2017-05-23 06:10:45.513449] I [MSGID: 118038]
[bit-rot-scrub.c:1085:br_fsscan_ondemand] 0-ozone-bit-rot-0: Ondemand Scrubbing
scheduled to run at 2017-05-23 06:10:46
[2017-05-23 06:10:45.605562] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2017-05-23 06:10:46.161784] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2017-05-23 06:10:46.840056] I [MSGID: 118044]
[bit-rot-scrub.c:615:br_scrubber_log_time] 0-ozone-bit-rot-0: Scrubbing started
at 2017-05-23 06:10:46
[2017-05-23 06:10:48.083396] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2017-05-23 06:10:48.644978] I [glusterfsd-mgmt.c:1780:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing

[root at dhcp47-164 ~]# 
[root at dhcp47-164 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp47-165.lab.eng.blr.redhat.com
Uuid: 834d66eb-fb65-4ea3-949a-e7cb4c198f2b
State: Peer in Cluster (Connected)

Hostname: dhcp47-162.lab.eng.blr.redhat.com
Uuid: 95491d39-d83a-4053-b1d5-682ca7290bd2
State: Peer in Cluster (Connected)

Hostname: dhcp47-157.lab.eng.blr.redhat.com
Uuid: d0955c85-94d0-41ba-aea8-1ffde3575ea5
State: Peer in Cluster (Connected)
[root at dhcp47-164 ~]# 
[root at dhcp47-164 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.8.4-25.el7rhgs.x86_64
glusterfs-libs-3.8.4-25.el7rhgs.x86_64
glusterfs-fuse-3.8.4-25.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-4.el7.x86_64
glusterfs-events-3.8.4-25.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-25.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-rdma-3.8.4-25.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-25.el7rhgs.x86_64
glusterfs-3.8.4-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-25.el7rhgs.x86_64
glusterfs-server-3.8.4-25.el7rhgs.x86_64
python-gluster-3.8.4-25.el7rhgs.noarch
glusterfs-api-3.8.4-25.el7rhgs.x86_64
[root at dhcp47-164 ~]# 
[root at dhcp47-164 ~]# gluster v list
distrep
ozone
[root at dhcp47-164 ~]# gluster v status
Status of volume: distrep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick1/distrep_0 49152     0          Y       7697 
Brick 10.70.47.164:/bricks/brick1/distrep_1 49153     0          Y       2021 
Brick 10.70.47.162:/bricks/brick1/distrep_2 49153     0          Y       628  
Brick 10.70.47.157:/bricks/brick1/distrep_3 49153     0          Y       31735
Self-heal Daemon on localhost               N/A       N/A        Y       2041 
Bitrot Daemon on localhost                  N/A       N/A        Y       2528 
Scrubber Daemon on localhost                N/A       N/A        Y       2538 
Self-heal Daemon on dhcp47-165.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       7785 
Bitrot Daemon on dhcp47-165.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-162.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       648  
Bitrot Daemon on dhcp47-162.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       1350 
Scrubber Daemon on dhcp47-162.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       1360 
Self-heal Daemon on dhcp47-157.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       32505

Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ozone
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/bricks/brick0/ozone_0   49153     0          Y       12918
Brick 10.70.47.164:/bricks/brick0/ozone_1   49152     0          Y       32008
Brick 10.70.47.162:/bricks/brick0/ozone_2   49152     0          Y       31242
Brick 10.70.47.157:/bricks/brick0/ozone_3   49152     0          Y       30037
Self-heal Daemon on localhost               N/A       N/A        Y       2041 
Bitrot Daemon on localhost                  N/A       N/A        Y       2528 
Scrubber Daemon on localhost                N/A       N/A        Y       2538 
Self-heal Daemon on dhcp47-162.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       648  
Bitrot Daemon on dhcp47-162.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       1350 
Scrubber Daemon on dhcp47-162.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       1360 
Self-heal Daemon on dhcp47-165.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       7785 
Bitrot Daemon on dhcp47-165.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       16837
Scrubber Daemon on dhcp47-165.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       16901
Self-heal Daemon on dhcp47-157.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       31762
Bitrot Daemon on dhcp47-157.lab.eng.blr.red
hat.com                                     N/A       N/A        Y       32487
Scrubber Daemon on dhcp47-157.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       32505

Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks

[root at dhcp47-164 ~]# 
[root at dhcp47-164 ~]# gluster v info

Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 71537fad-fa85-4dac-b534-dd6edceba4e9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick1/distrep_0
Brick2: 10.70.47.164:/bricks/brick1/distrep_1
Brick3: 10.70.47.162:/bricks/brick1/distrep_2
Brick4: 10.70.47.157:/bricks/brick1/distrep_3
Options Reconfigured:
features.scrub: Active
features.bitrot: on
transport.address-family: inet
nfs.disable: on

Volume Name: ozone
Type: Distributed-Replicate
Volume ID: aba2693d-b771-4ef5-a0df-d0a2c8f77f9e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/bricks/brick0/ozone_0
Brick2: 10.70.47.164:/bricks/brick0/ozone_1
Brick3: 10.70.47.162:/bricks/brick0/ozone_2
Brick4: 10.70.47.157:/bricks/brick0/ozone_3
Options Reconfigured:
features.scrub-throttle: aggressive
features.scrub-freq: hourly
storage.batch-fsync-delay-usec: 0
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-samba-metadata: on
performance.nl-cache: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.parallel-readdir: on
features.bitrot: on
features.scrub: Active
[root at dhcp47-164 ~]#

--- Additional comment from Worker Ant on 2017-06-15 08:45:36 EDT ---

REVIEW: https://review.gluster.org/17552 (feature/bitrot: Fix ondemand scrub)
posted (#1) for review on master by Kotresh HR (khiremat at redhat.com)

--- Additional comment from Worker Ant on 2017-06-16 02:01:53 EDT ---

COMMIT: https://review.gluster.org/17552 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit f0fb166078d59cab2a33583591b6448326247c40
Author: Kotresh HR <khiremat at redhat.com>
Date:   Thu Jun 15 08:31:06 2017 -0400

    feature/bitrot: Fix ondemand scrub

    The flag that keeps track of whether the scrub
    frequency has changed from its previous value should
    not be considered for on-demand scrubbing. It
    should be considered only for 'scrub-frequency',
    where scrubbing should not be re-scheduled if the
    frequency is set to the same value again. But for
    on-demand scrub, scrubbing should start immediately,
    no matter what the scrub-frequency is.

    Reproducer:
    1. Enable bitrot
    2. Set scrub-throttle
    3. Trigger on-demand scrub
    Make sure glusterd is not restarted while
    performing the above steps.

    Change-Id: Ice5feaece7fff1579fb009d1a59d2b8292e23e0b
    BUG: 1461845
    Signed-off-by: Kotresh HR <khiremat at redhat.com>
    Reviewed-on: https://review.gluster.org/17552
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Raghavendra Bhat <raghavendra at redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
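The scheduling mistake described in the commit message can be modeled with a
small sketch. This is a hypothetical simplification in Python, not the actual
GlusterFS C code from bit-rot-scrub.c; all names (ScrubScheduler,
frequency_changed, etc.) are illustrative only.

```python
class ScrubScheduler:
    """Illustrative model of the scrub scheduling state (not GlusterFS code)."""

    def __init__(self, frequency="biweekly"):
        self.frequency = frequency
        # Tracks whether 'scrub-frequency' was reconfigured to a new value.
        self.frequency_changed = False
        self.scrub_started = False

    def set_frequency(self, freq):
        # Re-scheduling is warranted only if the value actually changed.
        self.frequency_changed = (freq != self.frequency)
        self.frequency = freq

    def ondemand_scrub_buggy(self):
        # The bug: the on-demand path consulted the same flag, so it became
        # a no-op whenever the frequency had not changed since the last run.
        if self.frequency_changed:
            self.scrub_started = True
        return self.scrub_started

    def ondemand_scrub_fixed(self):
        # The fix: on-demand scrub starts immediately, regardless of the flag.
        self.scrub_started = True
        return self.scrub_started
```

With an unchanged frequency, the buggy path never starts the scrub (matching
the 'Scrubber pending to complete' symptom above), while the fixed path
starts it unconditionally.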


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1454596
[Bug 1454596] [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to
trigger scrub
https://bugzilla.redhat.com/show_bug.cgi?id=1461845
[Bug 1461845] [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to
trigger scrub
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
You are the Docs Contact for the bug.


More information about the Bugs mailing list