[Bugs] [Bug 1357975] New: [Bitrot+Sharding] Scrub status shows incorrect values for 'files scrubbed' and 'files skipped'
bugzilla at redhat.com
Tue Jul 19 17:41:13 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1357975
Bug ID: 1357975
Summary: [Bitrot+Sharding] Scrub status shows incorrect values
for 'files scrubbed' and 'files skipped'
Product: GlusterFS
Version: 3.8.1
Component: bitrot
Keywords: ZStream
Severity: medium
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: bugs at gluster.org, khiremat at redhat.com,
mzywusko at redhat.com, rhinduja at redhat.com,
sanandpa at redhat.com
Depends On: 1337450, 1356851, 1357973
Docs Contact: bugs at gluster.org
+++ This bug was initially created as a clone of Bug #1357973 +++
+++ This bug was initially created as a clone of Bug #1356851 +++
+++ This bug was initially created as a clone of Bug #1337450 +++
Description of problem:
========================
In a sharded volume, where a file is split into multiple shard blocks, the
scrubber validates every file and each of its shards, but it increments the
counter once per shard instead of once per file. This is reflected in the
scrub status output for the fields 'files scrubbed' and 'files skipped',
which is misleading: the reported numbers are much larger than the total
number of files created.
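For illustration, assuming the default features.shard-block-size of 64MB
(the block size is not shown in the volume options below), a single 4GB file
is stored on the bricks as one base file plus 63 shard blocks under the
internal .shard directory, so a per-shard counter reports 64 scrubbed entries
for that one user file:

    # 4096MB / 64MB per block = 64 pieces per 4GB file
    # (1 base file + 63 blocks under the hidden .shard directory on the brick)
    ls /bricks/brick1/ozone/.shard | wc -l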
Version-Release number of selected component (if applicable):
===========================================================
How reproducible:
=================
Always
Steps to Reproduce:
=====================
1. Have a distributed-replicate volume with bitrot and sharding enabled.
2. Create 100 1MB files and validate the scrub status output after the scrub run completes.
3. Create 5 4GB files and wait for the next scrub run.
4. Validate the scrub status output after the scrubber has finished running (a reproduction sketch follows below).
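A minimal reproduction sketch for steps 2-4, assuming the volume 'ozone' is
FUSE-mounted at /mnt/ozone (directory names taken from the client logs below):

    mkdir -p /mnt/ozone/1m_files /mnt/ozone/4g_files
    for i in $(seq 1 100); do
        dd if=/dev/urandom of=/mnt/ozone/1m_files/file_$i bs=1M count=1
    done
    for i in $(seq 1 5); do
        dd if=/dev/urandom of=/mnt/ozone/4g_files/file_$i bs=1M count=4096
    done
    # scrub-freq is hourly on this volume, so wait for the next run, then:
    gluster volume bitrot ozone scrub status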
Actual results:
================
'files scrubbed' and 'files skipped' show numbers much larger than the total
number of files created.
Expected results:
=================
Both fields should reflect the number of files actually created, not the
number of shards.
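As a rough cross-check (assuming direct access to the bricks), the per-node
'Number of Scrubbed files' can be compared against the count of regular files
on that node's bricks, excluding GlusterFS-internal directories:

    find /bricks/brick*/ozone \
         -path '*/.shard' -prune -o \
         -path '*/.glusterfs' -prune -o \
         -path '*/.trashcan' -prune -o \
         -type f -print | wc -l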
Additional info:
==================
[root at dhcp35-210 ~]#
[root at dhcp35-210 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.9-4.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.7.9-4.el7rhgs.x86_64
glusterfs-api-3.7.9-4.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.9-4.el7rhgs.x86_64
glusterfs-cli-3.7.9-4.el7rhgs.x86_64
glusterfs-server-3.7.9-4.el7rhgs.x86_64
glusterfs-fuse-3.7.9-4.el7rhgs.x86_64
[root at dhcp35-210 ~]#
[root at dhcp35-210 ~]#
[root at dhcp35-210 ~]# gluster peer status
Number of Peers: 3
Hostname: 10.70.35.85
Uuid: c9550322-c0ef-45e6-ad20-f38658a5ce54
State: Peer in Cluster (Connected)
Hostname: 10.70.35.137
Uuid: 35426000-dad1-416f-b145-f25049f5036e
State: Peer in Cluster (Connected)
Hostname: 10.70.35.13
Uuid: a756f3da-7896-4970-a77d-4829e603f773
State: Peer in Cluster (Connected)
[root at dhcp35-210 ~]#
[root at dhcp35-210 ~]# gluster v info
Volume Name: ozone
Type: Distributed-Replicate
Volume ID: d79e220b-acde-4d13-b9d5-f37ec741c117
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.70.35.210:/bricks/brick1/ozone
Brick2: 10.70.35.85:/bricks/brick1/ozone
Brick3: 10.70.35.137:/bricks/brick1/ozone
Brick4: 10.70.35.210:/bricks/brick2/ozone
Brick5: 10.70.35.85:/bricks/brick2/ozone
Brick6: 10.70.35.137:/bricks/brick2/ozone
Brick7: 10.70.35.210:/bricks/brick3/ozone
Brick8: 10.70.35.85:/bricks/brick3/ozone
Brick9: 10.70.35.137:/bricks/brick3/ozone
Options Reconfigured:
features.shard: on
features.scrub-throttle: normal
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
[root at dhcp35-210 ~]#
[root at dhcp35-210 ~]# gluster v status
Status of volume: ozone
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.35.210:/bricks/brick1/ozone 49152 0 Y 3255
Brick 10.70.35.85:/bricks/brick1/ozone 49152 0 Y 15549
Brick 10.70.35.137:/bricks/brick1/ozone 49152 0 Y 32158
Brick 10.70.35.210:/bricks/brick2/ozone 49153 0 Y 3261
Brick 10.70.35.85:/bricks/brick2/ozone 49153 0 Y 15557
Brick 10.70.35.137:/bricks/brick2/ozone 49153 0 Y 32164
Brick 10.70.35.210:/bricks/brick3/ozone 49154 0 Y 3270
Brick 10.70.35.85:/bricks/brick3/ozone 49154 0 Y 15564
Brick 10.70.35.137:/bricks/brick3/ozone 49154 0 Y 32171
NFS Server on localhost 2049 0 Y 24614
Self-heal Daemon on localhost N/A N/A Y 3248
Bitrot Daemon on localhost N/A N/A Y 8545
Scrubber Daemon on localhost N/A N/A Y 8551
NFS Server on 10.70.35.13 2049 0 Y 6082
Self-heal Daemon on 10.70.35.13 N/A N/A Y 21680
Bitrot Daemon on 10.70.35.13 N/A N/A N N/A
Scrubber Daemon on 10.70.35.13 N/A N/A N N/A
NFS Server on 10.70.35.85 2049 0 Y 9515
Self-heal Daemon on 10.70.35.85 N/A N/A Y 15542
Bitrot Daemon on 10.70.35.85 N/A N/A Y 18642
Scrubber Daemon on 10.70.35.85 N/A N/A Y 18648
NFS Server on 10.70.35.137 2049 0 Y 26213
Self-heal Daemon on 10.70.35.137 N/A N/A Y 32153
Bitrot Daemon on 10.70.35.137 N/A N/A Y 2919
Scrubber Daemon on 10.70.35.137 N/A N/A Y 2925
Task Status of Volume ozone
------------------------------------------------------------------------------
There are no active volume tasks
[root at dhcp35-210 ~]#
[root at dhcp35-210 ~]# gluster v bitrot ozone scrub status
Volume name : ozone
State of scrub: Active
Scrub impact: normal
Scrub frequency: hourly
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=========================================================
Node: localhost
Number of Scrubbed files: 4930
Number of Skipped files: 0
Last completed scrub time: 2016-05-19 07:40:18
Duration of last scrub (D:M:H:M:S): 0:0:30:35
Error count: 1
Corrupted object's [GFID]:
2be8fc38-db5e-464b-b741-616377994cc8
=========================================================
Node: 10.70.35.85
Number of Scrubbed files: 5139
Number of Skipped files: 0
Last completed scrub time: 2016-05-19 08:49:49
Duration of last scrub (D:M:H:M:S): 0:0:29:39
Error count: 1
Corrupted object's [GFID]:
ce5e7a94-cba6-4e65-a7bb-82b1ec396eef
=========================================================
Node: 10.70.35.137
Number of Scrubbed files: 5138
Number of Skipped files: 0
Last completed scrub time: 2016-05-19 09:02:46
Duration of last scrub (D:M:H:M:S): 0:0:31:57
Error count: 0
=========================================================
[root at dhcp35-210 ~]#
=============
CLIENT LOGS
==============
[root at dhcp35-30 ~]#
[root at dhcp35-30 ~]# cd /mnt/ozone
[root at dhcp35-30 ozone]# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
10.70.35.137:/ozone 62553600 21098496 41455104 34% /mnt/ozone
[root at dhcp35-30 ozone]#
[root at dhcp35-30 ozone]#
[root at dhcp35-30 ozone]# ls -a
. .. 1m_files 4g_files .trashcan
[root at dhcp35-30 ozone]#
[root at dhcp35-30 ozone]#
[root at dhcp35-30 ozone]# ls -l 1m_files/ | wc -l
21
[root at dhcp35-30 ozone]# ls -l 4g_files/ | wc -l
6
[root at dhcp35-30 ozone]#
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1337450
[Bug 1337450] [Bitrot+Sharding] Scrub status shows incorrect values for
'files scrubbed' and 'files skipped'
https://bugzilla.redhat.com/show_bug.cgi?id=1356851
[Bug 1356851] [Bitrot+Sharding] Scrub status shows incorrect values for
'files scrubbed' and 'files skipped'
https://bugzilla.redhat.com/show_bug.cgi?id=1357973
[Bug 1357973] [Bitrot+Sharding] Scrub status shows incorrect values for
'files scrubbed' and 'files skipped'
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
You are the Docs Contact for the bug.