[Bugs] [Bug 1374567] New: [Bitrot]: Recovery fails of a corrupted hardlink (and the corresponding parent file) in a disperse volume

bugzilla at redhat.com bugzilla at redhat.com
Fri Sep 9 05:09:39 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1374567

            Bug ID: 1374567
           Summary: [Bitrot]: Recovery fails of a corrupted hardlink (and
                    the corresponding parent file) in a disperse volume
           Product: GlusterFS
           Version: 3.9
         Component: bitrot
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: amukherj at redhat.com, aspandey at redhat.com,
                    bmohanra at redhat.com, bugs at gluster.org,
                    khiremat at redhat.com, pkarampu at redhat.com,
                    rcyriac at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, rmekala at redhat.com,
                    sanandpa at redhat.com
        Depends On: 1341934, 1373520, 1374564, 1374565
      Docs Contact: bugs at gluster.org



+++ This bug was initially created as a clone of Bug #1374565 +++

+++ This bug was initially created as a clone of Bug #1374564 +++

+++ This bug was initially created as a clone of Bug #1373520 +++

+++ This bug was initially created as a clone of Bug #1341934 +++

Description of problem:
=======================
Have a 4-node cluster with a 1 x (4+2) disperse volume named ozone. Enable bitrot
and set the scrubber frequency to hourly. Create files/directories via fuse/nfs
and create a couple of hardlinks as well. Corrupt one of the hardlinks from the
backend brick path and wait for the scrubber to mark it as corrupted. Now follow
the standard procedure for recovering a corrupted file: delete it on the backend
and access it from the mountpoint. After recovery, the recovered file still has
the same contents it had when it was corrupted.
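
For reference, a minimal command sketch of the setup described above. The volume
name ozone and the 4+2 layout are from the report; the hostnames, brick paths,
file names and sizes below are placeholders, not the exact commands that were run:

# Create the 4+2 disperse volume from 2 bricks each on node2, node3 and node4
# (force is needed here only because several bricks share a node in this test setup).
gluster volume create ozone disperse 6 redundancy 2 \
    node2:/bricks/b1 node2:/bricks/b2 \
    node3:/bricks/b3 node3:/bricks/b4 \
    node4:/bricks/b5 node4:/bricks/b6 force
gluster volume start ozone

# Enable bitrot detection and make the scrubber run hourly.
gluster volume bitrot ozone enable
gluster volume bitrot ozone scrub-frequency hourly

# FUSE-mount the volume, create a few files and a couple of hardlinks.
mount -t glusterfs node2:/ozone /mnt/ozone
for i in 1 2 3 4 5; do dd if=/dev/urandom of=/mnt/ozone/file-$i bs=1K count=9; done
ln /mnt/ozone/file-1 /mnt/ozone/hlink-1
ln /mnt/ozone/file-2 /mnt/ozone/hlink-2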


Version-Release number of selected component (if applicable):
=============================================================


How reproducible:
================
Hit multiple times


Steps to Reproduce:
==================

1. Have a 4-node cluster. Create a 4+2 disperse volume using 2 bricks each from
node2, node3 and node4.
2. Enable bitrot and mount the volume via fuse. Create 5 files and 2 hardlinks.
3. Go to the brick backend path on node2 and append a line to one of the
hardlinks.
4. Verify using 'cat' that the hardlink as well as the parent file are corrupted
at the backend.
5. Wait for the scrubber to finish its run, and verify that
/var/log/glusterfs/scrub.log detects the corruption.
6. Delete the hardlink (and the parent file) from the backend brick path of
node2 and access the file from the mountpoint, expecting self-heal to recover
the file on node2.
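
The corruption and recovery steps above, as a command sketch (the brick path
/bricks/b1 on node2 and the file names are placeholders):

# Step 3: on node2, append a line to one hardlink directly on the brick, bypassing gluster.
echo "extra line" >> /bricks/b1/hlink-1

# Step 4: on the backend, the hardlink and its parent file now show the appended data.
cat /bricks/b1/hlink-1 /bricks/b1/file-1

# Step 5: after the next scrub run, the corruption should show up in the scrub log.
grep -i corrupt /var/log/glusterfs/scrub.log

# Step 6: delete the bad copy from node2's brick and access the file from the
# mount point, expecting self-heal to rebuild it from the good copies.
rm -f /bricks/b1/hlink-1 /bricks/b1/file-1
stat /mnt/ozone/file-1
cat /mnt/ozone/file-1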

Actual results:
================
After step 6, the file and the hardlink do get recovered, but they continue to
hold the corrupted data.


Expected results:
=================
A good copy of the file should be recovered.


A few updates on what happened during the day while trying to debug this
issue.

1. Tried the same steps without bitrot, on a plain disperse volume. If there
is no scrubber involved to mark the file as bad, then recovery of the file
appears to work as expected. (Further testing would be required to claim this
with confidence.)

2. On the setup shared by Kotresh, this behaviour was reproduced consistently,
not just for hardlinks/softlinks but for regular files as well.

3. Had missed deleting the file's entry from the .glusterfs folder. Redid the
steps mentioned in the description (sketched below). This time the file gets
recovered not with the corrupted data, but with NO data: it is an empty file,
and it remains empty. Multiple attempts to manually heal the file using
'gluster volume heal <volname>' have no effect.
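
For completeness, a sketch of the recovery attempt in update 3, including the
.glusterfs entry that had been missed earlier (brick path and file name are
placeholders):

# On the affected brick, resolve the file's gfid and the matching .glusterfs hardlink.
BRICK=/bricks/b1
F=$BRICK/file-7
hex=$(getfattr -n trusted.gfid -e hex "$F" 2>/dev/null | awk -F= '/trusted.gfid/{print substr($2,3)}')
gfid="${hex:0:8}-${hex:8:4}-${hex:12:4}-${hex:16:4}-${hex:20:12}"
glink=$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid

# Remove both the named file and its gfid hardlink, then trigger heal and access the file.
rm -f "$F" "$glink"
gluster volume heal ozone
stat /mnt/ozone/file-7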

To sum it up, recovery of a corrupted file is not working as expected in a
disperse volume. Data corruption, with no way to recover, silently leaves the
system in a -1 redundancy state.


EC Team Update:

I was able to reproduce the issue:

1 - Without bitrot - Corrupt the file from the backend and delete it from its
path and also from .glusterfs. Accessing the file from the mount point
successfully heals it. No data loss and no data corruption.
2 - With bitrot - Corrupt the file from the backend and delete it from its path
and also from .glusterfs. Accessing the file from the mount point DOES NOT heal
it.

I tried to debug [2], and it looks like bit-rot is maintaining the
trusted.bit-rot.bad-file=0x3100 xattr in memory.
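
One way to see this is to compare the on-disk marker with the brick's behaviour;
a sketch, assuming brick path /bricks/b1 and a placeholder file name:

# On the brick where the copy was deleted (and recreated by entry heal), the
# on-disk bad-object xattr is expected to be absent...
getfattr -n trusted.bit-rot.bad-file -e hex /bricks/b1/file-3

# ...yet opens from the shd/mount still fail with EIO, because bit-rot-stub
# still has the object marked bad in its in-memory inode context.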

Entry heal and metadata heal have been happening successfully; however, data
heal is not happening. When data heal starts, shd tries to open this file on
the bad as well as the good copies, but the open on the bad copy fails. I
checked the brick logs and found the following error messages -

[2016-06-02 13:23:13.678342] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object] 0-nash-bitrot-stub:
b6cbec17-d66f-42b3-b088-b9c917139bc6 is a bad object. Returning
[2016-06-02 13:23:13.678472] E [MSGID: 115070]
[server-rpc-fops.c:1472:server_open_cbk] 0-nash-server: 2411: OPEN /file-3
(b6cbec17-d66f-42b3-b088-b9c917139bc6) ==> (Input/output error) [Input/output
error]
[2016-06-02 13:23:14.565096] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object] 0-nash-bitrot-stub:
24b01cf8-eb2a-4896-ac1d-1bf085bd2623 is a bad object. Returning
[2016-06-02 13:23:14.565308] E [MSGID: 115070]
[server-rpc-fops.c:1472:server_open_cbk] 0-nash-server: 2486: OPEN /file-6
(24b01cf8-eb2a-4896-ac1d-1bf085bd2623) ==> (Input/output error) [Input/output
error]
[2016-06-02 13:23:14.893098] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object] 0-nash-bitrot-stub:
65faad93-5bf6-47c5-9b7c-7db281c88882 is a bad object. Returning
[2016-06-02 13:23:14.893202] E [MSGID: 115070]
[server-rpc-fops.c:1472:server_open_cbk] 0-nash-server: 2515: OPEN /file-7
(65faad93-5bf6-47c5-9b7c-7db281c88882) ==> (Input/output error) [Input/output
error]
[2016-06-02 13:23:15.619885] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object] 0-nash-bitrot-stub:
b6cbec17-d66f-42b3-b088-b9c917139bc6 is a bad object. Returning

As per the comment on the br_stub_check_bad_object function -

/**
 * The possible return values from br_stub_is_bad_object () are:
 * 1) 0  => as per the inode context object is not bad
 * 2) -1 => Failed to get the inode context itself
 * 3) -2 => As per the inode context object is bad
 * Both -ve values means the fop which called this function is failed
 * and error is returned upwards.
 */
In our case it is returning -2 => as per the inode context, the object is bad.
It seems that even after deletion of the file from the backend, the inode
context still exists in memory, still contains
trusted.bit-rot.bad-file=0x3100, and returns the error.

I tried killing the brick process on which the file was deleted and restarting
the brick. Heal then happened successfully, immediately.
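
The workaround above, as a command sketch (the volume name ozone is from the
report; the PID below is a placeholder read off the status output):

# Find the PID of the glusterfsd process serving the brick where the file was deleted.
gluster volume status ozone

# Kill that brick process; this also drops the stale in-memory inode contexts.
kill 12345

# Bring the brick back and trigger heal; in this test the data heal then completed immediately.
gluster volume start ozone force
gluster volume heal ozone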


Without restart - 
[root@kotresh-3 nash]# getfattr -d -m . -e hex file-7
# file: file-7
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.bit-rot.signature=0x010300000000000000096c09038638126e90a32e0c4f7322ebf2db4fc213a09407240994596102f832
trusted.bit-rot.version=0x03000000000000005750188300019e6f
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000020000000000000000
trusted.ec.size=0x000000000000232c
trusted.ec.version=0x00000000000003e900000000000003e9
trusted.gfid=0x65faad935bf647c59b7c7db281c88882


[root@kotresh-4 nash]# getfattr -d -m . -e hex file-7
# file: file-7
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.version=0x000000000000000000000000000003e9
trusted.gfid=0x65faad935bf647c59b7c7db281c88882


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1341934
[Bug 1341934] [Bitrot]: Recovery fails of a corrupted hardlink (and the
corresponding parent file) in a disperse volume
https://bugzilla.redhat.com/show_bug.cgi?id=1373520
[Bug 1373520] [Bitrot]: Recovery fails of a corrupted hardlink (and the
corresponding parent file) in a disperse volume
https://bugzilla.redhat.com/show_bug.cgi?id=1374564
[Bug 1374564] [Bitrot]: Recovery fails of a corrupted hardlink (and the
corresponding parent file) in a disperse volume
https://bugzilla.redhat.com/show_bug.cgi?id=1374565
[Bug 1374565] [Bitrot]: Recovery fails of a corrupted hardlink (and the
corresponding parent file) in a disperse volume