[Bugs] [Bug 1379838] New: gluster missing gfid attribute, healing doesn't work

bugzilla at redhat.com
Tue Sep 27 20:08:38 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1379838

            Bug ID: 1379838
           Summary: gluster missing gfid attribute, healing doesn't work
           Product: GlusterFS
           Version: 3.7.15
         Component: selfheal
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: pasik at iki.fi
                CC: bugs at gluster.org



Description of problem:

I have a pretty basic two-node gluster 3.7 setup on CentOS 7, with a volume
replicated/mirrored to both servers.

One of the gluster servers was down for hardware maintenance. When it came
back up, the healing process started re-syncing files.

At first about 200 files had to be synced. After a while the number dropped
to 10, but then healing stopped: the last 10 files did not get synced, no
matter what.

So the problem is that healing/re-sync never completes for these files.


Log entries reveal the actual problem:

[2016-09-21 12:41:43.063209] E [MSGID: 113002] [posix.c:252:posix_lookup]
0-gvol1-posix: buf->ia_gfid is null for /bricks/vol1/brick1/foo [No data
available]

[2016-09-21 12:41:43.063266] E [MSGID: 115050]
[server-rpc-fops.c:179:server_lookup_cbk] 0-gvol1-server: 1484202: LOOKUP /foo
 (00000000-0000-0000-0000-000000000001/foo) ==> (No data available) [No data
available]
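
To find every affected file rather than just one, the brick log can be scanned
for this message. The following is a minimal sketch, not a gluster tool: the
function name paths_missing_gfid is hypothetical, and it assumes the message
text is exactly as shown in the entries above. Paths containing " [" or runs
of whitespace would not be recovered correctly.

```python
import re

# Matches the posix_lookup error text quoted above. The path is taken to end
# at the next " [" (the start of the "[No data available]" suffix).
NULL_GFID_RE = re.compile(r"buf->ia_gfid is null for (?P<path>.+?) \[")


def paths_missing_gfid(log_text):
    """Return brick paths reported with a null gfid, in order, de-duplicated.

    Hypothetical helper for illustration only.
    """
    # Collapse whitespace so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    found = []
    for match in NULL_GFID_RE.finditer(flat):
        path = match.group("path")
        if path not in found:
            found.append(path)
    return found


sample = """[2016-09-21 12:41:43.063209] E [MSGID: 113002] [posix.c:252:posix_lookup]
0-gvol1-posix: buf->ia_gfid is null for /bricks/vol1/brick1/foo [No data
available]"""

print(paths_missing_gfid(sample))
```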

Manually checking the file in question confirms the problem:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000

There is no trusted.gfid attribute on the file in question. I have no clear
explanation for why this happened, but it could be because I killed the
gluster daemons/services at exactly the "wrong" moment while preparing the
node for maintenance, just as this file was being created. I'm not sure about
that, though.

There seems to be no hardlink either: there is nothing in the
/bricks/vol1/brick1/.glusterfs/c1/ca/ directory.


Checking on the other node:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol1-client-1=0x000016620000000100000000
trusted.bit-rot.version=0x020000000000000057e00db5000624ed
trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e

So the other node does have the gfid.
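
The .glusterfs/c1/ca/ directory checked earlier follows from this gfid: for a
regular file, the brick keeps a hardlink under .glusterfs, in subdirectories
named after the first two and next two hex digits of the gfid, with the gfid
in UUID form as the file name. A minimal sketch of that mapping, with
hypothetical helper names and the brick root /bricks/vol1/brick1 taken from
the paths above:

```python
import os.path
import uuid


def gfid_to_uuid(gfid_hex):
    """Convert the hex form printed by getfattr (0x...) to a UUID string."""
    digits = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    return str(uuid.UUID(hex=digits))


def gfid_link_path(brick_root, gfid_hex):
    """Return where the .glusterfs hardlink for this gfid is expected.

    Hypothetical helper for illustration only.
    """
    canonical = gfid_to_uuid(gfid_hex)
    return os.path.join(
        brick_root, ".glusterfs", canonical[0:2], canonical[2:4], canonical
    )


print(gfid_link_path("/bricks/vol1/brick1",
                     "0xc1ca778ed2af4828b981171c0c5bd45e"))
```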

After I manually set the trusted.gfid attribute on the file and launched heal
again, gluster was able to heal the file and continue with the next files.
Healing has now fully completed, and there are no out-of-sync files anymore.
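
The exact commands used are not recorded in this report. As a rough sketch of
what such a manual fix could look like, the snippet below only builds and
prints the commands and does not run them. It assumes setfattr from the attr
package, that the volume is named gvol1 (inferred from the "0-gvol1-posix" log
prefix), and a hypothetical helper name repair_commands. Setting trusted.*
attributes requires root on the brick host.

```python
import shlex


def repair_commands(brick_file, gfid_hex, volume):
    """Build (but do not execute) commands to restore a gfid and re-heal.

    Hypothetical helper for illustration only.
    """
    # A gfid is 16 bytes: "0x" followed by exactly 32 hex digits.
    if not gfid_hex.lower().startswith("0x") or len(gfid_hex) != 34:
        raise ValueError("gfid must be 0x followed by 32 hex digits")
    int(gfid_hex, 16)  # raises ValueError if the digits are not hex

    return [
        # setfattr treats a value starting with 0x as hexadecimal.
        ["setfattr", "-n", "trusted.gfid", "-v", gfid_hex, brick_file],
        # Trigger self-heal on the volume again (volume name is an assumption).
        ["gluster", "volume", "heal", volume],
    ]


commands = repair_commands(
    "/bricks/vol1/brick1/foo",
    "0xc1ca778ed2af4828b981171c0c5bd45e",  # value copied from the good node
    "gvol1",
)
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The gfid value should always be copied from the replica that still has it,
never generated, since both bricks must agree on it.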


Pranith Kumar Karampuri on the gluster-users mailing list asked me to create
this Bugzilla entry.


Version-Release number of selected component (if applicable):
GlusterFS 3.7.15 from the CentOS 7 Storage SIG gluster37 repo.


Steps to Reproduce:
1. See above.

Actual results:
Healing doesn't finish if there are files without a gfid.

Expected results:
Healing continues even if there are files without a gfid.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

