[Bugs] [Bug 1379838] New: gluster missing gfid attribute, healing doesn't work
bugzilla at redhat.com
Tue Sep 27 20:08:38 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1379838
Bug ID: 1379838
Summary: gluster missing gfid attribute, healing doesn't work
Product: GlusterFS
Version: 3.7.15
Component: selfheal
Severity: high
Assignee: bugs at gluster.org
Reporter: pasik at iki.fi
CC: bugs at gluster.org
Description of problem:
I have a pretty basic two-node gluster 3.7 setup on CentOS 7, with a volume
replicated/mirrored to both servers.
One of the gluster servers was down for hardware maintenance, and when it came
back up, the healing process started re-syncing files.
In the beginning there were some 200 files to sync, and after a while the
number dropped to 10, but then healing stopped: the last 10 files never got
synced, no matter what.
So the problem is that the healing/re-sync never finishes for these files.
Log entries reveal the actual problem:
[2016-09-21 12:41:43.063209] E [MSGID: 113002] [posix.c:252:posix_lookup] 0-gvol1-posix: buf->ia_gfid is null for /bricks/vol1/brick1/foo [No data available]
[2016-09-21 12:41:43.063266] E [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-gvol1-server: 1484202: LOOKUP /foo (00000000-0000-0000-0000-000000000001/foo) ==> (No data available) [No data available]
Manually checking the file in question confirms the problem:
# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
There is no trusted.gfid attribute on the file in question. I don't know why
this happened. It could be because I killed the gluster daemons/services at
exactly the "wrong" moment while preparing the node for maintenance, i.e. while
this file was being created. I'm not sure about that, though.
There was no hardlink either: nothing in the
/bricks/vol1/brick1/.glusterfs/c1/ca/ directory.
Checking on another node:
# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol1-client-1=0x000016620000000100000000
trusted.bit-rot.version=0x020000000000000057e00db5000624ed
trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e
So there we have the gfid.
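A small Python sketch can make the xattr dump above easier to read. The helper names are made up for illustration and are not part of any gluster tooling. The sketch makes two assumptions. First, it assumes the usual brick layout, where each file's gfid hardlink lives at <brick>/.glusterfs/<first 2 hex digits>/<next 2 hex digits>/<gfid as UUID>. That layout is why the c1/ca directory was the one to check. Second, it assumes a 12-byte trusted.afr.* value is three big-endian 32-bit pending counters, in the order data, metadata, entry.

```python
import uuid


def gfid_to_uuid(hex_value: str) -> uuid.UUID:
    """Turn a getfattr hex value such as '0xc1ca...' into a UUID."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    return uuid.UUID(hex=digits)


def glusterfs_link_path(brick: str, gfid: uuid.UUID) -> str:
    """Assumed hardlink location: <brick>/.glusterfs/<aa>/<bb>/<uuid>."""
    s = str(gfid)
    return f"{brick}/.glusterfs/{s[0:2]}/{s[2:4]}/{s}"


def afr_pending(hex_value: str) -> dict:
    """Split a 12-byte trusted.afr.* changelog into three pending counters.

    Assumption: big-endian 32-bit counters in data, metadata, entry order.
    """
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    names = ("data", "metadata", "entry")
    return {name: int.from_bytes(raw[i * 4:(i + 1) * 4], "big")
            for i, name in enumerate(names)}


# Values copied from the healthy node's getfattr output.
gfid = gfid_to_uuid("0xc1ca778ed2af4828b981171c0c5bd45e")
print(gfid)
print(glusterfs_link_path("/bricks/vol1/brick1", gfid))
print(afr_pending("0x000016620000000100000000"))
```

With the values from the healthy node, this points at a hardlink under .glusterfs/c1/ca/. It also decodes to pending counts of data=5730, metadata=1, entry=0 recorded against gvol1-client-1, which is presumably the brick that was down. That reading holds only if the counter-order assumption is right.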
After manually setting the trusted.gfid attribute on the file to that value and
launching the heal again, gluster was able to heal the file and continue with
the next files. Healing has now fully completed, and there are no out-of-sync
files anymore.
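The manual fix above amounts to two steps. First, copy the healthy replica's trusted.gfid onto the brick copy that lacks it. Then trigger a heal again. Below is a minimal Python sketch that only builds and prints those commands from the other node's getfattr output. It does not run anything. The helper names are hypothetical. The volume name gvol1 is inferred from the log prefixes (0-gvol1-posix) rather than stated in the report. The setfattr command is meant for the brick path on the node whose copy lacks the gfid.

```python
import shlex

# `getfattr -m . -d -e hex` output from the healthy node (copied from above).
HEALTHY_DUMP = """\
# file: bricks/vol1/brick1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol1-client-1=0x000016620000000100000000
trusted.bit-rot.version=0x020000000000000057e00db5000624ed
trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e
"""


def parse_getfattr_dump(text: str) -> dict:
    """Parse `name=value` lines into a dict, skipping '# file:' headers and notices."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def gfid_fix_commands(healthy_dump: str, brick_path: str, volume: str) -> list:
    """Build (not run) the setfattr + heal commands used for the manual fix."""
    gfid = parse_getfattr_dump(healthy_dump).get("trusted.gfid")
    if gfid is None:
        raise KeyError("the healthy copy has no trusted.gfid either")
    return [
        ["setfattr", "-n", "trusted.gfid", "-v", gfid, brick_path],
        ["gluster", "volume", "heal", volume],
    ]


for cmd in gfid_fix_commands(HEALTHY_DUMP, "/bricks/vol1/brick1/foo", "gvol1"):
    print(shlex.join(cmd))
```

The sketch prints the commands rather than executing them on purpose. Setting the wrong gfid on a brick is hard to undo, so the commands are worth reviewing before running them as root.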
Pranith Kumar Karampuri asked me on the gluster-users mailing list to create
this bugzilla entry.
Version-Release number of selected component (if applicable):
gluster 3.7.15 from the CentOS 7 Storage SIG gluster37 repo.
Steps to Reproduce:
1. See above.
Actual results:
Healing doesn't finish if there are files without a gfid.
Expected results:
Healing completes even if there are files without a gfid.