[Gluster-users] Fixing heal / split-brain when the entry is a directory
Shawn Heisey
gluster at elyograg.org
Tue Mar 4 23:46:14 UTC 2014
I have a bunch of heal problems on a volume. For this email, I won't
speculate about what caused them - that's a whole other discussion that
I may have at some point in the future. This will concentrate on fixing
the immediate problems so I can move forward.
Thanks to Joe Julian's blog posts and talking to him in the IRC channel,
I have a pretty good handle on how to fix entries in the 'gluster volume
heal $vol info' output ... but only if the entry refers to a real *file*
or a gluster link file. Almost all of the entries in my report are
directories, and I have no idea how to fix those.
All I have for these entries are gfid values, so I first locate each
entry under .glusterfs on the brick. In this case, it's a symlink.
[root@slc01dfs001a ~]# stat /bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1
File: `/bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1' -> `../../a7/30/a730505c-84f3-407f-ac27-d45465a17f40/331'
Size: 52 Blocks: 0 IO Block: 4096 symbolic link
Device: fd06h/64774d Inode: 2152112572 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-06-21 03:17:27.740839811 -0600
Modify: 2013-06-21 03:17:27.740839811 -0600
Change: 2013-06-21 03:17:27.740839811 -0600
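The stat above shows the .glusterfs layout: the first two hex digits of the gfid, then the next two, then the full gfid. A minimal Python sketch of locating and classifying such an entry, assuming that layout; `gfid_path` and `describe_gfid_entry` are hypothetical helpers, not part of any Gluster tool. It uses `lstat` so the symlink itself is inspected rather than followed. On a brick, a directory's gfid entry is a symlink like this one, whereas a regular file's gfid entry is a hard link to the file.

```python
import os
import stat


def gfid_path(brick_root: str, gfid: str) -> str:
    """Return <brick>/.glusterfs/<aa>/<bb>/<gfid> for a dashed gfid string."""
    gfid = gfid.lower()
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def describe_gfid_entry(path: str) -> str:
    """Classify the gfid entry itself, without following it (like plain `stat`)."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symlink (directory gfid entry)"
    if stat.S_ISREG(st.st_mode):
        return "regular file (hard link to a file)"
    return "other"


if __name__ == "__main__":
    print(gfid_path("/bricks/d00v00/mdfs", "fe93de6e-5b91-4193-a31c-786726886ff1"))
    # -> /bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1
```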
To figure out what the actual directory name is, I use readlink:
[root@slc01dfs001a ~]# readlink -f /bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1
/bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331
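One hop of that symlink already encodes the parent directory's gfid and the entry's own name (`../../a7/30/<parent-gfid>/331`); `readlink -f` keeps following the parent gfid entries up to the brick root. A sketch that parses a single hop, assuming the target always has this two-level hash form; `parse_dir_link` and `parent_of` are hypothetical helpers:

```python
from __future__ import annotations

import os


def parse_dir_link(target: str) -> tuple[str, str]:
    """Split '../../<aa>/<bb>/<parent-gfid>/<name>' into (parent_gfid, name)."""
    parts = target.split("/")
    if len(parts) != 6 or parts[:2] != ["..", ".."]:
        raise ValueError(f"unexpected gfid link target: {target!r}")
    _, _, h1, h2, parent_gfid, name = parts
    # The two hash directories should be the first four hex digits of the parent gfid.
    if not parent_gfid.startswith(h1 + h2):
        raise ValueError(f"hash dirs do not match parent gfid: {target!r}")
    return parent_gfid, name


def parent_of(gfid_link_path: str) -> tuple[str, str]:
    """Read one symlink hop (like plain `readlink`, not `readlink -f`)."""
    return parse_dir_link(os.readlink(gfid_link_path))
```

For this entry, the parent gfid is a730505c-84f3-407f-ac27-d45465a17f40 and the name is 331, matching the last component of the `readlink -f` result.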
I can get the extended attributes. I know from talking to Joe Julian
that the following output means both copies think the other needs
healing. If I compare 'ls -al' output from the brick directory on both
copies, they are the same.
[root@slc01dfs001a ~]# getfattr -m . -d -e hex /bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331
getfattr: Removing leading '/' from absolute path names
# file: bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331
trusted.afr.mdfs-client-0=0x00000000000000000000006e
trusted.afr.mdfs-client-1=0x00000000000000000000006e
trusted.gfid=0xfe93de6e5b914193a31c786726886ff1
trusted.glusterfs.dht=0x00000001000000003ffffffc4ffffffa
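The hex values can be decoded mechanically. This sketch assumes the AFR changelog layout of this GlusterFS era: 12 bytes holding three big-endian 32-bit pending counters, for data, metadata and entry operations in that order. Under that assumption, both trusted.afr values above carry only 0x6e (110) pending entry operations, which fits a directory whose contents, not its data, are in question. The trusted.gfid value is the same gfid used under .glusterfs, just without dashes. The function names are hypothetical.

```python
from __future__ import annotations

import struct
import uuid


def _strip_0x(value: str) -> str:
    return value[2:] if value.lower().startswith("0x") else value


def decode_afr_changelog(value: str) -> dict[str, int]:
    """Decode a trusted.afr.* hex value (assumed: 3 big-endian uint32 counters)."""
    raw = bytes.fromhex(_strip_0x(value))
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def decode_gfid(value: str) -> str:
    """Turn a trusted.gfid hex value into the dashed form used under .glusterfs."""
    return str(uuid.UUID(hex=_strip_0x(value)))


if __name__ == "__main__":
    print(decode_afr_changelog("0x00000000000000000000006e"))
    # -> {'data': 0, 'metadata': 0, 'entry': 110}
    print(decode_gfid("0xfe93de6e5b914193a31c786726886ff1"))
    # -> fe93de6e-5b91-4193-a31c-786726886ff1
```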
Now for the big question ... what do I do, step by step, to eliminate
this entry from the heal info output? On another entry, I tried deleting
the second trusted.afr attribute on both copies, deleting both of them,
deleting one and setting the other to zero, and setting both to zero.
Between each attempt, I did a stat on the directory via the FUSE mount.
None of these changed the heal info output.
Thanks,
Shawn