[Gluster-users] Fixing heal / split-brain when the entry is a directory

Shawn Heisey gluster at elyograg.org
Tue Mar 4 23:46:14 UTC 2014


I have a bunch of heal problems on a volume.  For this email, I won't 
speculate about what caused them - that's a whole other discussion that 
I may have at some point in the future.  This will concentrate on fixing 
the immediate problems so I can move forward.

Thanks to JoeJulian's blog posts and talking to him in the IRC channel, 
I have a pretty good handle on how to fix entries in the 'heal $vol 
info' output ... but only if the entry given refers to a real *file* or 
a gluster link file.  Almost all of the entries in my report are 
directories, and I have no idea how to fix those.

All I have for these entries is gfid values, so I first locate the entry 
in .glusterfs.  In this case, it's a symlink.

[root@slc01dfs001a ~]# stat /bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1
   File: `/bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1' -> `../../a7/30/a730505c-84f3-407f-ac27-d45465a17f40/331'
   Size: 52              Blocks: 0          IO Block: 4096   symbolic link
Device: fd06h/64774d    Inode: 2152112572  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-06-21 03:17:27.740839811 -0600
Modify: 2013-06-21 03:17:27.740839811 -0600
Change: 2013-06-21 03:17:27.740839811 -0600
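
For reference, the location under .glusterfs can be derived from the gfid 
itself: the first two hex characters name the first directory level, the 
next two name the second, and the entry is the full gfid.  A minimal 
sketch of that lookup follows; the function name gfid_path is only 
illustrative and is not part of gluster.

```shell
# Hypothetical helper: print the .glusterfs path for a gfid on one brick.
# $1 = brick root, $2 = gfid in dashed form
gfid_path() {
    first=$(printf '%s' "$2" | cut -c1-2)    # characters 1-2 of the gfid
    second=$(printf '%s' "$2" | cut -c3-4)   # characters 3-4 of the gfid
    printf '%s/.glusterfs/%s/%s/%s\n' "$1" "$first" "$second" "$2"
}

gfid_path /bricks/d00v00/mdfs fe93de6e-5b91-4193-a31c-786726886ff1
```

For the gfid shown here, this prints the same path that was passed to stat.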

To figure out what the actual directory name is, I use readlink:

[root@slc01dfs001a ~]# readlink -f /bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1
/bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331

I can get the extended attributes. I know from talking to Joe Julian 
that the following output means both copies think the other needs 
healing.  If I compare 'ls -al' output from the brick directory on both 
copies, they are the same.

[root@slc01dfs001a ~]# getfattr -m . -d -e hex /bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331
getfattr: Removing leading '/' from absolute path names
# file: bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331
trusted.afr.mdfs-client-0=0x00000000000000000000006e
trusted.afr.mdfs-client-1=0x00000000000000000000006e
trusted.gfid=0xfe93de6e5b914193a31c786726886ff1
trusted.glusterfs.dht=0x00000001000000003ffffffc4ffffffa

Now for the big question ... what do I do, in a step-by-step format, to 
eliminate this entry from the heal info output?  On another entry, I 
tried deleting the second trusted.afr entry on both copies, I tried 
deleting them both, I tried deleting one and setting the other to zero, 
and I tried changing them both to zero.  In between each of these, I 
did a stat on the directory via the FUSE mount.  It did not change the 
heal info output.

Thanks,
Shawn


