[Gluster-users] Recovery (no active sink)
Robin
robinr at miamioh.edu
Sat Jan 5 00:33:14 UTC 2013
Hi,
I have a volume (currently not mounted by any other clients) that
reports an "unsynced" entry.
This is Gluster 3.3.1, set up with replica 2 across two Gluster machines
(mualglup01 and mualglup02, called p01 and p02 below to keep the names short).
# gluster volume heal RedhawkHome info
Gathering Heal info on volume RedhawkHome has been successful
Brick mualglup01:/mnt/gluster/RedhawkHome
Number of entries: 1
<gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>
Brick mualglup02:/mnt/gluster/RedhawkHome
Number of entries: 1
<gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>
### Usually, the output gives me the path of the file,
### but this time it only prints the gfid
I walked the entire file system and found that the file with that gfid
is:
./home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
I confirmed that the gfid is the same on both p01, p02 Gluster machines
for that file.
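Walking the whole file system should not be strictly necessary. A minimal sketch, assuming the brick uses the .glusterfs backend layout introduced with Gluster 3.3, where each regular file has a hard link under .glusterfs/<first two hex digits>/<next two>/<full gfid> (which would be consistent with the "Links: 2" in the stat output). The helper name gfid_backend_path is hypothetical, not part of any Gluster tool:

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root: str, gfid: str) -> PurePosixPath:
    """Return the expected .glusterfs entry for a gfid on a brick.

    Accepts the getfattr form (0x9ed8...) or the heal-info form
    (9ed83644-cae6-...). The layout itself is an assumption about
    Gluster 3.3+ bricks.
    """
    cleaned = gfid.strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    # uuid.UUID validates the 32 hex digits and ignores hyphens.
    canonical = str(uuid.UUID(hex=cleaned))
    return (PurePosixPath(brick_root) / ".glusterfs"
            / canonical[0:2] / canonical[2:4] / canonical)


if __name__ == "__main__":
    print(gfid_backend_path("/mnt/gluster/RedhawkHome",
                            "0x9ed83644cae64d16a5b77ccb48c41695"))
```

On the brick itself, the real path could then be located with something like find <brick> -samefile <that .glusterfs entry>, which matches by inode instead of reading every file's xattrs.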
### On both p01 and p02, the following output matches exactly
# getfattr -d -e hex -m . home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
# file: home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
trusted.afr.RedhawkHome-client-0=0x000000030000000000000000 (non zero)
trusted.afr.RedhawkHome-client-1=0x000000030000000000000000 (non zero)
trusted.gfid=0x9ed83644cae64d16a5b77ccb48c41695
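For reading the trusted.afr values above: as far as I understand the AFR changelog format, each value is 12 bytes holding three 32-bit big-endian counters for pending data, metadata, and entry operations, in that order. A small sketch under that assumption (decode_afr_changelog is a hypothetical helper name, not a Gluster API):

```python
import struct


def decode_afr_changelog(value_hex: str) -> dict:
    """Split a trusted.afr.* hex value into its three pending counters.

    Assumes the layout is three unsigned 32-bit big-endian integers:
    data, metadata, entry.
    """
    text = value_hex.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    for name in ("client-0", "client-1"):
        print(name, decode_afr_changelog("0x000000030000000000000000"))
```

Read this way, both xattrs show 3 pending data operations and no pending metadata or entry operations, identically on both bricks.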
Self-heal is failing with the following log message (repeated many times
each time self-heal runs):
[2013-01-04 14:47:05.203072] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-RedhawkHome-replicate-0: no active sinks for performing self-heal on file <gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>
At p01,
# stat home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
File: `home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459'
Size: 34881536 Blocks: 68128 IO Block: 4096 regular file
Device: fd02h/64770d Inode: 47190622 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2012-12-18 10:26:35.276898896 -0500
Modify: 2012-12-18 10:26:36.581912761 -0500
Change: 2013-01-04 19:18:34.495935037 -0500
At p02,
# stat home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
File: `home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459'
Size: 34881536 Blocks: 68128 IO Block: 4096 regular file
Device: fd02h/64770d Inode: 328602346 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2012-12-18 10:26:35.275947590 -0500
Modify: 2012-12-18 10:26:36.584973268 -0500
Change: 2013-01-04 19:18:34.498314380 -0500
The md5sum of the file matches exactly on p01 and p02.
I don't recall both Gluster machines being down at the same time (but
that does not mean it did not happen). This is my non-production
volume, so it could be that I was overly aggressive in testing things out.
I also don't recall the client "cp" process that produced the file
reporting any errors (again, that does not mean there were none).
What's the best way to recover from this error?
I assume the worst case is that I mount the volume from a client
and delete the file (that is, I lose this file).
Thanks,
Robin