[Gluster-users] Recovering badly corrupted directory
Sincock, John [FLCPTY]
J.Sincock at fugro.com
Wed Aug 26 06:25:15 UTC 2015
Hi Everybody,
I'm trying to recover a badly corrupted directory on our gluster, and
need some advice:
It looks like we've hit this bug here, which was reported against
gluster 2.1 and is unresolved:
https://bugzilla.redhat.com/show_bug.cgi?id=1034148
Bug 1034148 - "DHT : on lookup getting error ' cannot read symbolic link
<dir1>: Invalid argument' or 'Input/output error' and logs says
"[posix.c:737:posix_readlink] 0-flat-posix: readlink on <dir> failed
Invalid argument" + it shows directory twice in output"
Some googling shows what looks like the same bug on glusterfs 3.3 here:
http://www.gluster.org/pipermail/gluster-users/2013-August/014038.html
and glusterfs 3.4.2-1 here:
http://www.gluster.org/pipermail/gluster-users/2014-February/016271.html
We are running gluster 3.4.1-3.el6.x86_64 on centos 6.4
Our data for the corrupted folder appears to exist on the bricks but is
unusable via the gluster volume.
There are many files with ------T permissions; many of them have zero
size, while others contain data.
Here is what I get when I list the original problem directory:
ls -la
/gluster/vol00/archive/Online_Archive/Survey/Riegl/2014/Saudi/H11_Riegl_
RiAcquire_Raw_Data/14_11_SAS18_1_RiAcquire\ \(FieldRawData\)/
ls: cannot read symbolic link
/gluster/vol00/archive/Online_Archive/Survey/Riegl/2014/Saudi/H11_Riegl_
RiAcquire_Raw_Data/14_11_SAS18_1_RiAcquire
(FieldRawData)/14_11_140823_S004_DNP: Invalid argument
ls: cannot read symbolic link
/gluster/vol00/archive/Online_Archive/Survey/Riegl/2014/Saudi/H11_Riegl_
RiAcquire_Raw_Data/14_11_SAS18_1_RiAcquire
(FieldRawData)/14_11_140902_S021: Invalid argument
<snips many more like this>
total 0
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 1 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 66 Jul 13 14:38 14_11_140823_S005 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140823_S005
lrwxrwxrwx 0 root root 66 Jul 13 14:38 14_11_140823_S005 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140823_S005
lrwxrwxrwx 0 root root 66 Jul 13 14:38 14_11_140823_S005 ->
../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140823_S005
<snips many more like this>
As you can see, the listing is a catastrophic mess, with a multitude of
duplicate entries, symbolic links which give this "invalid argument"
nonsense, and all the symlinks to ../../a9/65/blahblah are broken.
I think I've managed to restore access to most of the data by rsyncing
it from its original location on each brick to a new location on the
same brick (without copying extended attributes), and then recursively
listing the new copy via the gluster volume to pull the new data into
the volume.
The copied data we've brought back into the gluster volume looks like it
is all or mostly there, but the new copy still contains quite a few
duplicates of files with ------T permissions, so trying to rsync from
this copy throws errors saying "structure needs to be cleaned" and that
items failed verification - not surprising, as rsync has no way of
properly coping with these duplicates.
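For the record, the per-brick copy step amounted to something like the
sketch below (all paths are just examples, not our real ones). The key
point is that rsync -a does not imply -X, so the trusted.* gluster
xattrs on the brick are left behind, and the copies look like plain new
files when gluster first stats them through the fuse mount:

```shell
# copy_without_xattrs: replicate a tree on the same brick without
# extended attributes. rsync -a preserves perms/times/links but does
# NOT copy xattrs unless -X is given, so the trusted.* gluster
# metadata is dropped from the copies.
copy_without_xattrs() {
  rsync -a "$1"/ "$2"/
}

# On each brick (example paths):
#   copy_without_xattrs /gluster/vol00/archive/BadDir /gluster/vol00/gluster-recovery-2
# then, through the fuse mount, stat everything so gluster builds
# fresh metadata for the copies:
#   find /mnt/glustervol/gluster-recovery-2 > /dev/null
```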
E.g. here is a small subfolder:
ll "/gluster/vol00/gluster-recovery-2/14_11_SAS18_1_RiAcquire
(FieldRawData)/14_11_141023_S091_DNP/08_RECEIVED/"
total 3165
-rwxr-xr-x 1 survey1 surveyor 2702101 Oct 23 2014 14_11_141023_S091.rhk
---------T 1 survey1 surveyor 2702101 May 27 04:49 14_11_141023_S091.rhk
-rwxr-xr-x 1 survey1 surveyor 407859 Oct 23 2014 14_11_141023_S091.rpc
---------T 1 survey1 surveyor 407859 May 27 04:49 14_11_141023_S091.rpc
-rwxr-xr-x 1 survey1 surveyor 28653 Oct 23 2014 14_11_141023_S091.rpl
-rwxr-xr-x 1 survey1 surveyor 101609 Oct 23 2014 14_11_141023_S091.rpp
Note the duplicates with ------T perms.
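For reference, as I understand it these ------T entries are DHT link
files: zero-byte pointers with only the sticky bit set (mode 1000) and
a trusted.glusterfs.dht.linkto xattr naming the subvolume that holds
the real file. A sketch of how I've been spotting them directly on a
brick (the paths in the comments are examples):

```shell
# list_linkfiles: print suspected DHT link files under a directory.
# Mode 1000 means no rwx bits and only the sticky bit set, which
# ls shows as ---------T.
list_linkfiles() {
  find "$1" -type f -perm 1000 -print
}

# Usage on a brick root (example path):
#   list_linkfiles /gluster/vol00
# and to see which subvolume each one points at (root is needed to
# read the trusted.* xattr namespace):
#   getfattr -n trusted.glusterfs.dht.linkto --only-values <file>
```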
The questions I have are:
1) Is there a better way to clean up this corruption?
2) What are these ------T files, and why are they there?!
3) What has caused the initial problem that corrupted our data?
I was on leave when this problem was noticed, but my colleagues
are pretty sure the directory looked OK after our last rebalance, so we
do not think this problem occurred during rebalance.
We have 3 nodes in our cluster, and one of them is having issues
with occasional spontaneous reboots, so it does drop off the gluster at
times and then returns. But I don't think we have modified or moved any
of the corrupted data recently, so I do not think the problem has been
caused by data being moved during rebalance or by the node rebooting
while data was being manually moved from one place to another.
4) Can I just delete the remaining ------T files from the data we copied
and re-added into the volume?
        If I do so, is there any chance that the other duplicate is the
bad one, and the T-file is the good copy I should have kept?
        What are the chances this cleanup has fixed everything? Are
there still likely to be corrupt and/or missing files?
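If deleting them turns out to be safe, my plan would be something like
the sketch below - delete only files that are BOTH mode ---------T and
zero bytes, and just print any non-empty ---------T files so nothing
that still holds data is lost without a look first (the path in the
comment is an example):

```shell
# prune_linkfiles: remove suspected DHT link files that are empty
# (mode 1000 AND zero bytes), then print any non-empty mode-1000
# files that remain so they can be inspected by hand.
prune_linkfiles() {
  find "$1" -type f -perm 1000 -size 0 -delete
  find "$1" -type f -perm 1000 ! -size 0
}

# e.g. prune_linkfiles /gluster/vol00/gluster-recovery-2   (example path)
```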
5) If we can get our copy of this data cleaned up, we would like to
delete the original corrupted folder from the volume, by going behind
gluster and deleting the data off the bricks.
What is the correct procedure for doing this?
        I.e., if we delete the bad data off all the bricks, this will
leave files or links in the .glusterfs folder, won't it?
        How do I find the correct files under .glusterfs to delete? Or
do I just delete from the bricks and then wait until the next
rebalance, and let the rebalance clean up the mess?
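On the .glusterfs question - my understanding is that each file's
entry under .glusterfs is named by its gfid (the trusted.gfid xattr),
with the first two bytes of the gfid giving the two directory levels,
so the gfid a9650fe5-... from the listing above would live at
.glusterfs/a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676. A sketch of the
mapping, taking the hex value that getfattr prints:

```shell
# gfid_path: turn a hex gfid, as printed by
#   getfattr -n trusted.gfid -e hex --only-values <file-on-brick>
# into the relative .glusterfs path for that gfid.
gfid_path() {
  hex=${1#0x}                                    # strip any leading 0x
  d1=$(printf '%s' "$hex" | cut -c1-2)           # first directory level
  d2=$(printf '%s' "$hex" | cut -c3-4)           # second directory level
  # re-insert the uuid dashes in the 8-4-4-4-12 pattern
  uuid=$(printf '%s' "$hex" |
    sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/')
  printf '.glusterfs/%s/%s/%s\n' "$d1" "$d2" "$uuid"
}

# e.g. gfid_path 0xa9650fe50436472f8f754d7b5bf0e676
#   -> .glusterfs/a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676
```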
Any advice would be appreciated!
Thanks muchly,
John