[Gluster-users] answer Re: AFR recovery question
freedman at FreeFormIT.com
Fri Oct 10 10:22:10 UTC 2008
No one answered me, so I'll just report my findings:
Server1 and serverB were full of data, running Gluster 1.4pre5 on Fedora Core 9.
The disk on serverB crashed; lost everything. Reinstalled,
then copied over my AFR config from server1, changing IP addresses as appropriate.
The OS filesystems are in /gluster/home;
the mounted gluster filesystem is /home.
On server1, I changed the AFR config to list only itself (because I
wasn't sure whether this would work, and I didn't want AFR to go delete
everything instead of copying it over).
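For anyone trying the same thing, here's a hedged sketch of the idea in old-style volfile syntax (the volume names, IP, and layout are made up for illustration, not my actual config): the remote brick is commented out so AFR's only subvolume is the local one.

```
volume home1
  type storage/posix
  option directory /gluster/home
end-volume

# volume serverB-home            # remote brick, disabled while rebuilding
#   type protocol/client
#   option transport-type tcp/client
#   option remote-host 192.168.0.2
#   option remote-subvolume home1
# end-volume

volume home-afr
  type cluster/afr
  subvolumes home1               # was: home1 serverB-home
end-volume
```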
So I mounted /home on serverB while tailing the gluster log.
The interesting thing: it created the directories in /home, and many
produced I/O errors.
The log showed many entries like this:
2008-10-10 04:45:49 C [posix.c:2756:ensure_file_type] home1: entry
/gluster/home/freeform/access-logs is a different type of file than expected
(one for each directory under /home)
So that access-logs entry is a symlink to somewhere else. There
were also entries for all the other symlinks in the first-level
directories. For any of these links which pointed to directories that
didn't exist, it produced an error and the directory wasn't available.
Strange. But how to fix it? Here's what I did:
I unmounted /home, then:
rm -f /gluster/home/*/access-logs
mkdir (the link destination for access-logs)
There were a couple of directories still inaccessible; it turned out
they had bad symlinks as well. Did the same thing, and now it's plugging along.
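The repair can be sketched on a throwaway directory (the paths here are illustrative stand-ins, not the real /gluster/home backend):

```shell
set -e
# scratch directory standing in for the backend /gluster/home
backend=$(mktemp -d)
mkdir -p "$backend/freeform"
# a dangling symlink, standing in for the broken access-logs link
ln -s "$backend/missing-target" "$backend/freeform/access-logs"

# the fix: remove the bad link, then create the directory it pointed at
rm -f "$backend"/*/access-logs
mkdir -p "$backend/missing-target"
echo repaired
```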
I'm using find /home/XXXX -type f -print0 | xargs -0 head -c1 > /dev/null
to auto-heal them, and it seems to be going just fine.
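A stat-based variant (my assumption, not what I actually ran) would also touch directories and symlinks, since any attribute read through the glusterfs mountpoint triggers AFR self-heal on that entry. Demonstrated on a scratch directory; substitute the real mountpoint (/home) in practice:

```shell
# scratch dir standing in for the glusterfs mountpoint
mnt=$(mktemp -d)
touch "$mnt/file1"
mkdir "$mnt/sub" && touch "$mnt/sub/file2"
# stat every entry under the mountpoint, discarding the output;
# reading the attributes is what matters, not the listing itself
find "$mnt" -print0 | xargs -0 stat > /dev/null && echo done
```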
once that's finished, I'll re-add serverB to server1's AFR config and
I presume it'll be fine.
I just found it odd that these missing symlink destinations would
cause such a problem.
Anyway, it was a minor irritation, and overall the auto-healing, once
it got going, has been a lifesaver.
At 06:21 AM 10/9/2008, Keith Freedman wrote:
>I have 2 servers that AFR each other.
>One of them suffered a drive failure and is being rebuilt.
>The question is: what will happen if I just mount the empty drive
>back as the AFR node?
>Will it just start grabbing the data from the other server (which is
>exactly what I want), OR
>will it start deleting the data from the other server (which is terribly bad)?
>Another thought was to, on the current working server with good data,
>disable the remote AFR node from its config (so it's only using AFR
>on itself), and then leave the other machine's config as is, and turn it on.
>This way I can be sure that the node with the data won't go nuts and
>start deleting, but that updates to it will get replicated to the
>this particular set is running 1.4pre5 if that changes the answer.