[Gluster-users] answer Re: AFR recovery question

Keith Freedman freedman at FreeFormIT.com
Fri Oct 10 10:22:10 UTC 2008


No one answered me, so I'll just report my findings:

Both server1 and serverB were full of data, running Gluster 1.4pre5 on
Fedora Core 9. The disk on serverB crashed and I lost everything on it,
so I re-installed, copied over my AFR config from server1, and changed
the IP addresses as appropriate.

The underlying OS filesystems are in /gluster/home.
The mounted gluster filesystem is /home.

On server1, I changed the AFR config to list only itself (because I
wasn't sure whether this would work and I didn't want AFR to go delete
everything instead of copying it over).

Then I mounted /home on serverB while tailing the gluster log.
The interesting thing: it created the directories in /home, and many
of them produced I/O errors.
The log showed many entries such as this:
2008-10-10 04:45:49 C [posix.c:2756:ensure_file_type] home1: entry 
/gluster/home/freeform/access-logs is a different type of file than expected

(one for each directory under /home)
It turns out that access-logs entry is a symlink to somewhere else.
There were also log entries for any other symlinks in the first-level
directories. Any of these links that pointed to a directory which
didn't exist produced an error, and the directory containing the link
wasn't available.
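
For anyone wanting to spot these links before mounting, here is a
minimal sketch that lists symlinks with missing destinations in the
first-level directories of the backend. The function name
list_dangling_links is made up for illustration, and -mindepth and
-maxdepth assume GNU (or BSD) find:

```shell
#!/bin/sh
# List symlinks whose destination does not exist, looking only at the
# backend directory and its first-level subdirectories.
# list_dangling_links is a hypothetical helper name, not a gluster tool.
list_dangling_links() {
    backend="${1:-/gluster/home}"   # backend path taken from the log entry above
    # -type l selects symlinks; "test -e" follows the link and fails
    # when the destination is missing, so "!" keeps only the bad ones.
    find "$backend" -mindepth 1 -maxdepth 2 -type l ! -exec test -e {} \; -print
}

# Example (not run here): list_dangling_links /gluster/home
```

This only reports the links; it changes nothing on disk.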

Strange, but here's how I fixed it:
umount /home
rm -f /gluster/home/*/access-logs
mkdir (link destination for access-logs)
mount /home
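
The steps above can be sketched as a small helper that handles one bad
entry at a time. The name remove_entry_make_target and both of its
arguments are hypothetical; the link destination has to be supplied by
hand, since it differs per directory. It is meant to be run between the
unmount and the mount:

```shell
#!/bin/sh
# Sketch of the repair for a single bad entry on the rebuilt server.
# Run it only while the gluster filesystem is unmounted.
# remove_entry_make_target is a hypothetical helper name.
remove_entry_make_target() {
    entry="$1"    # backend path of the bad entry, e.g. /gluster/home/freeform/access-logs
    target="$2"   # directory the symlink is supposed to point to
    rm -f -- "$entry"        # remove the bad entry from the backend
    mkdir -p -- "$target"    # create the missing link destination
}

# Example (not run here; the destination path is a placeholder):
#   umount /home
#   remove_entry_make_target /gluster/home/freeform/access-logs /path/to/link/destination
#   mount /home
```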

A couple of directories were still inaccessible; it turned out they
had bad symlinks as well. I did the same thing for those, and now it's
plugging along. I'm using
find /home/XXXX -type f -print0 | xargs -0 head -c1 > /dev/null
to auto-heal them, and it seems to be going just fine.
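
The same command can be wrapped in a function so it is easy to run one
directory at a time. This is only a sketch; heal_tree is a made-up name,
and the loop in the comment assumes every first-level directory under
/home should be walked:

```shell
#!/bin/sh
# Read the first byte of every regular file under a directory.
# On the AFR mount, opening each file is what gets it auto-healed.
# heal_tree is a hypothetical helper name.
heal_tree() {
    dir="$1"
    # -print0 / -0 keep filenames with spaces or newlines intact;
    # head -c1 reads one byte per file and the output is discarded.
    find "$dir" -type f -print0 | xargs -0 head -c1 > /dev/null
}

# Example (not run here): walk each first-level directory in turn.
#   for d in /home/*/; do heal_tree "$d"; done
```

Going one directory at a time makes it easy to see how far the walk has
got and to resume after an interruption.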

Once that's finished, I'll re-add serverB to server1's AFR config, and
I presume it'll be fine.

I just found it odd that these missing symlink destinations would 
cause such a problem.

Anyway, it was a minor irritation, and overall the auto-healing, once
it got going, has been a lifesaver.

Keith

At 06:21 AM 10/9/2008, Keith Freedman wrote:
>I have 2 servers that AFR each other.
>
>one of them suffered a drive failure and is being rebuilt.
>
>The question is: what will happen if I just mount the empty drive
>back as the AFR node?
>
>will it just start grabbing the data from the other server (which is
>exactly what I want), OR
>will it start deleting the data from the other server (which is terribly bad).
>
>Another thought was to, on the current working server with good data,
>disable the remote AFR node from its config (so it's only using AFR
>on itself), and then leave the other machine's config as is, and turn it on.
>This way I can be sure that the node with the data won't go nuts and
>start deleting, but that updates to it will get replicated to the
>other machine.
>
>this particular set is running 1.4pre5 if that changes the answer.
>
>Thanks,
>Keith




