[Gluster-devel] AFR self-heal issues.
Sam Douglas
sam.douglas32 at gmail.com
Tue Feb 19 02:51:45 UTC 2008
Hi,
== Background ==
We are setting up GlusterFS on a compute cluster. Each node has two
disk partitions, /media/gluster1 and /media/gluster2, which are used
for the cluster storage.
We are currently using builds from TLA (671 as of now).
I have a script to generate GlusterFS client configurations that
create AFR instances over pairs of nodes in the cluster. Here is a
snippet from our current configuration:
# Client definitions
volume client-cn2-1
  type protocol/client
  option transport-type tcp/client
  option remote-host cn2
  option remote-subvolume brick1
end-volume

volume client-cn2-2
  type protocol/client
  option transport-type tcp/client
  option remote-host cn2
  option remote-subvolume brick2
end-volume

volume client-cn3-1
  type protocol/client
  option transport-type tcp/client
  option remote-host cn3
  option remote-subvolume brick1
end-volume

volume client-cn3-2
  type protocol/client
  option transport-type tcp/client
  option remote-host cn3
  option remote-subvolume brick2
end-volume

### snip - you get the idea ###

# Generated AFR volumes
volume afr-cn2-cn3
  type cluster/afr
  subvolumes client-cn2-1 client-cn3-2
end-volume

volume afr-cn3-cn4
  type cluster/afr
  subvolumes client-cn3-1 client-cn4-2
end-volume

### and so on ###

volume unify
  type cluster/unify
  option scheduler rr
  option namespace namespace
  subvolumes afr-cn2-cn3 afr-cn3-cn4 afr-cn4-cn5 ...
end-volume
== Self healing program ==
I wrote a quick C program (medic) that uses the nftw() function to
open every file in a directory tree and readlink() every symlink.
This seems effective at forcing AFR to heal.
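For reference, a minimal sketch of what medic does (not the actual
source; the buffer size, descriptor limit and the small read are
assumptions) looks like this:

/*
 * medic.c -- walk a tree with nftw(), open() and read a little of every
 * regular file, and readlink() every symlink, which is enough to make
 * AFR self-heal each entry it touches.
 */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int visit(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    char buf[4096];

    if (typeflag == FTW_F) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror(path);
            return 0;
        }
        /* A small read is enough to make AFR look at the file. */
        if (read(fd, buf, sizeof(buf)) < 0)
            perror(path);
        close(fd);
    } else if (typeflag == FTW_SL) {
        /* Symlinks are not followed (FTW_PHYS), so readlink() them. */
        if (readlink(path, buf, sizeof(buf) - 1) < 0)
            perror(path);
    }
    return 0;   /* keep walking */
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return 1;
    }
    return nftw(argv[1], visit, 64, FTW_PHYS) ? 1 : 0;
}

Running it over the cluster mount point opens every file and readlinks
every symlink under it.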
== Playing with AFR ==
We have a test cluster of 6 nodes set up.
In this setup, cluster node 2 is involved in 'afr-cn2-cn3' and
'afr-cn7-cn2'.
I copy a large directory tree onto the cluster filesystem (such as
/usr), then 'cripple' node cn2 by deleting the data from its backends
and restarting glusterfsd on that system, to emulate the node going
offline or losing data.
(at this point, all the data is still available on the filesystem)
Running medic over the filesystem mount will now cause the data to be
copied back onto cn2's appropriate volumes and all is happy.
Opening all files on the filesystem seems a stupid waste of time if
you know which volumes have gone down (and when you have over 20TB in
hundreds of thousands of files, that is a considerable waste of time),
so I looked into mounting parts of the client translator tree as
separate mount points and running medic over those.
# mkdir /tmp/glfs
# generate_client_conf > /tmp/glusterfs.vol
# glusterfs -f /tmp/glusterfs.vol -n afr-cn2-cn3 /tmp/glfs
# ls /tmp/glfs
home/
[Should be: home/ usr/]
A `cd /tmp/glfs/usr/` will succeed and usr/ itself will be self-healed,
but its contents will not be. Likewise, a `cat /tmp/glfs/usr/include/stdio.h`
will output the contents of the file and cause it to be self-healed.
Changing the order of the subvolumes in the 'afr-cn2-cn3' volume so
that the up-to-date client is listed first causes the directory to be
listed correctly (see the reordered definition below).
This seems to me like a minor-ish bug in cluster/afr's readdir
functionality.
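For reference, the reordered definition is simply the generated one
with the subvolumes swapped, so the up-to-date client (client-cn3-2
here) comes first:

volume afr-cn2-cn3
  type cluster/afr
  subvolumes client-cn3-2 client-cn2-1
end-volume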
-- Sam Douglas