[Gluster-devel] AFR self-heal issues.
Krishna Srinivas
krishna at zresearch.com
Tue Feb 19 03:44:10 UTC 2008
Hi Sam,
A fix is in the works regarding the order of the subvols you mentioned.
Krishna
On Feb 19, 2008 8:21 AM, Sam Douglas <sam.douglas32 at gmail.com> wrote:
> Hi,
>
> == Background ==
>
> We are setting up GlusterFS on a compute cluster. Each node has two
> disk partitions, /media/gluster1 and /media/gluster2, which are used
> for the cluster storage.
>
> We are currently using builds from TLA (671 as of now).
>
> I have a script that generates GlusterFS client configurations,
> creating AFR instances over pairs of nodes in the cluster. Here is a
> snippet from our current configuration:
>
> # Client definitions
> volume client-cn2-1
> type protocol/client
> option transport-type tcp/client
> option remote-host cn2
> option remote-subvolume brick1
> end-volume
>
> volume client-cn2-2
> type protocol/client
> option transport-type tcp/client
> option remote-host cn2
> option remote-subvolume brick2
> end-volume
>
> volume client-cn3-1
> type protocol/client
> option transport-type tcp/client
> option remote-host cn3
> option remote-subvolume brick1
> end-volume
>
> volume client-cn3-2
> type protocol/client
> option transport-type tcp/client
> option remote-host cn3
> option remote-subvolume brick2
> end-volume
>
> ### snip - you get the idea ###
>
> # Generated AFR volumes
> volume afr-cn2-cn3
> type cluster/afr
> subvolumes client-cn2-1 client-cn3-2
> end-volume
>
> volume afr-cn3-cn4
> type cluster/afr
> subvolumes client-cn3-1 client-cn4-2
> end-volume
>
>
> ### and so on ###
>
> volume unify
> type cluster/unify
> option scheduler rr
> option namespace namespace
> subvolumes afr-cn2-cn3 afr-cn3-cn4 afr-cn4-cn5 ...
> end-volume
>
>
> == Self healing program ==
>
> I wrote a quick C program (medic) that uses the nftw function to open
> every file in a directory tree and readlink every symlink. This seems
> effective at forcing AFR to heal (see the sketch below).
>
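> For reference, this is roughly what medic boils down to (a minimal
> sketch of the idea, not the actual program): walk the tree with
> nftw(), open() every regular file and readlink() every symlink, so
> that AFR gets a lookup/open on every entry.
>
> /* medic.c -- sketch: force AFR self-heal by touching every entry */
> #define _XOPEN_SOURCE 500
> #include <ftw.h>
> #include <fcntl.h>
> #include <limits.h>
> #include <stdio.h>
> #include <unistd.h>
>
> static int visit(const char *path, const struct stat *sb,
>                  int type, struct FTW *ftwbuf)
> {
>     (void)sb; (void)ftwbuf;
>
>     if (type == FTW_F) {                /* regular file: open + close */
>         int fd = open(path, O_RDONLY);
>         if (fd == -1)
>             perror(path);
>         else
>             close(fd);
>     } else if (type == FTW_SL) {        /* symlink: readlink it */
>         char buf[PATH_MAX];
>         if (readlink(path, buf, sizeof(buf)) == -1)
>             perror(path);
>     }
>     return 0;                           /* keep walking */
> }
>
> int main(int argc, char *argv[])
> {
>     if (argc != 2) {
>         fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
>         return 1;
>     }
>     /* FTW_PHYS: report symlinks as FTW_SL instead of following them */
>     if (nftw(argv[1], visit, 64, FTW_PHYS) == -1) {
>         perror("nftw");
>         return 1;
>     }
>     return 0;
> }
>
> # gcc -o medic medic.c
> # ./medic /mnt/glusterfs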
>
> == Playing with AFR ==
>
> We have a test cluster of 6 nodes set up.
>
> In this setup, cluster node 2 is involved in 'afr-cn2-cn3' and
> 'afr-cn7-cn2'.
>
> I copy a large directory tree (such as /usr) onto the cluster
> filesystem, then 'cripple' node cn2 by deleting the data from its
> backends and restarting glusterfsd on that system, to emulate the
> system going offline or losing data.
>
> (at this point, all the data is still available on the filesystem)
>
> Running medic over the filesystem mount will now cause the data to be
> copied back onto the appropriate volumes on cn2, and all is happy.
>
> Opening every file on the filesystem seems a stupid waste of time if
> you know which volumes have gone down (and when you have over 20TB in
> hundreds of thousands of files, that is a considerable waste of time),
> so I looked into mounting parts of the client translator tree on
> separate mount points and running medic over just those.
>
> # mkdir /tmp/glfs
> # generate_client_conf > /tmp/glusterfs.vol
> # glusterfs -f /tmp/glusterfs.vol -n afr-cn2-cn3 /tmp/glfs
> # ls /tmp/glfs
> home/
> [Should be: home/ usr/]
>
> A `cd /tmp/glfs/usr/` will succeed and usr/ itself will be self-healed,
> but its contents will not be. Likewise, a `cat /tmp/glfs/usr/include/stdio.h`
> will output the contents of the file and cause it to be self-healed.
>
> Changing the order of the subvolumes of the 'afr-cn2-cn3' volume so
> that the up-to-date client is listed first causes the directory to be
> listed correctly.
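>
> Concretely, that means an AFR definition with client-cn3-2 (the side
> that still has the data in this scenario) listed first:
>
> volume afr-cn2-cn3
> type cluster/afr
> subvolumes client-cn3-2 client-cn2-1
> end-volume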
>
> This seems to me like a minor-ish bug in cluster/afr's readdir
> functionality.
>
> -- Sam Douglas
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>