[Gluster-devel] two-node HA cluster failover test - failed again :(
Vikas Gorur
vikas at zresearch.com
Wed Apr 9 11:49:49 UTC 2008
Excerpts from Daniel Maher's message of Wed Apr 09 16:40:20 +0530 2008:
>
> Hello all,
>
> After upgrading to 1.3.8pre5, I performed a simple failover test of my
> two-node HA Gluster cluster (wherein one of the nodes is unplugged from
> the network). Unfortunately, the results were - once again - absolutely
> disastrous.
>
> After unplugging one of the two nodes, the cluster became incredibly
> unstable, and the mountpoint on the client bounced between
> non-existent and simply bizarre states. This condition persisted even
> after plugging the node back into the network. Restarting glusterfsd on
> both storage nodes did not help at all.
>
> At this point I would be very interested to know if anybody has set up
> a functioning two-node HA cluster using AFR which can withstand one of
> the nodes temporarily disappearing. Is this something Gluster is
> designed to do, or am I expecting too much?
This is definitely something GlusterFS is designed to handle. I've set up
this configuration in our lab and am looking into it.
> For those following along, a discussion of the first failover test is
> available from the gluster-devel archives :
> http://lists.gnu.org/archive/html/gluster-devel/2008-04/msg00010.html
>
> The environment is identical to that described in the email linked
> above, so I won't describe it again here. This time, however, I had
> full DEBUG logging enabled. I have made these logs (all 3000+
> lines) available on pastebin :
> dfsC (node that stayed up) : http://pastebin.ca/978162
> dfsD (node that was unplugged) : http://pastebin.ca/978166
Is the order of subvolumes for AFR the same in both of your server
specfiles? Specifically, on dfsC you should have

  subvolumes gfs-dfsD-ds gfs-ds

and on dfsD you should have

  subvolumes gfs-ds gfs-dfsC-ds

That way, the first subvolume listed is dfsD's brick and the second is
dfsC's brick on both servers. Is this the case? If not, failover will
not work.
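For illustration, a minimal sketch of what dfsC's server specfile might
look like with that ordering (volume names, paths, and hostnames here
are assumptions based on your earlier mail, not your actual config):

```
# --- dfsC server specfile (sketch, names assumed) ---

volume gfs-ds
  type storage/posix            # local brick on dfsC
  option directory /data/gfs    # assumed export path
end-volume

volume gfs-dfsD-ds
  type protocol/client          # dfsD's brick, reached over the network
  option transport-type tcp/client
  option remote-host dfsD       # assumed hostname
  option remote-subvolume gfs-ds
end-volume

volume gfs-afr
  type cluster/afr
  # dfsD's brick first, dfsC's second -- the same logical order
  # that dfsD's own specfile must use (gfs-ds gfs-dfsC-ds there)
  subvolumes gfs-dfsD-ds gfs-ds
end-volume
```

The point is that AFR identifies its children by position, so both
servers must list the same underlying bricks in the same order for
self-heal and failover to behave correctly.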
Vikas
--
http://vikas.80x25.org/