[Gluster-devel] two-node HA cluster failover test - failed again :(
Daniel Maher
dma+gluster at witbe.net
Wed Apr 9 11:10:20 UTC 2008
Hello all,
After upgrading to 1.3.8pre5, I performed a simple failover test of my
two-node HA Gluster cluster (in which one of the nodes is unplugged from
the network). Unfortunately, the results were, once again, absolutely
disastrous.
After one of the two nodes was unplugged, the cluster became extremely
unstable, and the mountpoint on the client alternated between not
existing at all and behaving bizarrely. This condition persisted even
after the node was plugged back into the network. Restarting glusterfsd
on both storage nodes did not help at all.
At this point I would be very interested to know whether anybody has set
up a functioning two-node HA cluster using AFR that can withstand one of
the nodes temporarily disappearing. Is this something Gluster is
designed to do, or am I expecting too much?
For those following along, a discussion of the first failover test is
available in the gluster-devel archives:
http://lists.gnu.org/archive/html/gluster-devel/2008-04/msg00010.html
The environment is identical to the one described in the email linked
above, so I won't describe it again here. This time, however, I had
full DEBUG logging enabled. I have made these logs (all 3000+
lines) available on pastebin:
dfsC (node that stayed up): http://pastebin.ca/978162
dfsD (node that was unplugged): http://pastebin.ca/978166
I've also provided a cut-and-paste of my user session from the client's
perspective (dfsA). Note that the only thing I did was try to "ls" the
mountpoint. I also ran "date" a handful of times to provide a point of
reference:
http://pastebin.ca/978149
--
Daniel Maher <dma AT witbe.net>