[Gluster-devel] HA translator total failure in 2.0.0rc1
Daniel Maher
dma+gluster at witbe.net
Thu Jan 15 10:18:32 UTC 2009
Hello all,
As noted in this email to the gluster-users list :
http://zresearch.com/pipermail/gluster-users/20090114/001389.html
I've got a simple and reproducible scenario to crash a Gluster client
using the HA translator to access two AFR'd servers. The scenario is
identical to that described by Krishna Srinivas on the gluster-devel
list on 08-01-2009 :
http://lists.gnu.org/archive/html/gluster-devel/2009-01/msg00059.html
           Client
             |
             HA
            /  \
           /    \
        AFR1    AFR2
          |       |
      Server1  Server2
Basically, if I stop glusterfsd on Server1, HA on Client switches to
AFR2 as expected ; however, when I re-enable glusterfsd on Server1, then
stop glusterfsd on Server2, one of two things occurs :
1. Client stops communicating entirely with the cluster (transport
endpoint not connected), or
2. Client recovers and continues communicating with AFR1.
It appears to be random as to which one actually occurs.
If the client recovers and continues to communicate, and I re-enable
glusterfsd on Server2, Client stops communicating immediately with the
cluster - every time, guaranteed.
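
To make the expected behaviour explicit, here is a minimal sketch in Python
of what I would expect HA to do at each step of the scenario above. It is a
toy model only, not GlusterFS code : the class name ExpectedHA, the replay()
helper and the "first reachable subvolume wins" rule are all assumptions made
for illustration, and say nothing about how the HA translator is actually
implemented.

```python
# Toy model of the *expected* failover behaviour described above.
# Nothing here is GlusterFS code; names and the selection rule are hypothetical.

class ExpectedHA:
    def __init__(self, subvolumes):
        self.order = list(subvolumes)              # preference order
        self.up = {name: True for name in self.order}
        self.active = self.order[0]                # start on the first subvolume

    def set_state(self, name, is_up):
        """Record a subvolume going up or down, then return the active one."""
        self.up[name] = is_up
        # Only switch when the active subvolume is gone (or there is none).
        if self.active is None or not self.up[self.active]:
            self.active = next((n for n in self.order if self.up[n]), None)
        return self.active


def replay(events, subvolumes=("AFR1", "AFR2")):
    """Apply (name, is_up) events in order; return the active subvolume after each."""
    ha = ExpectedHA(subvolumes)
    return [ha.set_state(name, is_up) for name, is_up in events]


# The four steps of the scenario, in order.
SCENARIO = [
    ("AFR1", False),  # stop glusterfsd on Server1
    ("AFR1", True),   # re-enable glusterfsd on Server1
    ("AFR2", False),  # stop glusterfsd on Server2
    ("AFR2", True),   # re-enable glusterfsd on Server2
]

if __name__ == "__main__":
    for (name, is_up), active in zip(SCENARIO, replay(SCENARIO)):
        state = "up" if is_up else "down"
        print(f"{name} {state:<4} -> client should be using {active}")
```

Under this model the client always has a usable subvolume, since at no point
are both servers down at once. The observed behaviour diverges at the third
step (sometimes) and at the fourth step (always).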
There are therefore two key questions :
1. In the first case, why doesn't the client switch gracefully
between available subvolumes ?
2. In the second case, why does re-enabling a
previously-unavailable subvolume crash the client ?
All relevant details are in the mail to the gluster-users list, linked
above.
Any ideas what's going on here ?
--
Daniel Maher <dma+gluster AT witbe DOT net>