[Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume
Joe Julian
joe at julianfamily.org
Thu Mar 6 22:18:13 UTC 2014
On 02/22/2014 05:44 PM, Greg Scott wrote:
>
> I have 2 nodes named fw1 and fw2. When I ifdown the NIC I'm using for
> Gluster on either node, that node cannot see its Gluster volume, but
> the other node can see it after a timeout. As soon as I ifup that
> NIC, everyone can see everything again.
>
> Is this expected behavior? When that interconnect drops, I want both
> nodes to see their own local copy and then sync everything back up
> when the interconnect connects again.
>
If a client loses communication on an open TCP connection to a server,
there is a timeout period (42 seconds by default) during which the
client waits for communication to resume. It waits because dropping and
re-establishing hundreds to potentially tens of thousands of file
descriptors and locks is a very expensive process that disrupts the
entire environment.
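
To make the waiting behaviour concrete, here is a rough sketch in Python.
It is a toy model, not GlusterFS code: the class and method names are
made up for illustration, and the only value taken from the text above
is the 42 second default (which I believe is the volume option
network.ping-timeout).

```python
import time

PING_TIMEOUT = 42.0  # seconds; the default mentioned above


class ServerConnection:
    """Toy model of one client-to-server connection (hypothetical names)."""

    def __init__(self, ping_timeout=PING_TIMEOUT, clock=time.monotonic):
        self.ping_timeout = ping_timeout
        self.clock = clock
        self.last_reply = clock()
        self.connected = True

    def on_reply(self):
        # Any reply from the server resets the timer.
        self.last_reply = self.clock()

    def check(self):
        # Hold on to file descriptors and locks until the timeout expires,
        # then give up on the server and mark the connection as down.
        silent_for = self.clock() - self.last_reply
        if self.connected and silent_for >= self.ping_timeout:
            self.connected = False
        return self.connected
```

The point of the design is that a short network blip costs nothing,
while a real outage costs one timeout before the client moves on.
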
With the test process you're describing, the clients connect to both
servers at IP addresses (hopefully found through hostname resolution) on
the same network. When you down a NIC, its address is no longer
available. Not only can the remote client no longer connect to it, but
your local client cannot either, because the address no longer exists.

In your real-life concern, a failed interconnect would not interfere
with the existence of either machine's IP address, so after the
ping-timeout, operations would resume in a split-brain configuration. As
long as no changes were made to the same file on both servers, the
self-heal will do exactly what you expect when the connection is
reestablished.

However, what you're counting on is the most common cause of
split-brain: each client, connected to one server, independently
modifies the same file. When the connection is reestablished, the
self-heal runs and that file is marked as split-brain, inaccessible from
the client mount until it's resolved by admin intervention.
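
As a simplified illustration of why the heal cannot pick a winner, here
is a sketch in Python. It assumes each copy keeps a count of writes it
accepted while the other copy was unreachable. The function name and the
counters are hypothetical, and this is not how the replication
translator is actually implemented.

```python
def heal_decision(a_blames_b, b_blames_a):
    """Decide the heal direction for one file on a two-way replica.

    a_blames_b: writes copy A accepted while copy B was unreachable.
    b_blames_a: the same count from copy B's point of view.
    """
    if a_blames_b and b_blames_a:
        # Both copies changed independently; neither is a safe source.
        return "split-brain"
    if a_blames_b:
        return "heal A -> B"
    if b_blames_a:
        return "heal B -> A"
    return "in sync"
```

Only the first case needs an admin. The one-sided cases are the ones
where the self-heal does what you expect.
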
You can avoid split-brain using a couple of quorum techniques. The one
that would seem to satisfy your requirements leaves your volume
read-only for the duration of the outage.
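
The idea behind quorum can be sketched as a strict-majority check. This
is a generic illustration in Python with a hypothetical function name.
Gluster's own quorum modes may differ in detail, so check the
documentation for your version (the relevant volume options, as far as I
know, are cluster.quorum-type and cluster.quorum-count).

```python
def writes_allowed(reachable_replicas, total_replicas):
    """Allow writes only when more than half of the replicas are reachable."""
    return reachable_replicas * 2 > total_replicas
```

With two replicas, a client that can reach only one of them does not
have a majority, so neither side accepts writes during the outage and
the two copies cannot diverge.
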