[Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume
Ted Miller
tmiller at hcjb.org
Tue Mar 11 18:01:05 UTC 2014
On 3/6/2014 7:48 PM, Greg Scott wrote:
>> In your real-life concern, the interconnect would not interfere with the existence of either
>> machine's IP address, so after the ping-timeout, operations would resume in a split-brain
>> configuration. As long as no changes were made to the same file on both volumes, when the
>> connection is reestablished, the self-heal will do exactly what you expect.
> Except that's not what happens. If I ifdown that interconnect NIC, I should see the file system after 42 seconds, right? But I don't. Email butchers the output below, but what it shows is, I can look at my /firewall-scripts directory just fine when things are steady state. I ifdown the interconnect NIC, that directory goes away. I wait more than 2 minutes and it still doesn't come back. And then when I ifup the NIC, everything goes back to normal after a few seconds.
>
> [root at stylmark-fw1 ~]# ls /firewall-scripts
> allow-all           etc                  initial_rc.firewall  rcfirewall.conf            var
> allow-all-with-nat  failover-monitor.sh  rc.firewall          start-failover-monitor.sh
> [root at stylmark-fw1 ~]# date
> Thu Mar 6 18:39:42 CST 2014
> [root at stylmark-fw1 ~]# ifdown enp5s4
> [root at stylmark-fw1 ~]# ls /firewall-scripts
> ls: cannot access /firewall-scripts: Transport endpoint is not connected
> [root at stylmark-fw1 ~]# date
> Thu Mar 6 18:41:50 CST 2014
> [root at stylmark-fw1 ~]# ls /firewall-scripts
> ls: cannot access /firewall-scripts: No such file or directory
> [root at stylmark-fw1 ~]# ifup enp5s4
> [root at stylmark-fw1 ~]# ls /firewall-scripts
> ls: cannot access /firewall-scripts: No such file or directory
> [root at stylmark-fw1 ~]# df -h
> Filesystem                       Size  Used Avail Use% Mounted on
> /dev/mapper/fedora-root           17G  2.3G   14G  14% /
> devtmpfs                         989M     0  989M   0% /dev
> tmpfs                            996M     0  996M   0% /dev/shm
> tmpfs                            996M  524K  996M   1% /run
> tmpfs                            996M     0  996M   0% /sys/fs/cgroup
> tmpfs                            996M     0  996M   0% /tmp
> /dev/sda2                        477M   87M  362M  20% /boot
> /dev/sda1                        200M  9.6M  191M   5% /boot/efi
> /dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
> 192.168.253.1:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
> [root at stylmark-fw1 ~]# ls /firewall-scripts
> allow-all           etc                  initial_rc.firewall  rcfirewall.conf            var
> allow-all-with-nat  failover-monitor.sh  rc.firewall          start-failover-monitor.sh
> [root at stylmark-fw1 ~]#
>
>> You can avoid the split-brain using a couple of quorum techniques. The one that would seem to satisfy your
>> requirements leaves your volume read-only for the duration of the outage.
> I like this idea - how do I do it?
I don't see a follow-up here, so I will put in (only) my two cents' worth.
If I understand correctly, you get the read-only condition by using
client-side quorum. The behavior you describe above (the mount going away
entirely while the interconnect is down) sounds like what server-side
quorum produces -- the volume goes offline until a quorum is present again.
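
For what it's worth, here is a minimal sketch of how the two kinds of quorum
are usually switched on from the gluster CLI. The volume name
firewall-scripts is taken from the df output above. The run wrapper and the
two function names are hypothetical helpers of mine, not part of gluster, and
the option names are as I remember them from the 3.4 documentation, so please
check them against "gluster volume set help" on your version. By default the
script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: prints the gluster commands unless DRY_RUN=0 is set.
VOL=firewall-scripts   # assumption: volume name as shown in the df output

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Client-side quorum: clients refuse writes when too few bricks of the
# replica set are reachable, so the volume is effectively read-only.
client_quorum() {
    run gluster volume set "$VOL" cluster.quorum-type auto
}

# Server-side quorum: glusterd takes the bricks down when too few peers
# are reachable, so the volume goes offline instead of read-only.
server_quorum() {
    run gluster volume set "$VOL" cluster.server-quorum-type server
    run gluster volume set all cluster.server-quorum-ratio 51%
}

# Greg asked for the read-only behavior, so only that one is called here.
client_quorum
```

As far as I know, with cluster.quorum-type set to auto on a two-brick
replica, quorum holds only while the first brick is reachable, so one of the
two nodes will always be the one that drops to read-only. If the volume
already has server-side quorum set, "gluster volume info firewall-scripts"
should list it under the reconfigured options.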
I have suffered through a couple of split-brain situations, and I agree that
you do not want to run a two-node setup without quorum.
You may have gotten an answer that I did not see, but even so, I'll leave
this here for the next guy who has a question.
Ted Miller
Elkhart, IN
>
> - Greg