[Gluster-users] Replica 3 with arbiter - heal error?

Pavel Szalbot pavel.szalbot at gmail.com
Tue Jul 11 13:59:10 UTC 2017


I tested the same procedure on a volume with the following config and
cannot reproduce the issue there. Should I file a bug?

transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
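
To see which options differ between this volume and the one in the quoted message below, the two option lists can be compared mechanically. This is only a sketch: it assumes each list has been saved to a file with one "key: value" per line (for instance copied from the "Options Reconfigured" section of gluster volume info), and the function name diff_volume_options is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gluster): list the options that differ
# between two saved option lists, one "key: value" per line.
# Lines found only in the first file are printed as-is; lines found only
# in the second file are printed with a leading tab (comm's column 2).
diff_volume_options() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    sort "$1" > "$a"
    sort "$2" > "$b"
    comm -3 "$a" "$b"    # -3 suppresses lines common to both files
    rm -f "$a" "$b"
}
```

Called as diff_volume_options failing.txt working.txt (file names are placeholders), it leaves only the options that are set on one volume and not the other, or set to different values, which narrows down what to try toggling first.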

Btw, never mind the 40-second timeout, I found network.ping-timeout ;-)
-ps
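
For reference, the roughly 40-second stall is consistent with the default network.ping-timeout of 42 seconds. A minimal sketch of how the option could be inspected and lowered, assuming a volume named "vol" as in the mount example quoted below. The helper name ping_timeout_cmds is made up for illustration, and it only prints the commands so they can be reviewed before being run on a server node.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gluster): print the CLI commands that
# show and then change network.ping-timeout for a volume.
# $1 = volume name, $2 = new timeout in seconds.
ping_timeout_cmds() {
    vol=$1
    secs=$2
    echo "gluster volume get $vol network.ping-timeout"
    echo "gluster volume set $vol network.ping-timeout $secs"
}

# Example: review the commands for volume "vol" and a 10-second timeout.
ping_timeout_cmds vol 10
```

The value 10 is only an example. Very low timeouts are usually discouraged, since a client that disconnects on a short network hiccup has to re-establish its state with the brick afterwards.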


On Tue, Jul 11, 2017 at 3:03 PM, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:
> Hello,
>
> I have Gluster 3.8.13 with a replica 3 arbiter volume mounted, and I
> run the following script there:
>
> while true; do echo "$(date)" >> a.txt; sleep 2; done
>
> After a few seconds I add a firewall rule on the client that blocks
> access to the node specified at mount time. For example, if the volume
> is mounted with:
>
> mount -t glusterfs -o backupvolfile-server=10.0.0.2 10.0.0.1:/vol /mnt/vol
>
> I add:
>
> iptables -A OUTPUT -d 10.0.0.1 -j REJECT
>
> This causes the script above to block for approximately 40 seconds
> until the gluster client tries backupvolfile-server (can this timeout
> be changed?), after which everything continues as expected.
>
> Heal info shows that this file (a.txt) is undergoing healing. About a
> minute later, the last line of a.txt is the same $(date) value that was
> there just before the firewall modification. Each subsequent write,
> e.g. echo "STRING" >> a.txt, does not actually append "STRING" but
> the same number of bytes of previously written data.
>
> If the file content just before the firewall rule is added is:
> Tue Jul 11 14:19:37 CEST 2017
> Tue Jul 11 14:19:39 CEST 2017
>
> It will later become (which is OK):
> Tue Jul 11 14:19:37 CEST 2017
> Tue Jul 11 14:19:39 CEST 2017
> Tue Jul 11 14:20:18 CEST 2017
> Tue Jul 11 14:20:20 CEST 2017
> Tue Jul 11 14:20:22 CEST 2017
>
> But after some time, the file content is only:
> Tue Jul 11 14:19:37 CEST 2017
> Tue Jul 11 14:19:39 CEST 2017
>
> And echo "STRING" >> a.txt makes it (6 bytes appended, not STRING):
> Tue Jul 11 14:19:37 CEST 2017
> Tue Jul 11 14:19:39 CEST 2017
> Tue Ju
>
> Another echo "STRING" >> a.txt causes the content to be:
> Tue Jul 11 14:19:37 CEST 2017
> Tue Jul 11 14:19:39 CEST 2017
> Tue Jul 11 1
>
> Removing the firewall rule does not change the content, and a different
> client with access to all nodes sees exactly the same content as this
> one.
>
> Is this normal behavior, a bug, or is there some configuration that I
> should have changed in order to make replica 3 with arbiter highly
> available?
>
> I stumbled upon this while testing how to upgrade Gluster so that the
> clients, and the VMs running on them, are not affected by the "transport
> endpoint error" caused by the primary mount server undergoing an upgrade
> and glusterd therefore being unavailable for several seconds.
>
> Volume config:
> server.allow-insecure: on
> server.outstanding-rpc-limit: 1024
> performance.read-ahead: off
> performance.io-thread-count: 64
> performance.client-io-threads: on
> performance.cache-size: 1GB
> cluster.self-heal-daemon: enable
> nfs.disable: on
> performance.readdir-ahead: on
> features.shard: on
> performance.quick-read: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> user.cifs: off
>
> -ps
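
When repeating the test quoted above, the partial last line ("Tue Ju", "Tue Jul 11 1") can be spotted automatically. The sketch below assumes that every line written by the loop has the same length, which holds as long as the locale and timezone do not change during the run. The function name find_truncated_lines is made up for illustration. It detects only partial lines, not whole lines that have disappeared.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers of all lines in a file whose
# length differs from that of the first line. For the date log produced by
# the loop in the quoted message, such lines are truncated entries.
find_truncated_lines() {
    awk 'NR == 1 { want = length($0) }
         length($0) != want { print NR }' "$1"
}
```

Running find_truncated_lines a.txt periodically from a second client during the test would show roughly when the content first goes wrong. Comparing that time with the output of gluster volume heal <volname> info may help tell whether the self-heal is involved.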

