[Gluster-users] halo not work as desired!!!

atris adam atris.adam at gmail.com
Mon Feb 5 12:34:24 UTC 2018


I have mounted the halo glusterfs volume in debug mode, and the output is
as follows:
.
.
.
[2018-02-05 11:42:48.282473] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk]
0-test-halo-client-1: Ping latency is 0ms
[2018-02-05 11:42:48.282502] D [MSGID: 0]
[afr-common.c:5025:afr_get_halo_latency] 0-test-halo-replicate-0: Using
halo latency 10
[2018-02-05 11:42:48.282525] D [MSGID: 0]
[afr-common.c:4820:__afr_handle_ping_event] 0-test-halo-client-1: Client
ping @ 140032933708544 ms
.
.
.
[2018-02-05 11:42:48.393776] D [MSGID: 0]
[afr-common.c:4803:find_worst_up_child] 0-test-halo-replicate-0: Found
worst up child (1) @ 140032933708544 ms latency
[2018-02-05 11:42:48.393803] D [MSGID: 0]
[afr-common.c:4903:__afr_handle_child_up_event] 0-test-halo-replicate-0:
Marking child 1 down, doesn't meet halo threshold (10), and >
halo_min_replicas (2)
.
.
.
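
To make the numbers in this excerpt easier to compare, here is a minimal Python sketch that pulls the three values out of the log lines above: the ping latency reported by rpc-clnt-ping.c, the halo latency in use, and the "Client ping" value reported by afr-common.c. The log text is copied from this mail with the mail client's line wrapping joined back together; the regular expressions and the helper name `first_int` are only illustrative assumptions, not part of any Gluster tooling.

```python
import re

# Log lines copied from the debug output above, with the mail client's
# line wrapping joined back together.
LOG = """\
[2018-02-05 11:42:48.282473] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-test-halo-client-1: Ping latency is 0ms
[2018-02-05 11:42:48.282502] D [MSGID: 0] [afr-common.c:5025:afr_get_halo_latency] 0-test-halo-replicate-0: Using halo latency 10
[2018-02-05 11:42:48.282525] D [MSGID: 0] [afr-common.c:4820:__afr_handle_ping_event] 0-test-halo-client-1: Client ping @ 140032933708544 ms
"""

# Illustrative patterns, written only to match the three messages above.
RPC_PING = re.compile(r"Ping latency is (\d+)ms")
HALO_THRESHOLD = re.compile(r"Using halo latency (\d+)")
AFR_PING = re.compile(r"Client ping @ (\d+) ms")


def first_int(pattern, text):
    """Return the first integer captured by pattern in text, or None."""
    match = pattern.search(text)
    return int(match.group(1)) if match else None


rpc_ms = first_int(RPC_PING, LOG)
threshold_ms = first_int(HALO_THRESHOLD, LOG)
afr_ms = first_int(AFR_PING, LOG)

print("ping latency from rpc-clnt-ping.c (ms):", rpc_ms)
print("halo latency in use (ms):", threshold_ms)
print("client ping from afr-common.c (ms):", afr_ms)
print("client ping above halo latency:", afr_ms > threshold_ms)
```

For the same client (test-halo-client-1) the two layers log very different values, and only the second one is far above the halo latency of 10.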

I think this debug output means the following:
Although the ping time for test-halo-client-1 (brick2) is about 0.5 ms, halo
treats it as not being under the halo threshold (10 ms), and this is what
leads to the wrong brick selection.
I cannot set the halo threshold to 0 because:

#gluster vol set test-halo cluster.halo-max-latency 0
volume set: failed: '0' in 'option halo-max-latency 0' is out of range [1 -
99999]

So I think the range [1 - 99999] should be changed to [0 - 99999], so that I
can get the desired brick selection from the halo feature. Am I right? If not,
why does halo decide to mark down the best brick, which has a ping time below
0.5 ms?
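
The brick selection I expect can be written down as a small Python sketch. This is only a simplified model inferred from the wording of the debug messages above ("Found worst up child", "doesn't meet halo threshold", "> halo_min_replicas"); it is not the actual code in afr-common.c, and the function name `halo_up_children` is hypothetical. In the second call, only the latency of child 1 comes from the log; the values for the other children are assumed from the manual ping times quoted below.

```python
def halo_up_children(latencies_ms, max_latency_ms, min_replicas):
    """Simplified model of the halo mark-down rule (an assumption, not AFR code).

    While more than min_replicas children are up, take the up child with the
    worst latency and mark it down if it exceeds max_latency_ms.
    """
    up = set(range(len(latencies_ms)))
    while len(up) > min_replicas:
        # On a tie, the lowest child index is treated as the worst one.
        worst = max(sorted(up), key=lambda child: latencies_ms[child])
        if latencies_ms[worst] <= max_latency_ms:
            break  # every remaining up child meets the threshold
        up.remove(worst)
    return up


# Children 0..3 stand for brick1..brick4, as seen from a client in region A.
measured = [0.5, 0.5, 20, 20]              # manual ping times (ms)
logged = [0.5, 140032933708544, 20, 20]    # child 1 as reported in the log

expected_up = halo_up_children(measured, max_latency_ms=10, min_replicas=2)
observed_up = halo_up_children(logged, max_latency_ms=10, min_replicas=2)

print("up children with measured pings:", sorted(expected_up))
print("up children with logged value:  ", sorted(observed_up))
```

Under this model the measured ping times keep brick1 and brick2 up, while the latency logged for child 1 gets brick2 marked down first, which matches the "Marking child 1 down" message.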

On Sun, Feb 4, 2018 at 2:27 PM, atris adam <atris.adam at gmail.com> wrote:

> I have 2 data centers in two different regions, and each DC has 3 servers. I
> have created a glusterfs volume with 4 replicas; this is the glusterfs volume
> info output:
>
>
> Volume Name: test-halo
> Type: Replicate
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.0.1:/mnt/test1
> Brick2: 10.0.0.3:/mnt/test2
> Brick3: 10.0.0.5:/mnt/test3
> Brick4: 10.0.0.6:/mnt/test4
> Options Reconfigured:
> cluster.halo-shd-max-latency: 5
> cluster.halo-max-latency: 10
> cluster.quorum-count: 2
> cluster.quorum-type: fixed
> cluster.halo-enabled: yes
> transport.address-family: inet
> nfs.disable: on
>
> The bricks with IPs 10.0.0.1 & 10.0.0.3 are in region A and the bricks with
> IPs 10.0.0.5 & 10.0.0.6 are in region B.
>
>
> When I mount the volume in region A, I expect the data to be stored first on
> brick1 & brick2, and then copied asynchronously to region B, on brick3 &
> brick4.
>
> Am I right? Is this what halo claims?
>
> If yes, unfortunately this does not happen for me: no matter whether I mount
> the volume in region A or in region B, all the data is copied to brick3 &
> brick4 and no data is copied to brick1 & brick2.
>
> Pinging the brick IPs from region A gives the following:
> ping 10.0.0.1 & 10.0.0.3 are below time=0.500 ms
> ping 10.0.0.5 & 10.0.0.6 are more than time=20 ms
>
> What logic does halo use to select the bricks to write to? If it is the
> access time, then when I mount the volume in region A the ping time to
> brick1 & brick2 is below 0.5 ms, but halo selects brick3 & brick4!
>
> glusterfs version is:
> glusterfs 3.12.4
>
> I really need to work with the halo feature, but I have not managed to get
> this case working. Can anyone help me soon?
>
>
> Thanks a lot
>