[Gluster-users] Read from fastest node only

Ravishankar N ranaraya at redhat.com
Fri Jul 30 05:48:44 UTC 2021


On Thu, Jul 29, 2021 at 3:16 PM David Cunningham <dcunningham at voisonics.com>
wrote:

> Hello,
>
> Thanks for all the replies. I'll try to address each point:
>
> 1. "First readable child... Isn't this the first brick in the subvolume"
> Does that mean the first brick in the list returned by "gluster volume
> status"?
>
> 2. "I think that you can play a little bit with md-cache"
> Unfortunately I don't have access to the RH article. We would be happy to
> skip the lookups when reading a file, as long as some data is returned and
> the file read doesn't block or fail. Do you think that md-cache can
> accomplish this? If so we might invest more in researching it.
>
> 3. "In real life, the 'best' node is the one with the highest overall free
> resources, across CPU, network and disk IO."
> My question relates to a slightly different real life, where some of the
> nodes are within a very short latency (eg 0.15ms) and some others are
> further away (eg 15ms). What we want to avoid is the 15ms delay to check
> the nodes further away. The CPU, disk IO, etc load on each server is going
> to be insignificant compared to the difference in the latency between nodes.
>
> 4. "Our latency check is indeed not per file, AFAIK."
> If GlusterFS checks the health of the file on each read then I guess the
> latency to all nodes will be a factor on each read. This is what we're
> looking for a way to avoid.
>
> 5. "I think you mean cluster.choose-local which is enabled by default.
> Yet, Gluster will check if the local copy is healthy."
> What is the local copy exactly? I'm talking from the point of view of a
> machine which is running the GlusterFS FUSE client, and is not a GlusterFS
> node.
>

Local copy means the FUSE client is mounted on the same node as the brick,
and the file in question happens to reside on that brick. A client machine
that hosts no brick therefore has no local copy.

First readable child simply means the first healthy brick (i.e. no pending
heals) as you iterate through a 'for loop' over all bricks, beginning with
brick-0. So if you set read-hash-mode to 0 and all bricks are healthy,
reads will always be served from brick-0.
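
To make the 'for loop' concrete, here is a minimal sketch in Python. It is
not the actual AFR code; the function name and the list-of-booleans
representation of "needs heal" are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def first_readable_child(needs_heal: Sequence[bool]) -> Optional[int]:
    """Return the index of the first brick with no pending heals.

    needs_heal[i] is True when brick-i has pending heals (hypothetical
    representation, not a Gluster data structure).
    """
    for index, pending in enumerate(needs_heal):
        if not pending:
            return index
    # No healthy brick at all.
    return None
```

With every entry False the function returns 0, which corresponds to reads
always landing on brick-0 when all bricks are healthy.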

> 4 = brick having the least network ping latency.
As Yaniv said, the latency check is *not* per file. When the client
connects to the bricks at mount time, the ping time of each brick is
captured, and the brick with the lowest value is used thereafter.
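
A rough model of that behaviour, again in Python and only as a sketch: the
class name, the millisecond values and the rule of skipping bricks that
need heal are assumptions for illustration, not code taken from Gluster.

```python
from typing import List, Optional, Sequence


class MountTimeLatencyPolicy:
    """Hypothetical model: ping times are sampled once, at mount time."""

    def __init__(self, ping_ms_at_mount: Sequence[float]) -> None:
        # Snapshot of per-brick ping times; nothing is re-measured per file.
        self._ping_ms: List[float] = list(ping_ms_at_mount)

    def pick(self, readable: Sequence[bool]) -> Optional[int]:
        """Return the lowest-latency brick among the readable ones."""
        candidates = [i for i, ok in enumerate(readable) if ok]
        if not candidates:
            return None
        return min(candidates, key=lambda i: self._ping_ms[i])


# Example: one nearby brick (0.15 ms) and two distant ones (15 ms).
policy = MountTimeLatencyPolicy([15.0, 0.15, 15.0])
nearest = policy.pick([True, True, True])
```

The point of the sketch is that pick() only consults the stored snapshot, so
an individual read does not pay for a round trip to the distant bricks just
to measure latency.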

If you have already identified the brick with the lowest latency to the
client, you can also pin reads to that brick with the
`read-subvolume-index` option. But bear in mind that all these policies are
global and will affect all clients. Also, AFR-based replication is
synchronous, not an eventual consistency model. So your original
requirement of older reads won't work, i.e. a read will never be served
from a brick that needs heal, even if it is the fastest.

You can look at afr_read_subvol_select_by_policy() in the code if you want
to get a glimpse of the internals.
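
For readers who do not want to open the C source, the points above can be
put together as a small Python sketch. This is a simplification based only
on what is described in this mail, not a transcription of
afr_read_subvol_select_by_policy(); the function name, the pinned_index
parameter and the fallback to the first healthy brick are assumptions.

```python
from typing import Optional, Sequence


def select_read_brick(readable: Sequence[bool],
                      pinned_index: int = -1) -> Optional[int]:
    """Pick a brick to read from (hypothetical simplification).

    readable[i] is True when brick-i has no pending heals.
    pinned_index models a brick chosen via read-subvolume-index;
    -1 means nothing is pinned.
    """
    # A pinned brick is honoured only while it is healthy.
    if 0 <= pinned_index < len(readable) and readable[pinned_index]:
        return pinned_index
    # Assumed fallback: first healthy brick, starting from brick-0.
    for index, ok in enumerate(readable):
        if ok:
            return index
    # Nothing readable.
    return None
```

For example, select_read_brick([True, True, False], pinned_index=2) does
not return 2, because the pinned brick needs heal.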

Hope that helps,
Ravi



>
> Thanks again for the input.
>
>
> On Wed, 28 Jul 2021 at 23:36, Gionatan Danti <g.danti at assyoma.it> wrote:
>
>> Il 2021-07-28 13:11 Strahil Nikolov ha scritto:
>> > I think you mean cluster.choose-local which is enabled by default.
>> > Yet, Gluster will check if the local copy is healthy.
>>
>> Ah, ok, from reading here [1] I was under the impression that
>> cluster.choose-local was somewhat deprecated.
>> Good to know that it is here to stay!
>> Regards.
>>
>> [1]
>> https://lists.gluster.org/pipermail/gluster-users/2015-June/022288.html
>>
>>
>> --
>> Danti Gionatan
>> Supporto Tecnico
>> Assyoma S.r.l. - www.assyoma.it
>> email: g.danti at assyoma.it - info at assyoma.it
>> GPG public key ID: FF5F32A8
>>
>
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782