[Gluster-users] Gluster distributed replicated setup does not serve read from all bricks belonging to the same replica
Ravishankar N
ravishankar at redhat.com
Sat Nov 24 08:57:20 UTC 2018
On 11/24/2018 01:03 PM, Anh Vo wrote:
> Looking at the source (afr-common.c), even when using hashed mode, if
> the hashed brick doesn't have a good copy it will try the next brick.
> Am I correct?
That is correct. No matter which brick the policy chooses, if that
brick is not readable for a given file (i.e. a heal is pending on it
from the other good bricks), we just iterate from brick-0 and pick the
first one that is good (i.e. readable).
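
If you want to check whether any heals are pending, the heal info
sub-command lists the entries that still need to be healed, per brick
(this assumes a volume named 'myvol'; substitute your own volume name):

    gluster volume heal myvol info

If it reports 'Number of entries: 0' for every brick, all copies are
good and the chosen policy alone decides which brick serves reads for
a given file.
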
-Ravi
> I'm curious because your first reply seemed to place some significance
> on the part about pending self-heal. Is there anything about pending
> self-heal that would have made hashed mode worse, or is it about as
> bad as any brick selection policy?
>
> Thanks
>
> On Thu, Nov 22, 2018 at 7:59 PM Ravishankar N
> <ravishankar at redhat.com> wrote:
>
>
>
> On 11/22/2018 07:07 PM, Anh Vo wrote:
>> Thanks Ravi, I will try that option.
>> One question:
>> Let's say there are self-heals pending; how would the default of
>> "0" have worked? I understand 0 means "first responder". What if
>> the first responder doesn't have a good copy? (And it failed in such
>> a way that the dirty attribute wasn't set on its copy, but there
>> are index heals pending from the other two sources.)
>
> 0 = first readable child of AFR, starting from the 1st child. So if
> the 1st brick doesn't have a good copy, it will try the 2nd brick
> and so on.
> The default value seems to be '1', not '0'. You can look at
> afr_read_subvol_select_by_policy() in the source code to
> understand the order of preference in the selection.
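>
> To check the value a volume is currently using, 'gluster volume get'
> shows it (this assumes a volume named 'myvol'; substitute your own
> volume name):
>
>     gluster volume get myvol cluster.read-hash-mode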
>
> Regards,
> Ravi
>>
>> On Wed, Nov 21, 2018 at 9:57 PM Ravishankar N
>> <ravishankar at redhat.com> wrote:
>>
>> Hi,
>> If there are multiple clients, you can change the
>> 'cluster.read-hash-mode' volume option's value to 2. Then
>> different reads should be served from different bricks for
>> different clients. The meaning of the various values of
>> 'cluster.read-hash-mode' can be obtained from `gluster volume set
>> help`. gluster-4.1 has also added a new value [1] for this
>> option. Of course, the assumption is that all bricks host
>> good copies (i.e. there are no self-heals pending).
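>>
>> As a concrete example, the commands would look something like this
>> (assuming a volume named 'myvol'; substitute your own volume name):
>>
>>     # serve reads for the same file from different bricks
>>     # for different clients
>>     gluster volume set myvol cluster.read-hash-mode 2
>>
>>     # see what each value of the option means
>>     gluster volume set help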
>>
>> Hope this helps,
>> Ravi
>>
>> [1] https://review.gluster.org/#/c/glusterfs/+/19698/
>>
>> On 11/22/2018 10:20 AM, Anh Vo wrote:
>>> Hi,
>>> Our setup: We have a distributed replicated setup with
>>> replica 3. The total number of servers varies between
>>> clusters; in some cases we have a total of 36 (12 x 3)
>>> servers, in others we have 12 servers (4 x 3). We're
>>> using gluster 3.12.15.
>>>
>>> In all instances, what I am noticing is that only one member
>>> of the replica is serving reads for a particular file, even
>>> when all the members of the replica set are online. We have
>>> many large input files (for example, a 150GB zip file), and
>>> when there are 50 clients reading from one single server, the
>>> read performance for that file degrades by several orders of
>>> magnitude. Shouldn't all members of the replica participate
>>> in serving the read requests?
>>>
>>> Our options
>>>
>>> cluster.shd-max-threads: 1
>>> cluster.heal-timeout: 900
>>> network.inode-lru-limit: 50000
>>> performance.md-cache-timeout: 600
>>> performance.cache-invalidation: on
>>> performance.stat-prefetch: on
>>> features.cache-invalidation-timeout: 600
>>> features.cache-invalidation: on
>>> cluster.metadata-self-heal: off
>>> cluster.entry-self-heal: off
>>> cluster.data-self-heal: off
>>> features.inode-quota: off
>>> features.quota: off
>>> transport.listen-backlog: 100
>>> transport.address-family: inet
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> performance.strict-o-direct: on
>>> network.remote-dio: off
>>> server.allow-insecure: on
>>> performance.write-behind: off
>>> cluster.nufa: disable
>>> diagnostics.latency-measurement: on
>>> diagnostics.count-fop-hits: on
>>> cluster.ensure-durability: off
>>> cluster.self-heal-window-size: 32
>>> cluster.favorite-child-policy: mtime
>>> performance.io-thread-count: 32
>>> cluster.eager-lock: off
>>> server.outstanding-rpc-limit: 128
>>> cluster.rebal-throttle: aggressive
>>> server.event-threads: 3
>>> client.event-threads: 3
>>> performance.cache-size: 6GB
>>> cluster.readdir-optimize: on
>>> storage.build-pgfid: on
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>