[Gluster-users] Gluster distributed replicated setup does not serve read from all bricks belonging to the same replica

Ravishankar N ravishankar at redhat.com
Sat Nov 24 08:57:20 UTC 2018



On 11/24/2018 01:03 PM, Anh Vo wrote:
> Looking at the source (afr-common.c), even when using hashed mode, if 
> the hashed brick doesn't have a good copy it will try the next brick. 
> Am I correct?
That is correct. No matter which brick the policy chooses, if that 
brick is not readable for a given file (i.e. a heal is pending on it 
from the other good bricks), we simply iterate from brick-0 and pick 
the first one that is good (i.e. readable).
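
To illustrate, the fallback amounts to something like the small
standalone C sketch below. This is only an illustration; the function
and variable names are made up, and the real logic in afr-common.c
handles many more cases:

#include <stdio.h>

/* Illustrative sketch only (not the actual afr-common.c code): try the
 * brick the read policy preferred; if it is not readable, fall back to
 * the first readable brick, starting from brick-0. Returns -1 if no
 * brick is readable. */
static int
pick_read_child(const int *readable, int child_count, int preferred)
{
        int i;

        if (preferred >= 0 && preferred < child_count && readable[preferred])
                return preferred;

        for (i = 0; i < child_count; i++) {
                if (readable[i])
                        return i;
        }
        return -1;
}

int
main(void)
{
        /* Hypothetical example: the policy picked brick-1, but brick-1
         * has a heal pending, so the read falls back to brick-0. */
        int readable[3] = {1, 0, 1};

        printf("read served from brick-%d\n",
               pick_read_child(readable, 3, 1));
        return 0;
}

Compiled and run, the example prints "read served from brick-0", i.e.
the first readable brick wins when the preferred one is unreadable.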
-Ravi
> I'm curious because your first reply seemed to place some significance 
> on the part about pending self-heal. Is there anything about pending 
> self-heal that would have made hashed mode worse, or is it about as 
> bad as any brick selection policy?
>
> Thanks
>
> On Thu, Nov 22, 2018 at 7:59 PM Ravishankar N
> <ravishankar at redhat.com> wrote:
>
>
>
>     On 11/22/2018 07:07 PM, Anh Vo wrote:
>>     Thanks Ravi, I will try that option.
>>     One question:
>>     Let's say there are self-heals pending; how would the default of
>>     "0" have worked? I understand 0 means "first responder". What if
>>     the first responder doesn't have a good copy? (And it failed in
>>     such a way that the dirty attribute wasn't set on its copy, but
>>     there are index heals pending from the other two sources.)
>
>     0 = first readable child of AFR, starting from the 1st child. So
>     if the 1st brick doesn't have a good copy, it will try the 2nd
>     brick and so on.
>     The default value seems to be '1', not '0'. You can look at
>     afr_read_subvol_select_by_policy() in the source code to
>     understand the order of preference used for selection.
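
(Side note, since the default value came up: to check which value a
particular volume is actually using, something like `gluster volume get
<VOLNAME> cluster.read-hash-mode` should show it; <VOLNAME> is just a
placeholder for the actual volume name.)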
>
>     Regards,
>     Ravi
>>
>>     On Wed, Nov 21, 2018 at 9:57 PM Ravishankar N
>>     <ravishankar at redhat.com> wrote:
>>
>>         Hi,
>>         If there are multiple clients, you can change the
>>         'cluster.read-hash-mode' volume option's value to 2. Then
>>         reads should be served from different bricks for different
>>         clients. The meaning of the various values for
>>         'cluster.read-hash-mode' can be found in the output of
>>         `gluster volume set help`. gluster-4.1 has also added a new
>>         value[1] for this option. Of course, the assumption is that
>>         all bricks host good copies (i.e. there are no self-heals
>>         pending).
>>
>>         Hope this helps,
>>         Ravi
>>
>>         [1] https://review.gluster.org/#/c/glusterfs/+/19698/
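
(For completeness: the change suggested above would be applied with
something along the lines of `gluster volume set <VOLNAME>
cluster.read-hash-mode 2`, where <VOLNAME> is again a placeholder for
the actual volume name.)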
>>
>>         On 11/22/2018 10:20 AM, Anh Vo wrote:
>>>         Hi,
>>>         Our setup: We have a distributed replicated setup with
>>>         replica 3. The total number of servers varies between
>>>         clusters: in some cases we have a total of 36 (12 x 3)
>>>         servers, in others we have 12 servers (4 x 3). We're using
>>>         gluster 3.12.15.
>>>
>>>         In all instances what I am noticing is that only one member
>>>         of the replica set is serving reads for a particular file,
>>>         even when all the members of the replica set are online. We
>>>         have many large input files (for example, a 150GB zip file),
>>>         and when there are 50 clients reading from one single server
>>>         the read performance for that file degrades by several
>>>         orders of magnitude. Shouldn't all members of the replica
>>>         set participate in serving the read requests?
>>>
>>>         Our options
>>>
>>>         cluster.shd-max-threads: 1
>>>         cluster.heal-timeout: 900
>>>         network.inode-lru-limit: 50000
>>>         performance.md-cache-timeout: 600
>>>         performance.cache-invalidation: on
>>>         performance.stat-prefetch: on
>>>         features.cache-invalidation-timeout: 600
>>>         features.cache-invalidation: on
>>>         cluster.metadata-self-heal: off
>>>         cluster.entry-self-heal: off
>>>         cluster.data-self-heal: off
>>>         features.inode-quota: off
>>>         features.quota: off
>>>         transport.listen-backlog: 100
>>>         transport.address-family: inet
>>>         performance.readdir-ahead: on
>>>         nfs.disable: on
>>>         performance.strict-o-direct: on
>>>         network.remote-dio: off
>>>         server.allow-insecure: on
>>>         performance.write-behind: off
>>>         cluster.nufa: disable
>>>         diagnostics.latency-measurement: on
>>>         diagnostics.count-fop-hits: on
>>>         cluster.ensure-durability: off
>>>         cluster.self-heal-window-size: 32
>>>         cluster.favorite-child-policy: mtime
>>>         performance.io-thread-count: 32
>>>         cluster.eager-lock: off
>>>         server.outstanding-rpc-limit: 128
>>>         cluster.rebal-throttle: aggressive
>>>         server.event-threads: 3
>>>         client.event-threads: 3
>>>         performance.cache-size: 6GB
>>>         cluster.readdir-optimize: on
>>>         storage.build-pgfid: on
>>>
>>>
>>>
>>>
>>>
>>>
>>>         _______________________________________________
>>>         Gluster-users mailing list
>>>         Gluster-users at gluster.org
>>>         https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
