[Gluster-users] reading from local replica?
tmiller at sonsetsolutions.org
Tue Jun 9 14:21:34 UTC 2015
On 6/8/2015 5:55 PM, Brian Ericson wrote:
> Am I misunderstanding cluster.read-subvolume/cluster.read-subvolume-index?
> I have two regions, "A" and "B" with servers "a" and "b" in, respectively,
> each region. I have clients in both regions. Intra-region communication is
> fast, but the pipe between the regions is terrible. I'd like to limit
> inter-region communication, as nearly as possible, to glusterfs write
> operations only, and have reads go to the server in the client's own region.
> I have created a replica volume as:
> gluster volume create gv0 replica 2 a:/data/brick1/gv0 b:/data/brick1/gv0
> As a baseline, if I use scp to copy from the brick directly, I get -- for a
> 100M file -- times of about 6s if the client scps from the server in the
> same region and anywhere from 3 to 5 minutes if the client scps from the
> server in the other region.
> I was under the impression (from something I read but can't now find) that
> glusterfs automatically picks the fastest replica, but that has not been my
> experience; glusterfs seems to generally prefer the server in the other
> region over the "local" one, with times usually in excess of 4 minutes.
> I've also tried having clients mount the volume using the "xlator" options
> cluster.read-subvolume and cluster.read-subvolume-index, but neither seems
> to have any impact. Here are sample mount commands to show what I'm trying:
> mount -t glusterfs -o xlator-option=cluster.read-subvolume=gv0-client-<0 or 1> a:/gv0 /mnt/glusterfs
> mount -t glusterfs -o xlator-option=cluster.read-subvolume-index=<0 or 1> a:/gv0 /mnt/glusterfs
> Am I misunderstanding how glusterfs works, particularly when trying to
> "read locally"? Is it possible to configure glusterfs to use a local
> replica (or the "fastest replica") for reads?
I am not a developer, nor intimately familiar with the insides of glusterfs,
but here is how I understand glusterfs-fuse file reads to work.
First, all replica bricks are read, to make sure they are consistent. (If
not, gluster tries to make them consistent before proceeding).
After consistency is established, then the actual read occurs from the brick
with the shortest response time. I don't know when or how the response time
is measured, but it seems to work for most people most of the time. (If the
client is on one of the brick hosts, it will almost always read from the
local brick.)
If the file reads involve a lot of small files, the consistency check may be
what is killing your response times, rather than the read of the file
itself. Even over a fast LAN, the consistency checks can take many times the
actual read time of the file.
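
One rough way to see whether lookups or the data transfer dominate is to time
the two separately on the client mount. The Python sketch below is only an
illustration, not anything glusterfs provides: the function name
time_lookup_vs_read is made up for this example, and it assumes that an
os.stat() call on the mounted path is a fair stand-in for the per-file lookup.
Client-side caching can hide part of that cost, and the open() call is counted
with the read, so treat the numbers as a hint rather than a measurement.

```python
import os
import time


def time_lookup_vs_read(root):
    """Walk `root` and time metadata lookups separately from data reads.

    Returns (file_count, lookup_seconds, read_seconds, bytes_read).
    Hypothetical helper for illustration; not part of glusterfs.
    """
    count = 0
    lookup_s = 0.0
    read_s = 0.0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)

            # Metadata only: assumed here to approximate the lookup cost.
            t0 = time.perf_counter()
            os.stat(path)
            t1 = time.perf_counter()

            # Open and read the whole file in 1 MiB chunks.
            with open(path, "rb") as fh:
                while True:
                    chunk = fh.read(1 << 20)
                    if not chunk:
                        break
                    total_bytes += len(chunk)
            t2 = time.perf_counter()

            count += 1
            lookup_s += t1 - t0
            read_s += t2 - t1
    return count, lookup_s, read_s, total_bytes
```

Calling time_lookup_vs_read("/mnt/glusterfs") on a client in each region and
comparing the lookup and read totals should show which of the two is eating
the time. If the lookup total is large compared with the read total for many
small files, the consistency check is the more likely culprit.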
Hopefully others will chime in with more information, but if you can supply
more information about what you are reading, that will help too. Are you
reading entire files, or just reading in a lot of "snippets" or what?
Elkhart, IN, USA