[Gluster-users] AFR/replicate questions

Fri Jan 30 21:20:36 UTC 2009

At 04:20 AM 1/30/2009, Barnaby Gray wrote:
>I'm in the process of setting up server-side AFR with 2 servers in
>separate data centres, separated by a WAN. Writes will be relatively
>few, so we can live with the performance limitations of the WAN.
>
>I noticed unexpected performance though when listing directories of
>around 1k files with ls -al. It looks like for this operation server1 is
>sending traffic to server2 in the other data centre, which for a
>read-only operation I wasn't expecting.

Any time a directory is accessed, gluster/replicate checks with the
other server to see whether the information it has is current.
It does this because something may have changed on the other machine
without this one knowing about it.  If something has changed, it
auto-heals.

Since gluster doesn't cache information about the other machines in a
replicate group, it has to do this every time.
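
To make the cost concrete, here is a rough sketch in plain Python.  It
is not gluster code: the Replica class, the lookup() function and the
per-file change counter are all made up for illustration, and the
counter just stands in for whatever gluster really compares.  It only
shows that a check on every access means one trip to the other server
per file listed.

```python
# Toy model, not GlusterFS code: every name here is made up for illustration.

class Replica:
    """One server in the replicate group, holding a per-file change counter."""

    def __init__(self, name):
        self.name = name
        self.versions = {}   # filename -> change counter (an assumption)
        self.queries = 0     # how many times this replica was asked

    def get_version(self, filename):
        self.queries += 1
        return self.versions.get(filename, 0)


def lookup(filename, local, remote):
    """Compare both copies on every access and heal the stale one."""
    local_v = local.get_version(filename)
    remote_v = remote.get_version(filename)   # this is the WAN round trip
    newest = max(local_v, remote_v)
    # Self-heal: bring whichever side is behind up to the newest counter.
    local.versions[filename] = newest
    remote.versions[filename] = newest
    return newest


server1 = Replica("server1")
server2 = Replica("server2")
files = ["file%04d" % i for i in range(1000)]
for name in files:
    lookup(name, server1, server2)

print(server2.queries)  # one remote query per file listed
```

With around 1k files in the directory, that is around 1k questions
sent across the WAN for a single ls -al, and nothing is remembered
between one listing and the next.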

>tshark shows a reasonable amount of traffic that looks related to xattr:
>lots of mentions of filenames and 'trusted.glusterfs.afr.metadata-pending'.
>
>I'm using the "option read-subvolume local" to point read operations to
>the volume local to either server.

This means: once it has been determined that my version of the file is
the most up to date, serve it from my disk (or from my favorite server
in a client-server model), which is faster than streaming it over the
network.
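
Roughly, and again only as a sketch in plain Python rather than
anything taken from the gluster source (pick_read_source and the
change counters are hypothetical), the preference only applies after
the comparison has been made:

```python
# Toy model, not GlusterFS code: names and the counter scheme are assumptions.

def pick_read_source(versions, preferred):
    """versions maps subvolume name -> change counter for one file.

    Return the preferred subvolume if its copy is as new as any other,
    otherwise fall back to a subvolume that holds the newest copy.
    """
    newest = max(versions.values())
    if versions.get(preferred) == newest:
        return preferred
    # Preferred copy is stale (or missing): read from an up-to-date one.
    return sorted(name for name, v in versions.items() if v == newest)[0]


in_sync = pick_read_source({"local": 7, "remote": 7}, preferred="local")
stale = pick_read_source({"local": 6, "remote": 7}, preferred="local")
print(in_sync, stale)
```

So the option decides where the data is read from, not whether the
other server gets asked first.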

>Have tried both with and without the performance translators client-side
>to no avail. We're using 2.0.0rc1.

I don't think any performance translator can help with this
particular situation.  Gluster HAS to ensure that it's delivering the
most up-to-date version of a file.  To do that, on every file request
it has to check with the other replicate servers.

>Apologies if this is an obvious question - can someone spot what I'm
>doing wrong?

One might think, "well, the servers haven't lost their connection to
each other, so they should be able to assume they're in sync," but
this isn't necessarily the case, because you can't know the
configuration on the other end.

There may be a situation where Server A decided Server B was down
because of network latency, so it wrote and updated a file but
didn't replicate it to Server B.  Server B then goes to read that
file; if it assumes that all has been well with Server A and doesn't
bother checking, it will serve the wrong version of the file.
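
The same scenario as a small sketch in plain Python.  This is not
gluster code; the Server class, the change counter and the file name
are hypothetical and only there to show the two outcomes.

```python
# Toy model, not GlusterFS code: all names are hypothetical.

class Server:
    def __init__(self, name):
        self.name = name
        self.files = {}   # filename -> (change counter, contents)

    def write(self, filename, contents, peers):
        counter = self.files.get(filename, (0, ""))[0] + 1
        self.files[filename] = (counter, contents)
        for peer in peers:          # only the peers believed to be up
            peer.files[filename] = (counter, contents)

    def read_without_check(self, filename):
        return self.files[filename][1]

    def read_with_check(self, filename, peers):
        copies = [self.files.get(filename, (0, ""))]
        copies += [p.files.get(filename, (0, "")) for p in peers]
        return max(copies, key=lambda c: c[0])[1]   # newest counter wins


a, b = Server("A"), Server("B")
a.write("report.txt", "old", peers=[b])   # normal replicated write
a.write("report.txt", "new", peers=[])    # A thinks B is down, writes alone

unchecked = b.read_without_check("report.txt")        # B trusts its own copy
checked = b.read_with_check("report.txt", peers=[a])  # B asks A first
print(unchecked, checked)
```

B never noticed a disconnect, so from its side nothing looks wrong.
Only the read that asks A first returns the new contents.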

The only way to resolve this would be to make server A responsible
for notifying server B when it re-establishes a connection to
it.  While this seems logical and would improve performance for your
case, it would require some sort of journaling on server A.  This
would be terribly inefficient and would require an additional journal
filesystem, or modifying the underlying filesystem in some
way.  Then there's the case of changing architecture.  If you have
10 servers in your replicate group, you have to run a journal for all
10.  Let's say you just shut 5 of them off forever; you'd then need a
way to clear out the journal for those so that space isn't wasted.
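
To illustrate the bookkeeping problem, here is a toy version of such
a journal in plain Python.  Nothing like this exists in gluster as
far as this discussion goes; JournalingServer and its methods are
invented for the example, and it models just one server of the
10-server group keeping a journal for each of its 9 peers.

```python
# Toy model of the journaling idea, not GlusterFS code: names are hypothetical.

class JournalingServer:
    def __init__(self, peer_names):
        # One journal per peer: the writes that peer has not seen yet.
        self.journals = {name: [] for name in peer_names}

    def write(self, filename, reachable):
        for peer, journal in self.journals.items():
            if peer not in reachable:
                journal.append(filename)

    def reconnect(self, peer):
        """Peer is back: hand over what it missed and empty its journal."""
        missed = self.journals[peer]
        self.journals[peer] = []
        return missed

    def forget(self, peer):
        """Peer is gone for good: its journal has to be dropped explicitly."""
        del self.journals[peer]

    def journal_entries(self):
        return sum(len(j) for j in self.journals.values())


peers = ["s%d" % i for i in range(1, 10)]   # the 9 other servers in the group
server = JournalingServer(peers)
up = set(peers[:4])                          # the other 5 are shut off forever
for i in range(100):
    server.write("file%d" % i, reachable=up)
before = server.journal_entries()   # keeps growing while those peers are away
for peer in peers[4:]:
    server.forget(peer)
after = server.journal_entries()
print(before, after)
```

The journals for the 5 dead peers grow with every write until someone
tells the server to forget them, and every server in the group would
be carrying the same burden.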

So given that gluster wants to be non-intrusive

>cheers,
>
>Barnaby
>