[Gluster-devel] [Gluster-users] GlusterFS and optimization of locality.

Thu Apr 4 02:18:41 UTC 2013

On 04/03/2013 02:42 PM, Jeff Darcy wrote:
> (redirecting to gluster-devel as a more appropriate forum)
>
> On 04/03/2013 05:03 PM, Jay Vyas wrote:
>> Suppose I was going to serve a petabyte of data sharded over 10 files
>> (1,2,3,...,10) over glusterfs, in 3 servers (call them Server1, Server2,
>> and Server3).
>>
>> The 3 servers would need access to the files such that:
>>
>> Server 1 will usually only access file 1.
>> Server 2 will usually only access file 2.
>> Server 3 will access all ten files (the whole data set).
>>
>> Is there a way to get gluster to rebalance bricks over time based on
>> access patterns? Or otherwise, what is the best way to increase the
>> average locality of access to files in the cluster?
> The flippant answer would be to move the computation to the data instead
> of vice versa, like Hadoop is designed to do.  ;)
>
> The less flippant answer is going to get a bit more complicated.  There
> are three ways that you can control placement of a file, but none are
> really supported and all could get you in trouble.  The first method is
> to create the file (or a copy) with a special name of the form
> file@dht:subvol, where the parts have the following meanings:
>
> * file = the file name you really want
>
> * dht = the name of the DHT translator in your client-side volfile
>
> * subvol = the name (from the same volfile) of the DHT subvolume where
> you want the file to go
>
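
A minimal sketch of building that special name, assuming the archived
"file at dht:subvol" stands for file@dht:subvol (the list archive rewrites
"@" as " at "). The helper name placement_name and the volfile names below
are hypothetical; the real translator and subvolume names have to be read
from your own client-side volfile.

```python
def placement_name(file_name, dht_name, subvol_name):
    """Build a name of the form file@dht:subvol.

    file_name   -- the file name you really want
    dht_name    -- name of the DHT translator in the client-side volfile
    subvol_name -- name of the DHT subvolume that should hold the file
    """
    for part in (file_name, dht_name, subvol_name):
        if not part:
            raise ValueError("all three parts must be non-empty")
    return "{}@{}:{}".format(file_name, dht_name, subvol_name)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(placement_name("file1", "myvol-dht", "myvol-client-0"))
```

Creating or copying the file under the returned name on the mounted volume
is then what should steer it to the named subvolume, per the description
above. Only try this with test data.
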
> This is reasonably safe, because it's part of how rebalance works.  To
> get even fancier than that, you need to know something about how the DHT
> translator uses "layouts" on directories to place files.  There's a
> description here.
>
> http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/
>
> The problem is that the user has very little control over how these
> layouts are generated.  One thing you can do that's fairly easy is swap
> the layout xattrs on two bricks, which (after a rebalance) will swap
> what files they contain.  For example, if your file is on brick2 and you
> want it to be on brick1, you swap the xattr values for that directory
> within brick1 and brick2.
>
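
The swap could be sketched roughly as below. This assumes the layout is
stored in an xattr named trusted.glusterfs.dht on the directory inside each
brick, that both brick directories are reachable from the machine running
the script (usually they are on different servers, in which case the values
have to be copied across by hand), and that it runs as root. The function
name swap_layouts and the paths are hypothetical.

```python
import os

# Assumed name of the xattr holding the DHT layout for a directory.
LAYOUT_XATTR = "trusted.glusterfs.dht"


def swap_layouts(dir_on_brick1, dir_on_brick2, getx=None, setx=None):
    """Swap the layout xattr values of the same directory on two bricks.

    getx/setx default to os.getxattr/os.setxattr (Linux only); they can be
    replaced with other callables for dry runs or testing.
    Returns the two original values so they can be restored later.
    """
    getx = getx or os.getxattr
    setx = setx or os.setxattr
    # Read both values first so a failed read leaves everything untouched.
    value1 = getx(dir_on_brick1, LAYOUT_XATTR)
    value2 = getx(dir_on_brick2, LAYOUT_XATTR)
    setx(dir_on_brick1, LAYOUT_XATTR, value2)
    setx(dir_on_brick2, LAYOUT_XATTR, value1)
    return value1, value2


# Example call with hypothetical brick paths:
# swap_layouts("/bricks/brick1/data", "/bricks/brick2/data")
```

As noted above, the files only move once a rebalance has run, so keep the
returned values around in case the swap needs to be undone.
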
> The ultimate level of control is to calculate your own layouts.  For
> this to be useful in a scenario like yours, you'd need to copy or
> reverse-engineer the code in the DHT translator that calculates the hash
> for a file.  Knowing that, you could do something like this:
>
> * assign a range for brick1 that contains the hash for file1
>
> * assign a range for brick2 that contains the hash for file2
>
> * assign the remaining range to brick3
>
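
The three steps above can be sketched as follows, assuming a 32-bit hash
space and one contiguous range per brick, so the ranges for brick1 and
brick2 end up larger than strictly needed. The hashes for file1 and file2
are taken as inputs (they have to come from the real DHT hash function),
and the function name plan_layout and the example values are made up for
illustration.

```python
MAX_HASH = 0xFFFFFFFF  # assumed top of the 32-bit DHT hash space


def plan_layout(hash1, hash2):
    """Return {brick: (start, stop)} with inclusive, non-overlapping ranges.

    brick1's range contains hash1, brick2's range contains hash2, and
    brick3 gets whatever is left at the top of the hash space.
    """
    for h in (hash1, hash2):
        if not 0 <= h < MAX_HASH:
            raise ValueError("hash must leave room for brick3's range")
    if hash1 == hash2:
        raise ValueError("file1 and file2 hash to the same value")
    # Order the two bricks by hash so the ranges stay contiguous.
    (low, low_brick), (high, high_brick) = sorted(
        [(hash1, "brick1"), (hash2, "brick2")]
    )
    return {
        low_brick: (0, low),
        high_brick: (low + 1, high),
        "brick3": (high + 1, MAX_HASH),
    }


if __name__ == "__main__":
    # Made-up hash values, not real DHT hashes of any file name.
    for brick, (start, stop) in sorted(plan_layout(0x90000000, 0x10000000).items()):
        print("{}: 0x{:08x} - 0x{:08x}".format(brick, start, stop))
```

Turning these ranges into actual layout xattr values is a separate step and
is not covered by this sketch.
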
> I'm working on some mechanisms, and accompanying management/interface
> models, to provide this sort of control in a less hacker-ish form.
> Unfortunately, I'm tied down with about ten higher priorities, so I
> don't have any idea when that will be ready.  In the meantime, please
> try these techniques *only with test data*, and caveat emptor.
>
Even better than reverse-engineering the DHT hash function in order to
calculate the hashes, you can just use the library function directly,
like I do at http://joejulian.name/blog/dht-misses-are-expensive/