[Gluster-devel] Improving real world performance by moving files closer to their target workloads

Anand Babu Periasamy ab at gnu.org.in
Thu May 15 22:55:31 UTC 2008


Hi Luke,
Are you going to present this storage cluster to an outside network
over the 10GigE uplink, or is this storage purely for local
computing purposes on the same nodes?

If you are looking to build an integrated computing/storage
system, then you should look at the NUFA scheduler. When
you run HPC jobs, a lot of scratch data will be generated.
Local disks are always faster than remote disks. The NUFA
scheduler is aware of local/remote disks while scheduling.
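
To make the local-first idea concrete, here is a rough Python sketch of
that kind of placement decision. It is not the NUFA scheduler's code;
pick_volume, free_bytes and min_free are names made up for this
illustration, and the fallback rule (pick the remote volume with the
most free space) is an assumption.

```python
def pick_volume(local, free_bytes, min_free):
    """Choose where a new file goes; all names here are hypothetical."""
    # Prefer the local disk as long as it still has enough room.
    if free_bytes.get(local, 0) >= min_free:
        return local
    # Otherwise fall back to the remote volume with the most free space.
    remote = {name: free for name, free in free_bytes.items() if name != local}
    if not remote:
        return local  # nothing else to choose from
    return max(remote, key=remote.get)


free = {"node21": 40, "node72": 500, "node05": 300}
print(pick_volume("node21", free, min_free=10))   # local disk has room
print(pick_volume("node21", free, min_free=100))  # local disk too full
```

The decision is taken once, when the file is created, and nothing in
this sketch revisits it later.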

NUFA decides disk affinity only at the time of file creation.
This is OK for scratch data, but for permanent data,
the I/O profile may change over time. For example,
if node72 frequently reads a file stored on node21, then it
makes sense to move the file to node72.
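
One way to picture this: count reads per node for each file and suggest
a new home once a remote node clearly dominates. The Python below is
only a sketch under assumptions. AccessLog, min_reads and ratio are
hypothetical names and thresholds, and nothing here is an existing
GlusterFS interface.

```python
from collections import Counter, defaultdict


class AccessLog:
    """Per-file read counters, keyed by the node that did the read."""

    def __init__(self):
        self.reads = defaultdict(Counter)  # path -> Counter of reader nodes

    def record_read(self, path, node):
        self.reads[path][node] += 1

    def suggest_home(self, path, current_home, min_reads=10, ratio=2.0):
        """Return the node the file should live on (assumed policy)."""
        counts = self.reads.get(path)
        if not counts:
            return current_home
        top_node, top_reads = counts.most_common(1)[0]
        # Too few reads, or the busiest reader already holds the file.
        if top_node == current_home or top_reads < min_reads:
            return current_home
        # Move only if the remote reader clearly outweighs local reads.
        if top_reads >= ratio * counts[current_home]:
            return top_node
        return current_home


log = AccessLog()
for _ in range(12):
    log.record_read("/data/results.dat", "node72")
log.record_read("/data/results.dat", "node21")
print(log.suggest_home("/data/results.dat", current_home="node21"))
```

The thresholds are there to avoid bouncing a file between nodes when
two of them read it about equally often.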

There are a lot of ways we can optimize GlusterFS
if we know the application requirements.

Here are a few ideas to explore:

1) disk-io-cache: Implement a new disk-based caching translator
based on the current memory-based io-cache translator (or extend it
to support disks).

2) HSM (hierarchical storage management): Frequently accessed
files will be pre-fetched to a faster/local cache volume of
limited capacity.

3) glusterfs-defrag utility: Optimize the volume by moving files
around based on the I/O stat logs. It will do a number of useful
things such as leveling the volumes based on free disk space,
read-usage, write-usage, file sizes and so on.
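
As an illustration of the leveling part of idea 3, here is a small
Python sketch that plans file moves from the fullest volume to the
emptiest one. It considers free disk space only (not read or write
usage), and plan_leveling, its arguments and the greedy rule are all
assumptions made for this example, not the design of any existing tool.

```python
def plan_leveling(free_bytes, files, tolerance):
    """Plan moves that even out free space across volumes.

    free_bytes: dict of volume name -> free bytes
    files:      dict of path -> (volume name, size in bytes)
    tolerance:  stop once fullest and emptiest differ by at most this
    Returns a list of (path, source volume, destination volume).
    """
    free = dict(free_bytes)
    location = dict(files)
    moves = []
    if len(free) < 2:
        return moves
    while True:
        src = min(free, key=free.get)  # fullest volume (least free space)
        dst = max(free, key=free.get)  # emptiest volume (most free space)
        gap = free[dst] - free[src]
        if gap <= tolerance:
            break
        # Only consider files small enough that the move cannot overshoot.
        candidates = [
            (size, path)
            for path, (vol, size) in location.items()
            if vol == src and size > 0 and 2 * size <= gap
        ]
        if not candidates:
            break
        size, path = max(candidates)  # largest file that still fits
        free[src] += size
        free[dst] -= size
        location[path] = (dst, size)
        moves.append((path, src, dst))
    return moves


volumes = {"a": 10, "b": 100}
files = {"f1": ("a", 30), "f2": ("a", 20), "f3": ("b", 5)}
for move in plan_leveling(volumes, files, tolerance=10):
    print(move)
```

A real utility would also need the I/O stat logs to weigh read and
write usage; this sketch only shows the free-space balancing step.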

On the legal side:
Your university legal department should give a written statement
agreeing to release the code under GPLv3 or later and documentation
under GNU FDL v1.2 or later. Your university will retain the
copyright ownership. If you need the Gluster team to defend
your work legally, then you can assign the copyright to Z RESEARCH
instead. If you are not going to redistribute the code
and will use it only internally, then there is no
legal issue.

Hope this helps..
--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]
Z RESEARCH Inc [http://www.zresearch.com]



Luke McGregor wrote:
> Hi
> 
> I'm Luke McGregor and I'm working on a project at the University of
> Waikato computer science department to make some improvements to
> GlusterFS to improve performance for our specific application. We are
> implementing a fairly small cluster (90 machines currently) to use for
> large-scale computing projects. This machine is being built using
> commodity hardware connected to a gigabit Ethernet backbone with
> 10G uplinks between switches. Each node in the cluster will be
> responsible for both storage and workload processing. This is to be
> achieved with single SATA disks in the machines.
> 
> We are currently experimenting with running Gluster over the nodes in
> the cluster to produce a single large filesystem. For my Honors
> research project I've been asked to look into making some improvements
> to Gluster to try to improve performance by moving the files within
> GlusterFS closer to the node which is accessing the file.
> 
> What I was wondering is basically how hard it would be to write code
> to modify the metadata so that when a file is accessed it is then
> moved to the node it is accessed from and its location is
> updated in the metadata.
> 
> Any help/advice on where to start would be much appreciated.
> 
> Thanks
> Luke McGregor
> 
> 




