[Gluster-users] NUFA/replicate setup
James Cipar
jcipar at cmu.edu
Thu May 28 18:45:45 UTC 2009
Hi,
I'm considering using Gluster as the main storage system for a
small cluster, but I'm not sure if it fits my needs, or how to
configure it to do so. We need an FS that can be expanded
incrementally, and is fault tolerant. From reading the docs, I'm
hopeful that Gluster with some combination of NUFA and replicate will
do what we want, but I'm not really sure how to make it happen. Here
are the details:
Currently we have ~70 machines, each with 4x1TB disks in it. We
would like to aggregate some, or maybe all, of these into one
distributed file system. The DFS would be used for a number of things
including:
- User home directories
- Virtual machine disk images
- Large data sets (many TB)
The properties that we are looking for are:
- Fault tolerance: each piece of data should be replicated, or erasure
coded, so that the FS can survive (n) server failures without
affecting availability, for some configurable (n). When a server
fails, we might not be able to replace/repair it for a while, so the
data must be automatically re-replicated when this happens.
- Scalable: It must be easy to both add and remove nodes from the
cluster. The cluster will be expanded incrementally, adding 5-10
nodes at a time. Ideally this would be as simple as dropping in the
new machine, adding it to a list of storage nodes, and having the
FS/rebalancer daemons take care of the rest. This includes adding
heterogeneous servers with different amounts/types of disks. How close
does Gluster come to that kind of plug-and-play? Will data be moved
from full nodes to the new one automatically? In addition, it is
likely that we will want to remove nodes for extended periods. How
easy is it to remove a set of nodes from Gluster without breaking
anything?
- Handles large files gracefully: If a file is larger than any one
server can hold (e.g. many TB), will Gluster automatically stripe it
across servers? It looks like it can be configured to do this on a
per-mountpoint basis, which is probably fine for us.
- Promotes local access: The machines will also be running
computation jobs on them, accessing these large data sets, or starting
virtual machines from the disk images. Ideally, access would go to
the local disk if possible. It looks like NUFA takes care of this,
but I don't know how this would interact with replication.
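To put rough numbers on the fault-tolerance requirement above, here is
a minimal sketch of the capacity arithmetic. It assumes plain
replication (not erasure coding), where surviving (n) server failures
requires n+1 copies of each piece of data, each on a different server.
It ignores file system overhead, and the function name and parameters
are purely illustrative, not anything provided by Gluster.

```python
def usable_capacity_tb(machines, disks_per_machine, disk_tb, tolerated_failures):
    """Return (raw_tb, usable_tb) for a replicated pool.

    Assumption: plain replication, so tolerating n server failures
    needs n + 1 full copies of the data on distinct servers.
    """
    raw_tb = machines * disks_per_machine * disk_tb
    replicas = tolerated_failures + 1
    return raw_tb, raw_tb / replicas


if __name__ == "__main__":
    # ~70 machines with 4x1TB disks each, for a few values of n
    for n in (1, 2):
        raw, usable = usable_capacity_tb(70, 4, 1, n)
        print(f"n={n}: raw {raw} TB, usable about {usable:.1f} TB")
```

With all ~70 machines in the pool and n=1, this works out to about
280 TB raw and about 140 TB usable.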
Has anyone set up a similar file system with Gluster? What does your
config look like? Any advice?
Thanks,
Jim