[Gluster-devel] Config advice?

David Braginsky daveey at facebook.com
Tue Jan 13 09:33:51 UTC 2009


Hey guys,

I am trying to set up glusterfs on a few hundred nodes and was hoping for some advice.

I have a few hundred machines, spread across two datacenters, in racks of 40 nodes. These machines act as in-memory servers, but during startup need to load their data off disk. Each server is replicated, so several machines serve the same data. Therefore, on startup, each file will be loaded by several machines at once. The data is written via appends to a set of log files, which are periodically rotated. 

I don't know how well glusterfs would handle cross-datacenter writes, so my plan is to have a separate fs in each datacenter. I plan on using the same server machines to run the fs. My naïve approach is to pick a replication factor (4), generate a bunch of AFR clusters (numNodes / replicationFactor), then use DHT to map my data onto the AFR clusters. That way each file will get mapped onto 4 machines, which should handle the read throughput as well as failures. Does that make sense?
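
A rough sketch of that layout in Python, only to make the grouping concrete. It is not glusterfs's actual placement logic: the node names, the file path, and both helper functions are made up for illustration, and the MD5-modulo mapping merely stands in for whatever hashing DHT really does.

```python
import hashlib


def make_replica_groups(nodes, replication_factor):
    """Split the node list into consecutive groups of `replication_factor`.

    Each group plays the role of one AFR cluster. Nodes left over when the
    count is not a multiple of the replication factor are simply unused here.
    """
    usable = len(nodes) - len(nodes) % replication_factor
    return [
        nodes[i:i + replication_factor]
        for i in range(0, usable, replication_factor)
    ]


def group_for_file(path, groups):
    """Pick one replica group for a file by hashing its path.

    Stand-in for DHT: a stable hash of the path, taken modulo the number
    of groups (an assumption, not the real glusterfs algorithm).
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[index]


# Hypothetical example: one rack of 40 nodes, replication factor 4.
nodes = ["node%03d" % i for i in range(40)]
groups = make_replica_groups(nodes, 4)

# Every file lands on exactly one group, i.e. on 4 machines.
replicas = group_for_file("logs/shard-17.log", groups)
print(len(groups), "replica groups;", "file stored on", replicas)
```

With consecutive grouping like this, all 4 copies of a file sit in the same rack, so a rack failure would take out every replica; spreading each group across racks would trade some read locality for that.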

Or would it be better to use AFR over DHT? Or HA in some capacity? Is there any way to achieve rack affinity? It'd be nice if reads were done from the local rack when possible.

I can imagine all of these setups, but some require more complicated config files than others. Any advice would be appreciated.

