[Gluster-users] scale-n-defrag of 50TB across a 6 node cluster

phil cryer phil at cryer.us
Tue Jul 6 16:41:48 UTC 2010

Our cluster was out of balance: we started with two servers running
glusterfs in a RAID1 setup, and only after those two servers were
full did we add the additional four to the group.  Now we're running
the scale-n-defrag.sh script over all 50TB of data on the six-node
cluster.  We keep getting closer to having all of the data balanced
across the 6 nodes, though it seems to be going very slowly overall -
the process has been running for more than a week now.  Looking at
the networking graph on this page shows that it's still working and
passing data across the network.
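Without the graph, a quick check from the shell gives the same
picture; this is just a sketch and assumes the sysstat package (for
sar) is installed on the nodes:

    # per-interface throughput, five one-second samples, to confirm
    # the rebalance is still pushing data over the wire
    sar -n DEV 1 5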

Looking at the servers' disk usage on the command line, we see that
the data is indeed being distributed evenly across all 24 mounts on
each node.  While we can't get a progress update from the gluster
process itself, physically looking at the disk usage (with a quick
df sweep like the sketch after the list) shows that:

1 - done balancing
2 - done balancing
3 - done balancing
4 - beginning balancing
5 - beginning balancing
6 - about 1/2 complete balancing
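For reference, the check is nothing fancier than something like the
following; the node1..node6 hostnames and the /export/disk* mount
points are placeholders for whatever your bricks actually are:

    # print each brick's mount point and use% on every node
    for host in node1 node2 node3 node4 node5 node6; do
        echo "== $host =="
        ssh "$host" "df -hP /export/disk* | awk 'NR>1 {print \$6, \$5}'"
    done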

This makes sense, since 4/5 were the original two servers, were the
full ones, and were the most out of sync with the others.  It seems
like 1/2/3 and most of 6 have had the majority of the balancing
completed.  Does this sound normal?  Also, would it make the process
run longer if we started moving files around in their directories on
the nodes?  (We need to move the files to a shared docroot so they
can be served via HTTP.)  I realize now that the best way to build
this cluster would have been to have the entire cluster up and
running first, and then load the data, but since over 50TB needed to
be transferred to the cluster over the Internet, we thought starting
sooner and adding nodes as we grew was the best way to proceed.

Also, does anyone have configuration suggestions for serving static
files for websites from glusterfs?  Either in terms of the .vol file
configuration, or the architecture of how the servers are laid out.
I'm thinking of two ways:

Internet -> SERVER 1 (www server with glusterfs client running) using
/mnt/glusterfs/www as the docroot

- or -

Internet -> SERVER 1 (www server) -> CLUSTER 1 (www server with
glusterfs server and client running) using /mnt/glusterfs/www as the
docroot
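On the .vol side, what I have in mind for the web-facing client is
roughly the stack below - just a sketch, not something we're running
yet.  The server names, brick names, and cache size are placeholders,
and the translator names/options should be checked against your
glusterfs version:

    volume web1
      type protocol/client
      option transport-type tcp
      option remote-host server1
      option remote-subvolume brick1
    end-volume

    volume web2
      type protocol/client
      option transport-type tcp
      option remote-host server2
      option remote-subvolume brick1
    end-volume

    # ... one protocol/client volume per brick on the other nodes ...

    volume dist
      type cluster/distribute
      subvolumes web1 web2
    end-volume

    # read-side caching for small, static files
    volume iocache
      type performance/io-cache
      option cache-size 256MB
      subvolumes dist
    end-volume

    volume qr
      type performance/quick-read
      subvolumes iocache
    end-volume

The idea being to let io-cache/quick-read absorb repeated reads of
small static files before they hit the network again.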

