[Gluster-users] Unify / Distribute / Strip -- Some feedback
Kali Hernandez
kali at thenetcircle.com
Wed Apr 7 03:40:59 UTC 2010
Hi,
First of all, excuse my imperfect English. Please take the following as
a user story from my personal experience: the results and problems shown
here may or may not apply to your particular environment and
requirements, but if you are thinking of using glusterfs or are already
having problems, this might be useful.
As some of you may have read in my previous messages, I am facing a
situation where the Distribute / Stripe translators run into "no free
space" situations even when the overall cluster shows GBs of free space
left. So after some good advice I am falling back to the Unify translator.
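
A toy sketch of how such a false "no free space" error can arise,
assuming the target brick is chosen from a hash of the file name without
looking at free space. The md5 hash, the brick sizes and the file names
are illustrative assumptions, not the real glusterfs algorithm or my
actual setup:

```python
import hashlib

def pick_brick(name, n_bricks):
    # Stand-in for hash-based placement (assumption: not the real hash).
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_bricks

def simulate(capacities, file_size, n_files):
    """Write files until one lands on a brick that cannot hold it.

    Returns (index of the failed file, target brick, free space per brick),
    or (None, None, free) if every write succeeded.
    """
    free = list(capacities)
    for i in range(n_files):
        target = pick_brick(f"file-{i}", len(free))
        if free[target] < file_size:
            return i, target, free
        free[target] -= file_size
    return None, None, free

# Hypothetical cluster: one small brick and one large one (sizes in GB).
failed, target, free = simulate([10, 1000], file_size=5, n_files=50)
if failed is not None:
    print(f"write {failed} failed on brick {target}, "
          f"yet {sum(free)} GB are still free overall")
else:
    print("all writes succeeded")
```

The point is only that a write can fail on one full brick while the sum
of free space across all bricks is still large.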
As I had a previous glusterfs setup (as Distribute) and I had to move
everything from there (and I don't have an intermediate mount point where
I could gather all the data), I was forced to mount both the old and the
new glusters on the same machine, and then move the data from the old
one to the new one.
My first try was creating the new gluster on glusterfs 3.0.3 using the
Stripe translator. But then I found that I would also fall into the "no
free space" situation, and I had to look for another solution, which
ended in rolling back to 2.0.4 and using the Unify translator with the
ALU scheduler.
Moving data became extremely (and painfully) slow: reading from one
networked gluster and writing to another one! When using 3.0.3 Stripe I
was getting a useful transfer speed, but when I switched back to
2.0.4 Unify I got an overall transfer speed of 1.2 to 2.0 Mb/s. With
almost 600 GB of data, that would take forever.
So what I did was stop the old gluster (Distribute), log in to the
storage nodes, and rsync all the content over ssh into the mount point
of the new gluster. This improved the transfer speed significantly,
reaching almost 20 Mb/s.
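
To put these figures in perspective, here is a rough back-of-the-envelope
estimate. It assumes the speeds quoted above are megabytes per second,
that the rate is sustained, and that 600 GB means 600,000 MB; none of
this was measured precisely:

```python
def transfer_hours(size_gb, rate_mb_per_s):
    """Hours needed to move size_gb at a sustained rate (decimal units)."""
    size_mb = size_gb * 1000
    return size_mb / rate_mb_per_s / 3600

for rate in (1.2, 2.0, 20.0):
    hours = transfer_hours(600, rate)
    print(f"{rate:5.1f} MB/s -> {hours:6.1f} h ({hours / 24:.1f} days)")
```

Under those assumptions the gluster-to-gluster copy would need roughly
3.5 to 6 days, while the rsync approach would need a bit over 8 hours.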
Attached to this email is a graph (generated with Graphite) showing the
progress of filling the new gluster. The green line shows the size of
the old gluster (original data) while the blue one shows the new one
filling up. There you can see the first slope, which was filling the
3.0.3 Stripe, and the second one, which belongs to the 2.0.4 Unify.
Mark 1 shows when we hit the false "disk full" situation. Before that
you see the speed of copying from one gluster mount point to another
directly.
Mark 2 shows the incredibly slow slope of copying directly from
gluster to gluster when using 2.0 and Unify as the target. Note the huge
difference compared with the 3.0 direct copy. Both copies were performed
with a single "cp -r" on the system mounting both glusters.
Mark 3 shows the slope when I started to copy *simultaneously*
using "cp" and "rsync" from the storage nodes. It is still quite a bit
slower than the 3.0 results.
Mark 4 shows the speed with the original gluster stopped and data being
copied using only rsync from the storage nodes over ssh. In this case
you can see much better performance than with 3.0.
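
For anyone wanting to repeat the copy from the storage nodes, this is a
minimal sketch of how the rsync command could be assembled. The export
directory, host name, user and mount point are all hypothetical
placeholders, not my real ones, and the script only prints the command
instead of running it:

```python
import shlex

def build_rsync_command(export_dir, client_host, new_mount, user="root"):
    """Build an rsync-over-ssh command that copies the contents of a
    brick's local export directory into a remote mount point."""
    # Trailing slash: copy the directory contents, not the directory itself.
    source = export_dir.rstrip("/") + "/"
    destination = f"{user}@{client_host}:{new_mount}"
    # -a: archive mode (recursive, keeps permissions and times)
    # -e ssh: use ssh as the transport
    return ["rsync", "-a", "-e", "ssh", source, destination]

# Hypothetical values, to be replaced with the real ones on each node.
cmd = build_rsync_command("/data/export", "client.example.com",
                          "/mnt/newgluster")
print(shlex.join(cmd))
```

This would be run once per storage node, with the old gluster stopped,
as described above.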
Another important piece of feedback about Unify: I misunderstood the
storage schema and at first I dedicated 2 full storage nodes to
replicating the namespace, thus losing 40 GB of overall storage
capacity. Then Krzysztof suggested using that storage space and moving
the namespace volume to another brick defined on the same machines,
thus having 2 machines with both storage and namespace data. Then I ran
into the question of having to re-create all the data in the
namespace, either somehow or by starting the copy again from scratch,
but I just tried moving the files locally on the nodes to
another folder (using "mv" and with the glusterfsd daemon stopped) and
it worked fine!
This way I recovered the whole capacity and functionality of the
storage cluster.
For us, this storage cluster is basically a backup space, mostly
written to and very rarely read from. Also, we did not want to get into
a more complicated clustering schema, and I personally wanted to avoid
using Lustre or GFS (our next alternatives) because both need kernel
modules installed and use LVM for storage, and I find it more useful for
us to always be able to access the data locally on the cluster nodes in
case the service goes down. Simplifying the general structure and
keeping everything in user space was worth it for us.
However, it is quite disappointing to find out that both of the
approaches currently offered by glusterfs do not work properly for a
production environment, as stated before, because I will eventually be
unable to use the whole cluster disk space. Unify on 2.0 seems to be
slower in transfer speed, but at least it does work. I can't understand
why the only fully working solution has been deprecated and can't be
used with the latest version, leaving glusterfs 3.0 as just a
theoretical solution.
Another very annoying point in the whole process was the complete
absence of online documentation. The official wiki is vague and
incomplete, and the only suggestion I found was "use the volume
generator script", but I can hardly find out how the translators work
internally. It is very sad to find out that the only well documented
translator on the whole wiki is the deprecated Unify, dating back to the
1.4 versions.
Thanks to all the people on the list who helped me find the problems
and solutions, and the answers I couldn't find for the key questions in
the official "documentation".
I hope this becomes useful for people coming later, and as always,
suggestions, comments and corrections are more than welcome.